Join a CentOS machine to an AD domain

I ran into enough snags when attempting to join an CentOS 6.6 machine to a Microsoft domain that I thought I would document them here. Hopefully it is of use to someone. The majority of the experience is thanks to this site.

Update 03/16/2015: I came across this site which makes things a little easier when it comes to initial configuration – messing with other config files is no longer necessary. The authconfig command to do this is below:

authconfig --disablecache --enablelocauthorize --enablewinbind --enablewinbindusedefaultdomain --enablewinbindauth        --smbsecurity=ads --enablekrb5 --enablekrb5kdcdns --enablekrb5realmdns --enablemkhomedir --enablepamaccess --updateall        --smbidmapuid=100000-1000000 --smbidmapgid=100000-1000000 --disablewinbindoffline --winbindjoin=Admin_account --winbindtemplateshell=/bin/bash --smbworkgroup=DOMAIN --smbrealm=FQDN --krb5realm=FQDN

Replace DOMAIN with short domain name, FQDN with your fully qualified domain name, and Admin_account with an account with domain admin privileges, then skip to the Reboot section, as it covers everything before that.

Install the necessary packages

yum -y install authconfig krb5-workstation pam_krb5 samba-common oddjob-mkhomedir

Configure kerberos auth with authconfig

There is a curses-based GUI you can use to do this in but I opted for the command line.

authconfig --disablecache --enablewinbind --enablewinbindauth --smbsecurity=ads --smbworkgroup=DOMAIN --smbrealm=DOMAIN.COM.AU --enablewinbindusedefaultdomain --winbindtemplatehomedir=/home/DOMAIN/%U --winbindtemplateshell=/bin/bash --enablekrb5 --krb5realm=DOMAIN.COM.AU --enablekrb5kdcdns --enablekrb5realmdns --enablelocauthorize --enablemkhomedir --enablepamaccess --updateall

Add your domain to kerberos configuration

Kerberos information is stored in /etc/krb5.conf. Append your domain in the realms configuration, like below

vi /etc/krb5.conf
[realms]
 EXAMPLE.COM = {
 kdc = kerberos.example.com
 admin_server = kerberos.example.com
 }
 
DOMAIN.COM.AU = {
admin_server = DOMAIN.COM.AU
kdc = DC1.DOMAIN.COM.AU
kdc = DC2.DOMAIN.COM.AU
}
 
[domain_realm]
 .example.com = EXAMPLE.COM
 example.com = EXAMPLE.COM
 domain.com.au = DOMAIN.COM.AU
 .domain.com.au = DOMAIN.COM.AU

 Test your configuration

Use the kinit command with a valid AD user to ensure a good connection with the domain controllers:

kinit <AD user account>
It should return you to the prompt with no error messages. You can further make sure it worked by issuing the klist command to show open Kerberos tickets
klist

Ticket cache: FILE:/tmp/krb5cc_0
Default principal: someaduser@DOMAIN.COM.AU
Valid starting Expires Service principal
02/27/14 12:23:21 02/27/14 22:23:21 krbtgt/DOMAIN.COM.AU@DOMAIN.COM.AU
renew until 03/06/14 12:23:19
When I tried the kinit command it returned an error:
kinit: KDC reply did not match expectations while getting initial credentials
 After scratching my head for a while I came across this site, which explains that your krb5.conf is case sensitive – it must all be all upper case. Fixing my krb5.conf to be all caps for my domain resolved that issue.

Join the domain

net ads join domain.com.au -U someadadmin
When I tried to join the domain I received this lovely message:
Our netbios name can be at most 15 chars long, "EXAMPLEMACHINE01" is 16 chars long
Invalid configuration. Exiting....
Failed to join domain: The format of the specified computer name is invalid.
Thanks to Ubuntu forms I learned I needed to edit my samba configuration to assign an abbreviated NETBIOS name to my machine.
vi /etc/samba/smb.conf
Uncomment the “netbios name =” line and fill it in with a shorter (max 15 characters) NETBIOS name.
netbios name = EXAMPLE01
You can test to ensure the join was successful with this command
net ads testjoin

Configure home directories

The authconfig command above included a switch for home directories. Make sure you create a matching directory and set appropriate permissions for it.

mkdir /home/DOMAIN
setfacl -m group:"Domain Users":rwx /home/DOMAIN #the article calls to do this, this command doesn't work for me but home directories still appear to be created properly

Reboot

To really test everything the best way is to reboot the machine. When it comes back up, log in with Active Directory credentials. It should work!

Account lockout issues

I ran into a very frustrating problem where everything works dandy if you get the password correct on the first try, but if you mess up even once it results in your Active Directory account being locked. You were locked out after the first try. Each login, even when successful, had this in the logs:

winbind pam_unix(sshd:auth): authentication failure

This problem took a few days to solve. Ultimately it involved modifying two files:

vi /etc/pam.d/system-auth
vi /etc/pam.d/password-auth

As far as I can tell, the problem was a combination of pam_unix being first (which always failed when using AD login), as well as having both winbind and kerberos enabled. The fix was to change the order of each mention of pam_unix to be below any mention of pam_winbind. The other fix I had to do was to comment out mentions of pam_krb5 completely.

#auth        sufficient    pam_krb5.so use_first_pass

Restrict logins

The current configuration allows any domain account to log into the machine. You will probably want to restrict who can log in to the machine to certain security groups. The problem: many Active Directory security groups contain spaces in their name, which Linux doesn’t like.

How do you add a security group that contains a space? Escape characters don’t seem to work in the pam config files.  I found out thanks to this site that it is easier to just not use spaces at all. Get the SID of the group instead.

Use wbcinfo -n to query the group in question, using the backslash to escape the space. It will return the SID we desire.

wbinfo -n Domain\ Users
S-1-5-21-464601995-1902203606-794563710-513 Domain Group (2)

Next, modify /etc/pam.d/password-auth and add the require_membership_of argument to pam_winbind.so:

auth        sufficient    pam_winbind.so require_membership_of=S-1-5-21-464601995-1902203606-794563710-513

That’s it! Logins are now restricted to the security group listed.

Configure sudo access

Sudo uses a different list for authorization, which amusingly, handles escaped spaces just fine.  Simply add the active directory group in sudo as you a local one, eg using a % and then group name, escaping spaces with a backslash:

%Domain\ Users ALL=(ALL) ALL

Rejoice

You’ve just gone through a long and painful battle. Hopefully this article helped you to achieve victory.

Generate SSL certificate for use with Sophos UTM

HTTPS certificate handling in Sophos UTM is a bit different than other systems. I do this often enough but never remember exactly how to do it.

Here are the “cliff notes” of getting an SSL certificate loaded into Sophos UTM. This can be done on any linux / unix system with openssl installed. The full guide was taken from here.

Generate a private key

When creating your key, make sure you use a passphrase.

openssl genrsa -aes256 -out <keyname>.key 2048

Create a certificate signing request (CSR)

openssl req -new -key keyname.key -out csrname.csr

Upload CSR to your certificate company

Sophos UTM uses Openssl so select that option if prompted by your certificate company Specify Apache CSR if asked. Validate your domain ownership, then wait for e-mail with response.

Download output from certificate company

If they give you a zip file, unzip it first

unzip file_from_authority.zip

Combine all files provided into one

You only have to do this if your CA provides more than one CRT file

cat CA1.crt CA2.crt ...   >  combined.crt

Generate p12 file for use with UTM

Generate a pkcs12 file by supplying all files generated above. Be sure to specify an export password (Sophos requires one.)

openssl pkcs12 -export -in combined.crt -inkey <keyname>.key -out desired_p12_file_name.p12

Upload into Sophos UTM

Navigate to certificate management and specify upload key. Upload the file. Be sure to enter the password you used when creating the key earlier.

That’s it!

Configure iSCSI initiator in CentOS

Below are my notes for configuring a CentOS box to connect to an iSCSI target. This assumes you have already configured an iSCSI target on another machine / NAS. Much of this information comes thanks to this very helpful website.

Install the software package

1
yum -y install iscsi-initiator-utils

Configure the iqn name for the initiator

1
vi /etc/iscsi/initiatorname.iscsi
1
2
InitiatorName=iqn.2012-10.net.cpd:san.initiator01
InitiatorAlias=initiator01

Edit the iSCSI initiator configuration

1
vi /etc/iscsi/iscsid.conf
1
node.startup = automatic
node.session.auth.authmethod = CHAP
node.session.auth.username = initiator_user
node.session.auth.password = initiator_pass
#The next two lines are for mutual CHAP authentication
node.session.auth.username_in = target_user
node.session.auth.password_in = target_password

Start iSCSI initiator daemon

1
2
/etc/init.d/iscsid start
chkconfig --levels 235 iscsid on

Discover targets in the iSCSI server:

1
2
iscsiadm --mode discovery -t sendtargets --portal 172.16.201.200 the portal's IP address
172.16.201.200:3260,1 iqn.2012-10.net.cpd:san.target01

Try to log in with the iSCSI LUN:

1
2
3
iscsiadm --mode node --targetname iqn.2012-10.net.cpd:san.target01 --portal 172.16.201.200 --login
Logging in to [iface: default, target: iqn.2012-10.net.cpd:san.target01, portal: 172.16.201.200,3260] (multiple)
Login to [iface: default, target: iqn.2012-10.net.cpd:san.target01, portal: 172.16.201.200,3260] successful.

Verify configuration

This command shows what is put into the  iSCSI targets database  (the files located in /var/lib/iscsi/)

1
cat /var/lib/iscsi/send_targets/172.16.201.200,3260/st_config
1
2
3
4
5
6
7
8
9
10
11
12
discovery.startup = manual
discovery.type = sendtargets
discovery.sendtargets.address = 172.16.201.200
discovery.sendtargets.port = 3260
discovery.sendtargets.auth.authmethod = None
discovery.sendtargets.timeo.login_timeout = 15
discovery.sendtargets.use_discoveryd = No
discovery.sendtargets.discoveryd_poll_inval = 30
discovery.sendtargets.reopen_max = 5
discovery.sendtargets.timeo.auth_timeout = 45
discovery.sendtargets.timeo.active_timeout = 30
discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 32768

Verify session is established

1
2
iscsiadm --mode session --op show
tcp: [2] 172.16.201.200:3260,1 iqn.2012-10.net.cpd:san.target01

Create LVM volume and mount

Add our iSCSI disk to a new LVM physical volume, volume group, and logical volume

1
2
3
4
5
6
7
fdisk -l
Disk /dev/sdb: 17.2 GB, 17171480576 bytes
64 heads, 32 sectors/track, 16376 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
1
Disk /dev/sdb doesn't contain a valid partition table
1
2
pvcreate /dev/sdb
vgcreate iSCSI /dev/sdb
lvcreate iSCSI -n volume_name -l100%FREE
mkfs.ext4 /dev/iSCSI/volume_name

Add the logical volume to fstab

Make sure to use the mount option _netdev.  Without this option, Linux will try to mount this device before it loads network support.

1
vi /etc/fstab
/dev/mapper/iSCSI-volume_name    /mnt   ext4   _netdev  0 0

Success.

Convert xenserver 6.5 to software RAID 1

I have written previously about how to convert Citrix Xenserver 6.2 to a software RAID 1. When I upgraded to Xenserver 6.5 I found I had to re-install the xenserver instance because the upgrade didn’t recognize the software RAID. When trying to follow my own guide I found that I couldn’t create the array – it gave the following error message:

mdadm: unexpected failure opening /dev/md0

It turns out 6.5 handles RAID differently. You have to manually load the RAID kernel modules before you can create arrays. I was able to get this running successfully thanks to guidance from this site, specifically comments on it by Olli.

The majority of this can simply be copy/pasted into the command window, once drive paths have been updated for your specific setup.

# Prepare /dev/sdd
sgdisk --zap-all /dev/sdd
sgdisk --mbrtogpt --clear /dev/sdd
sgdisk -R/dev/sdd /dev/sdc # Replicate partion table from /dev/sdc to /dev/sdd with unique identifier
sleep 5 # Sleep 5 seconds here if you script this…
sgdisk --typecode=1:fd00 /dev/sdd
sgdisk --typecode=2:fd00 /dev/sdd
sgdisk --typecode=3:fd00 /dev/sdd
sleep 5 # Sleep 5 seconds here if you script this…
modprobe md_mod # load raid, because it isn't load by default (XS6.5 only)
yes|mdadm --create /dev/md0 --level=1 --raid-devices=2 --metadata=0.90 /dev/sdd1 missing # Create md0 (root)
yes|mdadm --create /dev/md1 --level=1 --raid-devices=2 --metadata=0.90 /dev/sdd2 missing # Create md0 (swap)
yes|mdadm --create /dev/md2 --level=1 --raid-devices=2 --metadata=0.90 /dev/sdd3 missing # Create md0 (storage)
sleep 5 # Sleep 5 seconds here if you script this…
mkfs.ext3 /dev/md0 # Create root FS
mount /dev/md0 /mnt # Mount root FS
cp -xR --preserve=all / /mnt # Replicate root files
mdadm --detail --scan > /mnt/etc/mdadm.conf #generate RAID configuration
sed -i 's/LABEL=[a-zA-Z\-]*/\/dev\/md0/' /mnt/etc/fstab # Update fstab for new RAID device
mount --bind /dev /mnt/dev
mount -t sysfs none /mnt/sys
mount -t proc none /mnt/proc
chroot /mnt /sbin/extlinux --install /boot
dd if=/mnt/usr/share/syslinux/gptmbr.bin of=/dev/sdd
chroot /mnt
mkinitrd -v -f --theme=/usr/share/splash --without-multipath /boot/initrd-`uname -r`.img `uname -r`
exit
sed -i 's/LABEL=[a-zA-Z\-]*/\/dev\/md0/' /mnt/boot/extlinux.conf # Update extlinux for new RAID device
cd /mnt && extlinux --raid -i boot/
sgdisk /dev/sdd --attributes=1:set:2

#Unmount filesystems and reboot
cd
umount /mnt/dev
umount /mnt/sys
umount /mnt/proc
umount /mnt
sync
reboot

Tell BIOS to use disk B
After reboot to disk B…

sgdisk -R/dev/sdc /dev/sdd # Replicate partition table from /dev/sdd to /dev/sdc with unique identifier
sgdisk /dev/sdc --attributes=1:set:2
sleep 5 # Sleep 5 seconds here if you script this…
mdadm -a /dev/md0 /dev/sdc1
mdadm -a /dev/md1 /dev/sdc2
mdadm -a /dev/md2 /dev/sdc3 # If this command gives error, you need to forget/destroy an active SR first
#This next command is the only command you have to manually update before pasting in. Find the UUID of your xenserver host and paste it between the <> below
xe sr-create content-type=user device-config:device=/dev/md2 host-uuid=<UUID of xenserver host> name-label="RAID 1" shared=false type=lvm
# Watch rebuild progress and wait until no arrays are rebuilding before proceeding with any reboot
watch “mdadm --detail /dev/md* | grep rebuild”

Done!

Fix Apache Permission Denied errors

The other day I ran the rsync command to migrate files from an old webserver to a new one. What I didn’t notice right away was that the rsync changed the permissions of the folder I was copying into.

The problem presented itself with a very lovely 403 forbidden error message when trying to access any website that server hosted. Checking the logs (/var/log/apache2/error.log on my Debian system) revealed this curious message:

[error] [client 192.168.22.22] (13)Permission denied: access to / denied

This made it look like apache was denying access for some reason. I verified apache config and confirmed it shouldn’t be denying anything. After some head scratching I came across this site which explained that Apache throws that error when it encounters filesystem access denied error messages.

I was confused because /var/www, where the websites live, had the appropriate permissions. After some digging I found that the culprit in my case was not /var/www, but rather the /var directory underneath /var/www. For some reason the rsync changed /var to not have any execute permissions (necessary for folder access.)  A simple

chmod o+rx /var/

resolved my problem. Next time you get 403 it could be underlying filesystem issues and not apache at all.

Remove and re-install a Debian package

I wanted to completely blow away owncloud on one of my Debian servers today. I did apt-get remove owncloud and removed the owncloud directory. That should be all I need to do, right? Alas, not so.

It turns out that a simple apt-get remove does not remove necessary files, and a simple apt-get install doesn’t restore those missing files. In order to completely blow away the package, you must use the purge command (thanks to this site for the inspiration.)

First, purge the package as well as related packages

sudo apt-get purge owncloud-*

Then, re-install the package with the –reinstall flag

sudo apt-get install --reinstall owncloud

Done! The package has come back anew with all necessary files.

Delete windows.old folder

Some time ago I upgraded my Windows Server 2012 machine to Windows Server 2012 R2. The upgrade was seamless and the server has hummed along just fine until recently, when it began running out of space.

Windirstat, a great little disk space usage reporting program, reported that the largest hog of space was the windows.old folder. Upon upgrade of the OS, the old Windows folder was renamed to Windows.old to make room for the new OS files and has sat there, untouched, ever since.

I tried to remove this folder with hilarious results. The folder is owned by TrustedInstaller. Easy enough, I’ll just replace the owner with my own user account, right? Wrong. Even after becoming the owner of the folder and everything inside it, I was prompted that I needed permission from… myself.. to delete the folder. I then tried changing the owner to “Everyone” and receive a rother comical message that I needed permission from Everyone to remove the folder. That would take some time!

everyone
You need permission from everyone.

That’s when I decided to throw in the towel and google. The solution to this problem involves the command line (thanks to here for the information.) Open an administrator command prompt and issue the following commands:

takeown /F c:\Windows.old\* /R /A /D Y
cacls c:\Windows.old\*.* /T /grant administrators:F
rmdir /S /Q c:\Windows.old

That did the trick! No more full disk.

Reclaim lost space in Xenserver 6.5

Storage XenMotion is awesome. It allows me to spin up a second Xenserver host and live migrate VMs to it whenever I need to do maintenance on my primary xenserver host. I don’t need an intermediary storage device such as a NAS – the two hosts can exchange live, running VMs directly. No downtime!

An unfortunate side effect of using Storage XenMotion is that sometimes it doesn’t clean itself up very well. It takes several snapshots in the migration process and they sometimes get “forgotten about.” This results in inexplicable low disk space errors such as this one:

The specified storage repository has insufficient space

..despite there being plenty of space.

This article explains how to use the coalesce option to reclaim space by issuing the following command:

xe host-call-plugin host-uuid=<host-UUID> plugin=coalesce-leaf fn=leaf-coalesce args:vm_uuid=<VM-UUID>

Unfortunately that didn’t seem to do anything for me. Digging into the storage underpinnings I can see that there are a lot of logical volumes hanging out there not being used:

xe vdi-list sr-uuid=<UUID of SR without space>

This revealed a lot of disks floating around in the SR that aren’t being used (I know this by looking at that same SR inside xencenter.) Curiously there is a VDI with identical names but with different UUIDs, despite my not having any snapshots of that VM.

I was about to start using the vgscan command to look for active volume groups when I got called away. Hours later, when I got back to my task, I found that all the space had been freed up. Xenserver had done its own garbage collection, albeit slowly. So, if you’ve tried to use xenmotion and found you have no space.. give xenserver some time. You might just find out that it will clean itself up.


Update 05/20/2015

I ran into this problem once more. I read from here that simply initiating a scan of the storage repository is all you need to do to reclaim lost space. Unfortunately when I ran that the scan nothing changed. A check of /var/log/SMlog revealed the following error (thanks to ap’s blog for the guidance)

SM: [30364] ***** sr_scan: EXCEPTION XenAPI.Failure, ['INTERNAL_ERROR', 'Db_exn.Uniqueness_constraint_violation("VDI", "uuid", "3e616c49-adee-44cc-ae94-914df0489803")']
...
Raising exception [40, The SR scan failed  [opterr=['INTERNAL_ERROR', 'Db_exn.Uniqueness_constraint_violation("VDI", "uuid", "3e616c49-adee-44cc-ae94-914df0489803")']]]

For some reason one of the ISOs in one of my SRs was throwing an error – specifically a Xenserver operating system fixup iso, which was causing the coalescing process to abort. I didn’t care if I lost that VDI so I nuked it:

xe vdi-destroy uuid="3e616c49-adee-44cc-ae94-914df0489803"

That got me a little father, but I still wasn’t seeing any free space. Further inspection of the log revealed this gem:

SMGC: [7088] No space to leaf-coalesce f8f8b129[VHD](20.000G/10.043G/20.047G|ao) (free
 space: 1904214016)

I read that if there isn’t enough space, a coalesce can’t happen on a running VM. I decided to shut down one of my VMs that was hogging space and run the scan again. This time there was progress in the logs. It took a while, but eventually my space was restored!

Moral of the story: if your server isn’t automatically coalescing to free up space, check /var/log/SMlog to see what’s causing it to choke.

Automatically delete old data in Splunk

I’ve had Splunk humming along for about two years now. I’ve already increased the storage space for my Splunk VM once. Today I received a notice that I’ve once again run out of space and indexing had been suspended. I wanted a more permanent solution to this problem, so I consulted the almighty Google.

The solution to my problem is to set a retirement policy. This allows Splunk to automatically delete old data when you hit a certain index size. You can also go by time, but I opted for size. It’s a pretty simple configuration change. Simply edit (or create if it doesn’t exist) $SPLUNK_HOME/etc/system/local/index.conf and add two lines

sudo vim /opt/splunk/etc/system/local/indexes.conf
[main]
maxTotalDataSizeMB = 50000

Then, restart your Splunk instance (the command below is Debian/Ubuntu specific)

sudo service splunk restart

The configuration above tells Splunk to keep at most 50GB of data. When that limit is reached, it begins deleting the oldest log files. No more out of space errors!

Install ventrilo on Ubuntu 14.04 64bit

Ventrilo is a voice communication server which is popular in the gaming community. It allows teams of people to get together and have voice chats. I recently tried to install vent on a 64bit instance of Ubuntu 14.04. When I tried to execute the server binary, I was greeted with this lovely error message:

bash: ./ventrilo_srv: No such file or directory

It’s a pretty cryptic error message that had me chasing my tail for a while until I came across this post which shed further light on the issue. This error stems from trying to run a 32bit binary on a 64bit system without the proper libraries installed.

A simple

sudo apt-get install lib32z1

Resolved this issue. After those 32bit libraries were installed, vent ran without issue.