Convert xenserver installation to software RAID-1

Update 2/28/2015:  I have a newer article explaining how to do this in Xenserver 6.5.


 

After having a hard drive nearly die on me and threaten to obliterate the VMs living on it I realized it would be a good idea to have my xenserver installation live on a RAID array.

Following this guide I was able to successfully migrate my running xenserver installation to a software based RAID 1, with a few tweaks. In my case I wanted to migrate from a single old drive to two newer ones.

Below are the steps I took to accomplish this.

Partition the new drives

This assumes that your current drive resides on /dev/sda, and your two new drives are /dev/sdb and /dev/sdc.

sgdisk -p /dev/sda
sgdisk --zap-all /dev/sdb
sgdisk --zap-all /dev/sdc
sgdisk --mbrtogpt --clear /dev/sdb
sgdisk --mbrtogpt --clear /dev/sdc
sgdisk --new=1:34:8388641 /dev/sdb
sgdisk --new=1:34:8388641 /dev/sdc
sgdisk --typecode=1:fd00 /dev/sdb
sgdisk --typecode=1:fd00 /dev/sdc
sgdisk --attributes=1:set:2 /dev/sdb
sgdisk --attributes=1:set:2 /dev/sdc
sgdisk --new=2:8388642:16777249 /dev/sdb
sgdisk --new=2:8388642:16777249 /dev/sdc
sgdisk --typecode=2:fd00 /dev/sdb
sgdisk --typecode=2:fd00 /dev/sdc

The third partition (VM storage) had to be tweaked a bit since these are larger drives than the current xenserver installation. I simply used gdisk instead of sgdisk for this task.

gdisk /dev/sdb
n #create new partition
<enter> #accept defaults for partition number, first, and last sectors
<enter>
<enter>
t #select partition type
3 #select partition number 3
fd00  #set for raid
w   #write changes to disk

Repeat above steps for the other disk (/dev/sdc in my case)

Create the RAID arrays for each partition

mdadm --create /dev/md0 --level=1 --raid-devices=2  /dev/sdb1 /dev/sdc1
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb2 /dev/sdc2
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdb3 /dev/sdc3

Watch array build (optional)

cat /proc/mdstat

Alternatively you can use the watch command to get a real time update of the raid build:

watch -n 1 cat /proc/mdstat

Format & mount the array

mkfs.ext3 /dev/md0
mount /dev/md0 /mnt

Copy the root filesystem to the new array

cp -vxpr / /mnt

Install bootloader on the new disks

mount --bind /dev /mnt/dev
mount -t sysfs none /mnt/sys
mount -t proc none /mnt/proc
chroot /mnt /sbin/extlinux --install /boot
dd if=/mnt/usr/share/syslinux/gptmbr.bin of=/dev/sdb
dd if=/mnt/usr/share/syslinux/gptmbr.bin of=/dev/sdc

Generate new initrd image

chroot /mnt
mkinitrd -v -f --theme=/usr/share/splash --without-multipath /boot/initrd-`uname -r`.img `uname -r`
exit

Modify boot file

Edit /mnt/boot/extlinux.conf and replace every mention of the old root filesystem (root=LABEL=xxx) with root=/dev/md0.

vi /mnt/boot/extlinux.conf
:%s/LABEL=<root label>/\/dev\/md0/
:wq

Reboot

Keep the old drive in, but make sure to boot from either one of the member drives of your new array.

Create storage repository

Create new local storage repository with the new RAID array similar to here.

xe sr-create content-type=user device-config:device=/dev/md2 host-uuid=<UUID of xenserver host> name-label="RAID-1" shared=false type=lvm

Migrate VMs / disks

Migrate any disk images or VMs living on the old drive to the new array.

If these VMs / disks are not powered on or being used, it is as simple as pulling up xencenter, right clicking on the VM and clicking move then  select new storage repository.

If the VMs are online you can live migrate them to a different xenserver, then live migrate them back to the proper storage repository.

Remove old storage repository

Following instructions found here.
Note: In my case the transfer returned a strange error but was still successful. I had to restart the XAPI toolstack in order for it to let me remove the old storage repository.

xe sr-list name-label="<name of SR to remove>"
xe pbd-list sr-uuid=<UUID of SR above>
xe pbd-unplug uuid=<UUID of pbd above>
xe sr-forget uuid=<UUID of SR>

Final reboot

Shutdown, disconnect the old drive, and boot back up from the new array. Success.

Configure e-mail alerts (optional)

Now that you have a working RAID array you might want to receive e-mail alerts if there are problems with the array.

First, build an mdadm.conf

mdadm --detail --scan > /etc/mdadm.conf

Modify mdadm.conf to add your desired e-mail address for notifications

sed -i '1i MAILADDR <e-mail address>' /etc/mdadm.conf

Thanks to this site for the sed -i 1i trick.

Lastly, enable the mdadm monitoring service. I found via this site that this is fairly easy to do.  Simply enter these two commands:

service mdmonitor start
chkconfig mdmonitor on

Xenserver uses ssmtp to send e-mail. You can follow this guide on how to set it up for SSL if you happen to have an ISP that blocks port 25 (as I do.) Otherwise modify /etc/ssmtp/ssmtp.conf to suit your needs.

You can generate a test event from mdadm to make sure e-mail is configured properly:

mdadm --monitor --test /dev/md0 --oneshot

To get e-mail alerts to work right I had to ensure that FromLineOverride was NOT set to yes (default). I also had to add this line to /etc/ssmtp/revaliases:

root:<e-mail address being sent from>


Update 02/03/2015:  A commenter made me realize I forgot a step – copying the Control Domain OS to the new Raid array. I’ve added that step above, after the “Format & Mount the array” section.

Update 02/17/2015: If you are using Xenserver 6.5 you might come across the following error message when trying to create RAID arrays:

mdadm: unexpected failure opening /dev/md0

If this happens, load the md kernel driver like so:

modprobe md

It should then let you create your arrays.

Manipulate dd output with sed, tr, and awk

While improving a backup script I came across a need to modify the information output by the dd command. FreeNAS, the system the script runs on, does not have a version of dd that has the human readable option. My quest: make the output from dd human readable.

I’ve only partially fulfilled this quest at this point, but the basic functionality is there. The first step is to redirect all output of the dd command (both stdout and stderr) to a variable (this particular syntax is for bash)

DD_OUTPUT=$(dd if=/dev/zero of=test.img bs=1000000 count=1000 2>&1)

The 2>&1 redirects all output to the variable instead of the console.

The next step is to use sed to remove unwanted information (number records in and out)

sed -r '/.*records /d'

Next, remove any parenthesis. This is due to the parenthesis from dd output messing up the math that’s going to be done later.

tr -d '()'

-d ‘()’ simply tells tr to remove any instance of the characters between the single quotes.

And now, for awk. Awk allows you to manipulate certain pieces of a text file. Each section, separated by a space, is known as a record. Awk lets us edit some parts of text while keeping others intact. My expertise with awk is that of a n00b so I’m sure there is a more efficient way of doing this; nevertheless here is my solution:

awk '{print ($1/1024)/1024 " MB" " " $3 " " $4 " " $5*60 " minutes"  " (" ($7/1024)/1024 " MB/sec)" }'

Since dd outputs its information in bytes I’m having it divide by 1024 twice so the resulting number is now in megabytes. I also have it divide the seconds by 60 to return the number of minutes dd took. Additionally I’m re-inserting the parenthesis removed by the tr command now that the math is correctly done.

The last step is to pipe all of this together:

DD_OUTPUT=$(dd if=/dev/zero of=test.img bs=1000000 count=10000 2>&1)
DD_HUMANIZED=$(echo "$DD_OUTPUT" | sed -r '/.*records /d' | tr -d '()' | awk '{print ($1/1024)/1024 " MB" " " $3 " " $4 " " $5/60 " minutes"  " (" ($7/1024)/1024 " MB/sec)" }')

After running the above here are the results:

Original output:

echo "$DD_OUTPUT"
10000+0 records in
10000+0 records out
10000000000 bytes transferred in 103.369538 secs (96740299 bytes/sec)

Humanized output:

echo "$DD_HUMANIZED"
9536.74 MB transferred in 1.72283 minutes (92.2587 MB/sec)

The printf function of awk would allow us to round the calculations but I couldn’t quite get the syntax to work and have abandoned the effort for now.

Of course all this is not necessary if you have the correct dd version. The GNU version of dd defaults to human readability; certain BSD versions have the option of passing the msgfmt=human argument per here.


Update: I discovered that the awk method above will only print the one line it finds and ignore all other lines, which is less than ideal for scripting. I updated the awk syntax to do a search and replace (sub) instead so that it will print all other lines as well:

awk '{sub(/.*bytes /, $1/1024/1024" MB "); sub(/in .* secs/, "in "$5/60" mins "); sub(/mins .*/, "mins (" $7/1024/1024" MB/sec)"); printf}')

My new all in one line is:

DD_HUMANIZED=$(echo "$DD_OUTPUT" | sed -r '/.*records /d' | tr -d '()' | awk '{sub(/.*bytes /, $1/1024/1024" MB "); sub(/in .* secs/, "in "$5/60" mins "); sub(/mins .*/, "mins (" $7/1024/1024" MB/sec)"); printf}')

Verify backup integrity with rsync, sed, cat, and tee

Recently it became apparent that a large data transfer I did might have had some errors. I wanted to find an easy way to compare the source and destination to make sure that they were identical. My solution: rsync, sed, cat and tee

I have used rsync quite a bit but did not know about the –checksum flag until recently. When you run rsync with –checksum, it takes much longer, but it effectively does something similar to what a ZFS scrub does – it runs a checksum of every source file and compares it with the checksum of each destination file.  If there is a mismatch, rsync will overwrite the destination file with the source file to correct it.

In my situation I performed a large data migration from my old mdadam-based RAID array to my brand new ZFS array. During the transfer the disks were acting very strange, and at one point one of the disks even popped out of the array. The culprit turned out to be a faulty SATA controller. I bought a cheap 4 port SATA controller from Amazon for my new ZFS array. Do not do this! Spring the cash out for a better controller. The cheap ones, this one at least, only caused headache. I removed it and used the on-board SATA ports on my motherboard and the issues went  away.

All of those shennanigans made me wonder if there was corrupt data on my new ZFS array. A ZFS scrub repaired 15.5G of data! While I’m sure that fixed a lot of the issues, I realized there probably was still some corruption. This is how I verified it

rsync -Pahn --checksum /path/to/source /path/to/destination | tee migration.txt

-P shows progress, -a means archive, -h is for human readable measurements, and -n means dry run (don’t actually copy anything)

Tee is a cool utility that allows you to redirect output of a command both to a file and to standard output. This is useful if you want to see the verification take place in real time but also want to analyze it later.

After the comparison (which took a while!) I wanted to see the discrepancies. the -P flag lists each directory rsync checks as well as which files it detected. You can use sed in conjunction with cat to weed out the unwanted lines (directory listings) so that only the files with discrepancies are left.

 cat pictures.txt | sed '/\/$/d' | tee pictures-truncated.txt

The sed regex simply looks for any line ending in a / (directory listing) and removes that line. What is left is the files in question. You can combine the entire thing into one line like so

rsync -Pahn --checksum /path/to/source /path/to/destination | sed '/\/$/d' | tee migration.txt

In my case I wanted to compare discrepencies with rsync and make decisions on if I wanted to actually fix the issues. If you are 100% sure the source is OK to remove the destination completely, you can simply run

rsync -Pah --checksum --delete /path/to/source /path/to/destination

ZFS remote replication script with reporting

In my experimentation with FreeNAS one thing I found lacking was the quality of reports it generated. I suppose one philosophy is that the smaller the e-mail the better, but my philosophy is that the e-mail should still be legible. Some of the e-mails I get from FreeNAS are simply bizarre and cryptic.

FreeNAS has an option to replicate your ZFS volumes to a remote source for backup. As far as I can tell there is no report e-mail when the replication is done, although there may be a cryptic e-mail if anything failed. I have grown used to daily status e-mails from my previous NAS solution (Debian with homegrown scripts.) I set out to do this with FreeNAS and added a few added features along the way.

My script requires that you have already created an appropriate user and private/public key pair for both the source and destination machines (to allow for passwordless logins.) Instructions on how to do this are detailed below. You can download the script here.

Notes and observations

I learned quite a bit when creating this script. The end result is a script that e-mails me a beautiful report telling me anything that was added or removed since the last backup.

  • I used dd for greater speed as suggested here
  • I learned from here that the -R switch for ZFS send sends the entire snapshot tree.
  • The ZFS diff command currently has a bug where it does not always report deleted files / folders. It was opened two years ago, closed, and then recently re-opened as it is still an issue. It is the reason my script uses both ZFS dff and rsync – so I can continually see the bug in action.
  • When dealing with rsync, remember the / at the end!
  • In bash you can pipe output from a command to a variable.
  • When echoing above variable, make sure you enclose it in quotes to preserve formatting.
  • Use the -r flag in sed -r for extended regex functions
  • In my testing the built in freeNAS replication script didn’t appear to replicated the latest snapshot. Interesting…

Below are the preliminary steps that are needed in order for the script to run properly.

Configure a user for replication

Create users

Either manually or through the FreeNAS UI, create a user that will run the backup script. Create that same user on the remote box (backup server.)

Generate RSA keys

Log into local host and generate RSA keys to allow for passwordless login to the system

cd .ssh
ssh-keygen

Make note of the filenames you gave it (the default is id-rsa and id-rsa.pub)

Authorize the resulting public key

Log into remote host and add the public key of local host in ~/username/.ssh/authorized_keys where username is the user you created above. One way to accomplish this is to copy the public key on the main server and paste it into the authorized keys file of the backup server.

On the main server

(assuming the keyfile name is id-rsa)

cd .ssh
less id-rsa.pub

Copy the output on the screen in its entirety

On the backup server

Paste that public key into the authorized_keys file of the backup user

cd .ssh
vi authorized_keys

Allow the new user to mount filesystems

FreeNAS requires you to specifically allow regular users to mount filesystems as described here.

  1. In the web interface under System > Sysctls > Add sysctl:
    Variable: vfs.usermount
    Value: 1
    Enabled: yes

Grant ZFS permissions to the new user

In order for the dataset creation (full backup) feature to work the user we’ve created needs to have specific ZFS permissions granted as outlined here.

Run this command on both the main and backup servers:

zfs allow backup create,destroy,snapshot,rollback,clone,promote,rename,mount,send,receive,quota,reservation,hold storage

where backup is the new user and storage is the dataset name. I’m pretty sure you can make those permissions a little more fine grained but I threw a bunch of them in there for simplicity’s sake.

Configure HP iLo (optional)

My current backup server is an old HP Proliant server equipped with HP iLo. I decided to add a section in my script that, when enabled in the variables section, would have the script use iLo to power the machine on. If you do not have / wish to use iLo to control your backup server you can skip this section.

First, create a user in ILo and grant it Virtual Power and Reset permissions (all the rest can be denied.)

Next, copy the .pub file you created earlier to your computer so you can go into iLo web interface and upload it. Make sure an iLo user exists and the last part  (the username) of the public key file matches exactly with the user you created in HP iLo.

When I first tried this no matter what I tried I couldn’t get passwordless login to work. After much weeping, wailing, and gnashing of teeth. I finally discovered from here that the -f and -C options of the ssh-keygen command are required for iLo to accept the key. I had to regenerate a private/public key pair via the following options, where backup is the user I created in iLo:

ssh-keygen -b 1024 -f backup -C backup

Refresh owncloud file cache

I came across an issue with owncloud where I had manually placed files in my user directory but the files were not showing up in owncloud. I found from here that you can access the owncloud console directly and trigger a re-scan of your files.

To trigger a re-scan, open up a terminal session to your owncloud server and run the following command:

php /path/of/owncloud/console.php files:scan --all

This will trigger a re-scan of all files for all users. You can replace –all with a userid if you just want to scan a specific user’s folder instead.

Xenserver – The uploaded patch file is invalid

It has been six months since I’ve applied any patches to my Citrix Xenserver hypervisor. Shame on me for not checking for updates. The thing has been humming along without any issues so it was easy to forget about.

In trying to install xenserver patches today I kept getting this error message no matter what I tried:

The uploaded patch file is invalid

After deleting everything I could (including files hanging out in /var/patch) I realized that I was simply Doing It Wrong™. D’oh!

When applying xenserver updates, the expected file extension is .xsupdate. I had been trying to xe patch-upload the downloaded zip file, whereas I was supposed to have extracted those zips before trying to upload them.  This quick little line unzipped all my patch ZIP files for me in one swoop:

find *.zip -exec unzip {} \;

Once everything was unzipped I was able to upload and apply the resulting .xsupdate files without issue.

Find out free disk space from the command line

The du command can be used in any Unix shell to determine how much space a folder is taking. It’s a quick way to determine which directory is using the most space.

If you run du on the root drive “/” then you will get an idea of how much space the drive is using. One unfortunate side effect of that command is if you have any mounted drives or other filesystems, it will search the disk usage of those folders as well. Aside from taking a long time that method provides data we don’t necessarily want.

Fortunately, there are a few switches you can use to fix this problem. The -x switch tells du to ignore other filesystems. Perfect.

There are a few other switches that prove useful. Below is a list of my favorites:

  • -x Ignore other filesystems
  • -h Use human readable numbers (kilobytes, megabytes, and gigabytes) instead of raw bytes
  • -s Provide a summary of the folder instead of listing the size of each file inside the folder
  • * (not a switch, but rather an argument to put after your directory.) This summarizes subfolders in that directory, as opposed to simply returning the size of that directory.

For troubleshooting low disk space errors, the following command will give you a good place to start:

du -hsx /*

You can then dig further by altering the above command to reflect a directory instead of /, or simply do * if you want a breakdown of the directory you are currently in.

Compare two latest ZFS snapshots for differences

In my previous post about ZFS snapshots I discussed how to get the latest snapshot name. I came across a need to get the name of the second to last snapshot and then compare that with the latest. A little CLI kung-fu is required for this but nothing too scary.

The command of the day is: zfs diff.

zfs diff storage/mythTV@auto-20141007.1248-2h storage/mythTV@auto-20141007.1323-2h

If you get an error using zfs diff, you aren’t running as root. You will need to delegate the diff ZFS permission to the account you’re using:

zfs allow backup diff storage

where backup is the account you want to grant permissions for and storage is the dataset you want to grant permissions to.

The next step is to grab the two latest snapshots using the following commands.

Obtain latest snapshot:

zfs list -t snapshot -o name -s creation -r storage/Documents | tail -1

Obtain the second to latest snapshot:

zfs list -t snapshot -o name -s creation -r storage/Documents | tail -2 | sort -r | tail -1

Putting it together in one line:

zfs diff `zfs list -t snapshot -o name -s creation -r storage/Documents | tail -2 | sort -r | tail -1` `zfs list -t snapshot -o name -s creation -r storage/Documents | tail -1`

While doing some testing I came across an unfortunate bug with the ZFS diff function. Sometimes it won’t show files that have been deleted! It indicates that the folder where the deleted files were in was modified but doesn’t specify any further. This bug appears to affect all ZFS implementations per here and here. As of this writing there has been no traction on this bug. The frustrating part is the bug is over two years old.

The workaround for this regrettable bug is to use rsync  with the -n parameter to compare snapshots. -n indicates to only do a dry run and not actually try to copy anything.

To use Rsync for comparison, you have to do a little more CLI-fu to massage the output from the zfs list command so it’s acceptable to rsync as well as include the full mountpoint of both snapshots. When working with rsync, don’t forget the trailing slash.

rsync -vahn --delete /mnt/storage/Documents/.zfs/snapshot/`zfs list -t snapshot -o name -s creation -r storage/Documents | tail -2 | sort -r | tail -1 | sed 's/.*@//g'`/ /mnt/storage/Documents/.zfs/snapshot/`zfs list -t snapshot -o name -s creation -r storage/Documents | tail -1 | sed 's/.*@//g'`/

Command breakdown:

Rsync arguments:
-v means verbose (lists files added/deleted)
-a means archive (preserve permissions)
-h means human readable numbers
-n means do a dry run only (no writing)
–delete will delete anything in the destination that’s not in the source (but not really since we’re doing -n – it will just print what it would delete on the screen)

Sed arguments
/s search and replace
/.*@ simple regex meaning anything up to and including the @ sign
/  What comes after this slash is what we would like to replace what was matched in the previous command. In this case, we choose nothing, and move directly to the last argument
/g tells sed to keep looking for other matches (not really necessary if we know there is only one in the stream)

All these backticks are pretty ugly, so for readability sake, save those commands into variables instead. The following is how you would do it in bash:

FIRST_SNAPSHOT="`zfs list -t snapshot -o name -s creation -r storage/Documents | tail -2 | sort -r | tail -1 | sed 's/.*@//g'/`"
SECOND_SNAPSHOT="`zfs list -t snapshot -o name -s creation -r storage/Documents | tail -1 | sed 's/.*@//g'/`"
rsync -vahn --delete /mnt/storage/Documents/.zfs/snapshot/$FIRST_SNAPSHOT /mnt/storage/Documents/.zfs/snapshot/$SECOND_SNAPSHOT

I think I’ll stop for now.

Blow away ZFS snapshots and watch the progress

For the last month I have had a testing system (FreeNAS) take ZFS snapshots of sample datasets every five minutes. As you can imagine, the snapshot count has risen quite dramatically. I am currently at over 12,000 snapshots.

In testing a backup script I’m working on I’ve discovered that replicating 12,000 snapshots takes a while. The initial data transfer completes in a reasonable time frame but copying each subsequent snapshot takes more time than the original data. Consequently, I decided to blow away all my snapshots. It took a while! I devised this fun little way to watch the progress.

Open two terminal windows. In terminal #1, enter the following:

bash
while [ true ]; do zfs list -H -t snapshot | wc -l; sleep 6; done

The above loads BASH and the runs a simple loop to count the total number of snapshots on the system. The sleep command is only there because it takes a few seconds to return the results when you have more than 10,000 snapshots.

Alternatively you could make the output a little prettier by entering the following:

while [ true ]; do REMAINING="`zfs list -H -t snapshot | wc -l`"; echo "Snapshots remaining: $REMAINING" ; sleep 6; done

In terminal #2, enter the following (taken from here):

bash
for snapshot in `zfs list -H -t snapshot | cut -f 1`
do
zfs destroy $snapshot
done

You can now hide terminal#2 and observe terminal #1. It will show you how many snapshots are left, refreshing the number every 6 seconds. Neat.

Fix Apache “Could not reliably determine name” error

For too many years now I have been too lazy to investigate the Apache error message I get whenever I restart the service:

 ... waiting apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1 for ServerName

I finally decided to investigate it today and found this post which describes a simple fix: create /etc/apache2/conf.d/name and add the ServerName variable to it.

sudo vim /etc/apache2/conf.d/name
ServerName jeppson.org

Change ServerName to be whatever you would like, and you’re good to go.