Tag Archives: ZFS

Create a RaidZ array with missing drive in FreeNAS

I came across a need to create a ZFS RaidZ array with a missing drive. This is easy to do with mdadm but not as straightforward with ZFS. It is possible, though. The trick is to create a sparse image file with dd and map it as a loopback device. Once that's done, you can treat the loopback device as if it were a regular hard drive and add it to the array. After the array is created, you can take the loopback device offline and remove it, then add an actual HDD later.

Create loopback device

Thanks to this site for the information.

First, use dd to create an image file. Change the seek parameter to whatever size disk you wish to emulate.

dd if=/dev/zero of=temp.img bs=1 count=1 seek=1024G
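
This creates a sparse file, so it uses almost no real disk space until data is written to it. As an alternative sketch (assuming FreeBSD's truncate(1) utility, which FreeNAS includes), the same sparse file can be created in one step:

truncate -s 1T temp.img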

Next, create the loopback (memory disk) device backed by the image file (md0 in my case). On FreeBSD/FreeNAS this is done with mdconfig (losetup is the Linux equivalent and isn't available here).

sudo mdconfig -a -t vnode -f temp.img -u 0

List your loopback devices with the following command to verify your new loopback device:

sudo mdconfig -l

Create array using loopback device

You can now partition and add your loopback device as if it were a regular hard drive. (If the device has no partition table yet, you may need to create one first with something like gpart create -s gpt md0.) Change the volume name, array name, and device names as necessary for your environment.

sudo gpart add -t freebsd-zfs -l <volume name> md0
sudo zpool create <array name> raidz ada7p1 ada8p1 ada9p1 md0p1

Fail & Remove loopback device

Now that our new array is up and running properly we can fail out the loopback device. Make sure to modify the command to use your array name and loopback device/partition number.

sudo zpool offline <array_name> md0p1
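
Once the loopback device has been offlined, you can also tear down the memory disk and delete the sparse image file to reclaim the space. A minimal sketch, assuming unit 0 and temp.img from earlier:

sudo mdconfig -d -u 0
rm temp.img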

Import new array into FreeNAS GUI

To get our new array into FreeNAS we must export it from the command line, then import it from the GUI.

sudo zpool export <array name>

Once the array has been exported, navigate to the FreeNAS GUI and go to Storage / Volumes / Import Volume.

Your new array, minus one drive, is now ready to go in FreeNAS. You can add a physical HDD when it becomes available (in my case, when it returns from RMA).
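
When the physical disk arrives, the degraded slot can be filled with zpool replace. A rough sketch, assuming the new disk shows up as ada10 (a hypothetical device name; adjust for your system) and that md0p1 is the offlined member:

sudo gpart create -s gpt ada10
sudo gpart add -t freebsd-zfs -l <volume name> ada10
sudo zpool replace <array name> md0p1 ada10p1

ZFS will then resilver the new partition into the array and the pool will return to a healthy state.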

FreeNAS unable to create jails fix

I recently got a shiny new FreeNAS Mini appliance. It’s the bee’s knees. Previously I was using a virtualized instance of FreeNAS that has served me admirably for two years now. During the migration I decided to start fresh with my jails configuration and deleted the entire jails dataset. This turned out to be a mistake. I suddenly found that I couldn’t create any jails or plugins. The plugin download would hang for a long time and then flash a brief message: “Failed to download plugin.” Not helpful.

I tried changing the location of my jails in configuration to no avail. I even tried nuking my FreeNAS config entirely and starting from scratch. The error still happened! Somehow that configuration survived a factory restore.

I finally found this FreeNAS forum entry that pointed me in the right direction. It suggested using the warden command to delete the plugin jail template completely and re-install it. When I tried to, I got this error:

[nicholas@freenas ~]$ sudo warden template delete pluginjail
ERROR: Not a ZFS volume: /mnt/storage/jails/.warden-template-pluginjail

It was still pointing at the plugin template location inside my now non-existent dataset. I decided to try re-creating the missing dataset and then running the warden delete command again. Success!

[nicholas@freenas ~]$ sudo zfs create storage/jails/.warden-template-pluginjail
[nicholas@freenas ~]$ sudo warden template delete pluginjail

Once you delete the jail template via warden, you can re-create it in the right place after configuring the correct path under Jails / Configuration. Once the correct path is configured, issue the following:

warden template create -nick pluginjail -tar http://download.freenas.org/jails/9.3/x64/freenas-pluginjail-9.3-RELEASE.tgz

Plugins and jails work again! Success.

Improve FreeNAS NFS performance in Xenserver

My home lab consists of a virtualized instance of FreeNAS, Citrix Xenserver, and various VMs. Recently I wanted to migrate some of my VMs to an NFS export from FreeNAS. To my dismay, the speed was abysmal (3 MB/second write speeds). This tutorial will walk you through how to improve FreeNAS NFS performance in Xenserver by adding a log device (ZIL) to your ZFS pool.

After much research I realized the problem lies with ZFS behind the NFS export. Xenserver mounts the NFS share in such a way that every write is synchronous, which slows things down considerably.

The solution: add a ZIL (ZFS intent log) device. Since my FreeNAS is virtualized, I chose the route of adding a virtual disk that lives on an SSD. This process wasn’t straightforward. If you have a virtual FreeNAS, this is how to improve NFS performance:

  1. Add a disk in Xenserver. The rule of thumb for size is half the amount of system RAM. I added a 16GB ZIL disk to be safe.
  2. Add the following tunables in FreeNAS (to allow the OS to properly see Xen hard drives)
    1. Variable: hint.ada.0.at, Value: scbus100 (for the FreeNAS OS disk)
    2. Variable: hint.ada.1.at, Value: scbus100 (for the newly added ZIL disk)
  3. Reboot FreeNAS
  4. In the FreeNAS GUI, click the ZFS Volume Manager, select your volume to expand from the dropdown, and select the device to be a LOG volume (ZIL)

That’s it! Once I added an SSD based ZIL device for my ZFS pool, NFS writes went from 3 MB/s to 60 MB/s. Awesome.
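
For reference, the same change can be made from the command line; a minimal sketch, assuming the pool is named storage and the new ZIL disk appeared as ada1 with a single freebsd-zfs partition (adjust names for your system):

zpool add storage log ada1p1

Be careful to include the log keyword; without it the disk would be added as a regular data vdev instead of a separate intent log.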

Verify backup integrity with rsync, sed, cat, and tee

Recently it became apparent that a large data transfer I did might have had some errors. I wanted an easy way to compare the source and destination to make sure they were identical. My solution: rsync, sed, cat, and tee.

I have used rsync quite a bit but did not know about the --checksum flag until recently. When you run rsync with --checksum, it takes much longer, but it effectively does something similar to a ZFS scrub: it checksums every source file and compares it with the checksum of the corresponding destination file. If there is a mismatch, rsync will overwrite the destination file with the source file to correct it.

In my situation I performed a large data migration from my old mdadm-based RAID array to my brand new ZFS array. During the transfer the disks were acting very strange, and at one point one of the disks even popped out of the array. The culprit turned out to be a faulty SATA controller. I had bought a cheap 4 port SATA controller from Amazon for my new ZFS array. Do not do this! Spring for a better controller. The cheap ones, this one at least, only caused headaches. I removed it, used the on-board SATA ports on my motherboard instead, and the issues went away.

All of those shenanigans made me wonder if there was corrupt data on my new ZFS array. A ZFS scrub repaired 15.5G of data! While I’m sure that fixed a lot of the issues, I suspected there was probably still some corruption. This is how I verified it:

rsync -Pahn --checksum /path/to/source /path/to/destination | tee migration.txt

-P shows progress, -a means archive, -h is for human readable measurements, and -n means dry run (don’t actually copy anything)

Tee is a cool utility that allows you to redirect output of a command both to a file and to standard output. This is useful if you want to see the verification take place in real time but also want to analyze it later.

After the comparison (which took a while!) I wanted to see the discrepancies. The -P flag lists each directory rsync checks as well as each file where it detected a difference. You can use sed in conjunction with cat to weed out the unwanted lines (directory listings) so that only the files with discrepancies are left.

 cat migration.txt | sed '/\/$/d' | tee migration-truncated.txt

The sed regex simply looks for any line ending in a / (a directory listing) and removes it. What is left are the files in question. You can combine the entire thing into one line like so:

rsync -Pahn --checksum /path/to/source /path/to/destination | sed '/\/$/d' | tee migration.txt

In my case I wanted to review the discrepancies rsync found and decide whether to actually fix each issue. If you are 100% sure the source is intact and simply want the destination to match it exactly, you can run:

rsync -Pah --checksum --delete /path/to/source /path/to/destination

ZFS remote replication script with reporting

In my experimentation with FreeNAS one thing I found lacking was the quality of reports it generated. I suppose one philosophy is that the smaller the e-mail the better, but my philosophy is that the e-mail should still be legible. Some of the e-mails I get from FreeNAS are simply bizarre and cryptic.

FreeNAS has an option to replicate your ZFS volumes to a remote source for backup. As far as I can tell there is no report e-mail when the replication is done, although there may be a cryptic e-mail if anything failed. I have grown used to daily status e-mails from my previous NAS solution (Debian with homegrown scripts), so I set out to do the same with FreeNAS and added a few features along the way.

My script requires that you have already created an appropriate user and private/public key pair for both the source and destination machines (to allow for passwordless logins.) Instructions on how to do this are detailed below. You can download the script here.

Notes and observations

I learned quite a bit when creating this script. The end result is a script that e-mails me a beautiful report telling me anything that was added or removed since the last backup.

  • I used dd for greater speed as suggested here
  • I learned from here that the -R switch for ZFS send sends the entire snapshot tree.
  • The zfs diff command currently has a bug where it does not always report deleted files / folders. It was opened two years ago, closed, and then recently re-opened as it is still an issue. It is the reason my script uses both zfs diff and rsync, so I can continually see the bug in action.
  • When dealing with rsync, remember the / at the end!
  • In bash you can capture the output of a command in a variable (command substitution); see the short example after this list.
  • When echoing that variable, make sure you enclose it in quotes to preserve formatting.
  • Use sed’s -r flag for extended regular expressions.
  • In my testing the built-in FreeNAS replication script didn’t appear to replicate the latest snapshot. Interesting…
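
A quick sketch of the capture-and-quote pattern from the notes above (the snapshot names are placeholders):

CHANGES="`zfs diff storage/mythTV@older-snapshot storage/mythTV@newer-snapshot`"
echo "$CHANGES"

The quotes around $CHANGES are what preserve the line breaks when the variable is echoed into the report e-mail.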

Below are the preliminary steps that are needed in order for the script to run properly.

Configure a user for replication

Create users

Either manually or through the FreeNAS UI, create a user that will run the backup script. Create that same user on the remote box (backup server.)

Generate RSA keys

Log into the local host and generate an RSA key pair to allow for passwordless login to the remote system

cd .ssh
ssh-keygen

Make note of the filenames you gave it (the default is id_rsa and id_rsa.pub)

Authorize the resulting public key

Log into the remote host and add the public key of the local host to ~/.ssh/authorized_keys of the user you created above. One way to accomplish this is to copy the public key on the main server and paste it into the authorized_keys file on the backup server.

On the main server

(assuming the keyfile name is id_rsa)

cd .ssh
less id_rsa.pub

Copy the output on the screen in its entirety

On the backup server

Paste that public key into the authorized_keys file of the backup user

cd .ssh
vi authorized_keys
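
Alternatively, if ssh-copy-id is available on the main server, it automates the copy-and-paste step (backupserver here is a placeholder for your backup server's hostname and backup is the user created earlier):

ssh-copy-id -i ~/.ssh/id_rsa.pub backup@backupserver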

Allow the new user to mount filesystems

FreeNAS requires you to specifically allow regular users to mount filesystems as described here.

  1. In the web interface under System > Sysctls > Add sysctl:
    Variable: vfs.usermount
    Value: 1
    Enabled: yes
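
For reference, the same setting can be applied immediately from a root shell, though unlike the GUI sysctl entry it will not persist across reboots:

sysctl vfs.usermount=1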

Grant ZFS permissions to the new user

In order for the dataset creation (full backup) feature to work, the user we’ve created needs to have specific ZFS permissions granted, as outlined here.

Run this command on both the main and backup servers:

zfs allow backup create,destroy,snapshot,rollback,clone,promote,rename,mount,send,receive,quota,reservation,hold storage

where backup is the new user and storage is the dataset name. I’m pretty sure you can make those permissions a little more fine-grained, but I threw a bunch of them in there for simplicity’s sake.

Configure HP iLo (optional)

My current backup server is an old HP Proliant server equipped with HP iLo. I decided to add a section in my script that, when enabled in the variables section, would have the script use iLo to power the machine on. If you do not have / wish to use iLo to control your backup server you can skip this section.

First, create a user in iLo and grant it Virtual Power and Reset permissions (all the rest can be denied).

Next, copy the .pub file you created earlier to your computer so you can upload it in the iLo web interface. Make sure an iLo user exists and that the last part of the public key file (the username) matches the user you created in HP iLo exactly.

When I first tried this I couldn’t get passwordless login to work no matter what I did. After much weeping, wailing, and gnashing of teeth, I finally discovered from here that the -f and -C options of the ssh-keygen command are required for iLo to accept the key. I had to regenerate the private/public key pair with the following options, where backup is the user I created in iLo:

ssh-keygen -b 1024 -f backup -C backup
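
With the key in place, the script can power the backup server on over SSH. A hypothetical invocation (the iLo hostname is a placeholder for your iLo interface's address, and power on is the iLo CLI command that turns the server on):

ssh -i backup backup@ilo-backup.example.com "power on"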

Compare two latest ZFS snapshots for differences

In my previous post about ZFS snapshots I discussed how to get the latest snapshot name. I came across a need to get the name of the second to last snapshot and then compare that with the latest. A little CLI kung-fu is required for this but nothing too scary.

The command of the day is: zfs diff.

zfs diff storage/mythTV@auto-20141007.1248-2h storage/mythTV@auto-20141007.1323-2h

If you get an error using zfs diff, it’s likely because you aren’t running as root. You will need to delegate the diff ZFS permission to the account you’re using:

zfs allow backup diff storage

where backup is the account you want to grant permissions for and storage is the dataset you want to grant permissions to.

The next step is to grab the two latest snapshots using the following commands.

Obtain latest snapshot:

zfs list -t snapshot -o name -s creation -r storage/Documents | tail -1

Obtain the second to latest snapshot:

zfs list -t snapshot -o name -s creation -r storage/Documents | tail -2 | sort -r | tail -1

Putting it together in one line:

zfs diff `zfs list -t snapshot -o name -s creation -r storage/Documents | tail -2 | sort -r | tail -1` `zfs list -t snapshot -o name -s creation -r storage/Documents | tail -1`

While doing some testing I came across an unfortunate bug in the zfs diff function: sometimes it won’t show files that have been deleted! It indicates that the folder the deleted files were in was modified, but doesn’t specify any further. This bug appears to affect all ZFS implementations per here and here. As of this writing there has been no traction on it; the frustrating part is that the bug is over two years old.

The workaround for this regrettable bug is to use rsync with the -n parameter to compare snapshots. -n tells rsync to do a dry run only and not actually copy anything.

To use rsync for comparison, you have to do a little more CLI-fu to massage the output of the zfs list command so it’s acceptable to rsync, as well as include the full mountpoint of both snapshots. When working with rsync, don’t forget the trailing slash.

rsync -vahn --delete /mnt/storage/Documents/.zfs/snapshot/`zfs list -t snapshot -o name -s creation -r storage/Documents | tail -2 | sort -r | tail -1 | sed 's/.*@//g'`/ /mnt/storage/Documents/.zfs/snapshot/`zfs list -t snapshot -o name -s creation -r storage/Documents | tail -1 | sed 's/.*@//g'`/

Command breakdown:

Rsync arguments:
-v means verbose (lists files added/deleted)
-a means archive (preserve permissions)
-h means human readable numbers
-n means do a dry run only (no writing)
--delete will delete anything in the destination that’s not in the source (but not really, since we’re doing -n; it will just print what it would delete on the screen)

Sed arguments
s/ starts the search and replace
.*@ is a simple regex matching everything up to and including the @ sign
// the replacement goes between these slashes; here it is empty, so the match is simply removed, leaving just the snapshot name
g tells sed to keep looking for other matches (not really necessary here since there is only one @ per line)

All these backticks are pretty ugly, so for readability’s sake, save those commands into variables instead. The following is how you would do it in bash:

FIRST_SNAPSHOT="`zfs list -t snapshot -o name -s creation -r storage/Documents | tail -2 | sort -r | tail -1 | sed 's/.*@//g'`"
SECOND_SNAPSHOT="`zfs list -t snapshot -o name -s creation -r storage/Documents | tail -1 | sed 's/.*@//g'`"
rsync -vahn --delete /mnt/storage/Documents/.zfs/snapshot/$FIRST_SNAPSHOT/ /mnt/storage/Documents/.zfs/snapshot/$SECOND_SNAPSHOT/

I think I’ll stop for now.

Blow away ZFS snapshots and watch the progress

For the last month I have had a testing system (FreeNAS) take ZFS snapshots of sample datasets every five minutes. As you can imagine, the snapshot count has risen quite dramatically. I am currently at over 12,000 snapshots.

In testing a backup script I’m working on, I’ve discovered that replicating 12,000 snapshots takes a while. The initial data transfer completes in a reasonable time frame, but copying each subsequent snapshot takes more time than the original data transfer did. Consequently, I decided to blow away all my snapshots. It took a while! I devised this fun little way to watch the progress.

Open two terminal windows. In terminal #1, enter the following:

bash
while [ true ]; do zfs list -H -t snapshot | wc -l; sleep 6; done

The above loads bash and then runs a simple loop to count the total number of snapshots on the system. The sleep command is only there because it takes a few seconds to return the results when you have more than 10,000 snapshots.

Alternatively you could make the output a little prettier by entering the following:

while [ true ]; do REMAINING="`zfs list -H -t snapshot | wc -l`"; echo "Snapshots remaining: $REMAINING" ; sleep 6; done

In terminal #2, enter the following (taken from here):

bash
for snapshot in `zfs list -H -t snapshot | cut -f 1`
do
zfs destroy $snapshot
done

You can now hide terminal #2 and observe terminal #1. It will show you how many snapshots are left, refreshing the number every 6 seconds. Neat.
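
Depending on your ZFS version, you may also be able to destroy a whole range of snapshots of a single dataset with the percent syntax instead of looping. A sketch with a dry run first (the dataset and snapshot names are placeholders; -n shows what would be destroyed, -v prints each name):

zfs destroy -nv storage/mythTV@oldest-snapshot%newest-snapshot
zfs destroy -v storage/mythTV@oldest-snapshot%newest-snapshot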

Get the latest ZFS snapshot name

In my experiments with FreeNAS and ZFS I came across a need to obtain the name of the latest snapshot of a given dataset. For some odd reason this information is not readily available (that I could find, anyway.) After much googling I finally constructed an answer to my own question, “How do I get the name of the latest ZFS snapshot?”

The answer is the zfs list command, using the -t, -o, -s, and -r options, and then piping the output to tail to grab the last result.

zfs list -t snapshot -o name -s creation -r storage/Documents | tail -1

Argument breakdown:

  • -t the type of ZFS item you want information about (snapshot)
  • -o the list of properties of that type to return (just the name)
  • -s the property to sort by (creation time)
  • -r recurse into the specified dataset and list its snapshots
  • -1 (from tail): only return one line (the last, most recent one)

The example above returns the name of the latest snapshot taken from my Documents dataset, which is on my storage volume.

FreeNAS on Xenserver with PVHVM support

In my current home setup I have a single server performing many functions thanks to Citrix Xenserver 6.2 and PCI passthrough. This single box is my firewall, web servers, and NAS. My primary motivation for this is power savings: I didn’t want more than one box up 24/7 but still wanted all those separate services, some of which are software appliances that aren’t very customizable.

My current NAS setup is a simple Debian Wheezy virtual machine with the motherboard’s on-board SATA controller passed through to it. The VM runs a six-drive software RAID 6 using mdadm with LVM volume management on top. Lately, though, I have become concerned about data integrity and my use of commodity drives. This prompted me to investigate ZFS as a replacement for my current setup. ZFS has many features, but the one I’m most interested in is its ability to detect and correct corrupted files and blocks. This will put my mind at ease when it comes to the thousands of files I have that are accessed infrequently.

I decided to try out FreeNAS, a NAS appliance which utilizes ZFS. After searching the forums it quickly became clear that the people at FreeNAS are not too keen on virtualizing their software, and there is very little help to be had there in getting it to work in virtual environments. In the case of Xenserver, FreeNAS does work out of the box, but it is considerably slower than bare metal due to its lack of Xen paravirtualized (PVHVM) driver support.

Fortunately, a friendly FreeNAS user posted a link to his blog outlining how he compiled FreeNAS to work with Xen. Since Xenserver uses Xen (it’s in the name, after all) I was able to use his re-compiled ISO (I was too lazy to compile my own) to test in Xenserver.

There are some bugs to work around to get this working, though. Wired dad’s Xen-ified FreeNAS doesn’t appear to boot in Xenserver out of the box. It begins to boot but then hangs indefinitely on the following error:

run_interrupt_drive_hooks: still waiting after 60 seconds for xenbusb_nop_confighook_cb

This is the result of a bug in the version of qemu that Xenserver uses. The bug causes BSD kernels to really not like the virtual DVD device in the VM and refuse to boot. The solution is to remove the virtual DVD drive. How, then, do you install FreeNAS without a DVD drive?

It turns out that all the FreeNAS installer does is extract an image file to your target drive. That file is an .xz file inside the ISO. To get Wired dad’s FreeNAS Xen image to work in Xenserver, one must extract that .xz file from the ISO, expand it to an .img file, and then apply that .img file to the Xenserver virtual machine’s hard disk. The following commands can be run on the Xenserver host machine to accomplish this.

  1. Create a virtual machine with a 2GB hard drive.
  2. Mount the FreeNAS-xen ISO in loopback mode to get at the necessary file
    mkdir temp
    mount -o loop FreeNAS-9.2.1.5-RELEASE-xen-x64.iso temp/
  3. Extract the IMG file from the FreeNAS ISO
    xzcat ~/temp/FreeNAS-x64.img.xz | dd of=FreeNAS_x64.img bs=64k
    

    Note that the IMG file is 2GB in size, which is larger than can sit in the root drive of a default install Xenserver. Make sure you extract this file somewhere that has enough space.

  4. Import that IMG file into the virtual disk you created with your VM in step 1.
    cd ..
    xe vdi-import uuid=<UUID of the 2GB disk created in step 1> filename=FreeNAS_x64.img
    

    This results in an error:

    The server failed to handle your request, due to an internal error.  The given message may give details useful for debugging the problem.
    message: Caught exception: VDI_IO_ERROR: [ Device I/O errors ]
    

    This error can be safely ignored – it did indeed copy the necessary files.
    Note: To obtain the UUID of the 2GB disk you created in step 1, run the "xe vdi-list" command and look for the name of the disk (see the filtered example after this list).

  5. Remove the DVD drive from the virtual machine. From Xencenter:
    Shutdown the VM
    Mount xs-tools.iso
    Run this command in a command prompt:

    xe vm-cd-remove uuid=<UUID of VM> cd-name=xs-tools.iso
  6. Profit!
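
For reference, the UUIDs needed in steps 4 and 5 can be pulled up with filtered xe queries; a small sketch (the names and sizes will differ on your host):

xe vdi-list params=uuid,name-label,virtual-size
xe vm-list params=uuid,name-label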

There is one aspect I haven’t gotten to work yet, and that is Xenserver Tools integration. The important bit, paravirtualized networking, has been achieved, so once I get more time I will investigate Xenserver Tools further.

Recovering a failed RaidZ pool

Scenario: A drive in your RaidZ pool has gone bad. You have a replacement drive ready to go. You pull the drive you thought was the failed one... only to realize that you just pulled a good drive, causing the array to go completely offline.

Has this happened to you? It has not happened to me yet, but I wanted to see how ZFS responded. I have to say I am pretty impressed.

I purposely pulled two working drives from my test zpool array. The status of the pool became Unavailable, as is to be expected. The zpool status command gave a helpful hint: “Replace the drive and run zpool clear.”

I replaced the last drive I had previously pulled and ran the command:

zpool clear storage

That was all I had to do! The array came back up (although in a degraded state) and all my files were there.

Output of zpool status at this point:

[root@freenas /data]# zpool status
  pool: storage
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 0 in 0h26m with 0 errors on Sun Sep  7 09:51:19 2014
config:

        NAME                     STATE     READ WRITE CKSUM
        storage                  DEGRADED     0     0     0
          raidz1-0               DEGRADED     0     0     0
            ada2p1               ONLINE       0     0     0
            7167795297630497018  REMOVED      0     0     0  was /dev/ada3p1
            ada4p1               ONLINE       0     0     0
            ada1p1               ONLINE       0     0     0

errors: No known data errors

My next experiment was to bring the pool back to full health. I tried to simply re-insert the last drive into my pool, but it complained that the drive was already part of the pool. The drive in question used to be labeled ada3p1. I tried “zpool detach storage ada3p1” but it complained: “only applicable to mirror and replacing vdevs”

After searching I found a mention here that you can call out specific devices in your pool to clear. I ran the command “zpool clear storage ada3p1” and it completed without any issues; however, it still wouldn’t let me add the drive back into the pool, saying it was already there.

What allowed me to bring the array back to full health was:

zpool online storage ada3p1

The amazing part: ZFS realized that it only needed to resilver a small amount of data to bring the drive back in sync with the pool!

 scan: resilvered 24K in 0h0m with 0 errors on Sun Sep  7 12:23:39 2014
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada2p1  ONLINE       0     0     0
            ada3p1  ONLINE       0     0     0
            ada4p1  ONLINE       0     0     0
            ada1p1  ONLINE       0     0     0

Compared to mdadm, which would rebuild the whole array even if you re-inserted the same disk you had pulled, this is astounding.

I realized this issue would only happen if you put the same drive you just pulled back into the array, so I then tried pulling a drive and putting a different one in its place. After partitioning the new drive, a simple

zpool replace storage 7167795297630497018 ada3p1

did the trick (where the string of numbers is the placeholder ID for the drive you pulled; zpool status will tell you what that number is). Done.
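
For reference, the partitioning step mentioned above can be done the same way as in the RaidZ-with-missing-drive walkthrough earlier on this page; a minimal sketch, assuming the replacement disk shows up as ada3 (adjust for your system):

gpart create -s gpt ada3
gpart add -t freebsd-zfs ada3

After that, the zpool replace command above tells ZFS to resilver onto the new ada3p1 partition.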