Tag Archives: ZFS

Using ProxMox as a NAS

Lately I’ve been very unhappy with latest FreeBSD causing reboots randomly during disk resilvering. I simply cannot tolerate random reboots of my fileserver. This fact combined with the migration of OpenZFS to the ZFS on Linux code base means it’s time for me to move from a FreeBSD based ZFS NAS to a Linux-based one.

Sadly there aren’t many options in this space yet. I wanted something where basic tasks were taken care of, like what FreeNAS does, but also supports ZFS. The solution I settled on was ProxMox, which is a hypervisor, but it also has ZFS support.

The biggest drawback of ProxMox vs FreeNAS is the GUI. There are some disk-related GUI options in ProxMox, but mostly it’s VM focused. Thus, I had to configure my required services via CLI.

Following are the settings I used when I configured my NAS to run ProxMox.

Repo setup

If you don’t want to pay for a proxmox license, change the PVE enterprise repository to the free version by modifying /etc/apt/sources.list.d/pve-enterprise.list to the following:

deb http://download.proxmox.com/debian/pve buster pve-no-subscription

Then run at apt update & apt upgrade.

Email alerts

Postfix configuration

Edit /etc/postfix/main.cf and tweak your mail server config as needed (relayhost). Restart postfix after editing:

systemctl restart postfix

Forward mail for root to your own email

Edit /etc/aliases and add an alias for root to forward to your desired e-mail address. Add this line:

root: YOUR_EMAIL_ADDRESS

Afterward run:

newaliases

ZFS configuration

Pool Import

Import the pool using the zpool import -f command (-f to force import despite having been active in a different system)

zpool import -f  

By default they’re imported into the main root directory (/). If you want to have them go to /mnt, use the zfs set mountpoint command:

zfs set mountpoint=/mnt/ 

Monitoring

Install and configure zfs-zed

apt install zfs-zed

Modify /etc/zfs/zed.d/zed.rc and uncomment ZED_EMAIL_ADDR, ZED_EMAIL_PROG, and ZED_EMAIL_OPTS. Edit them to suit your needs (default values work fine, they just need to be uncommented.) Optionally uncomment ZED_NOTIFY_VERBOSE and change to 1 if you want more verbose notices like what FreeNAS does (scrub notifications, for example.)

After modifying /etc/zfs/zed.d/zed.rc, restart zed:

systemctl restart zfs-zed

Scrubbing

By default ProxMox scrubs each of your datasets on the second Sunday of every month. This cron job is located in /etc/cron.d/zfsutils-linux. Modify to your liking.

Snapshot & Replication

There are many different snapshot & replication scripts out there. I landed on Sanoid. Thanks to SvennD for helping me grasp how to get it working.

Install sanoid :

#Install necessary packages
apt install debhelper libcapture-tiny-perl libconfig-inifiles-perl pv lzop mbuffer git
# Clone repo, build deb, install
git clone https://github.com/jimsalterjrs/sanoid.git cd sanoid
ln -s packages/debian . 
dpkg-buildpackage -uc -us 
apt install ../sanoid_*_all.deb 

Snapshots

Edit /etc/sanoid/sanoid.conf with a backup and retention schedule for each of your datasets. Example taken from sanoid documentation:

[data/home]
	use_template = production
[data/images]
	use_template = production
	recursive = yes
	process_children_only = yes
[data/images/win7]
	hourly = 4

#############################
# templates below this line #
#############################

[template_production]
        frequently = 0
        hourly = 36
        daily = 30
        monthly = 3
        yearly = 0
        autosnap = yes
        autoprune = yes

Once sanoid.conf is to your liking, create a cron job to launch sanoid every hour (sanoid determines whether any action is needed when executed.)

crontab -e
#Add this line, save and exit
0 * * * * /usr/sbin/sanoid --cron

Replication

syncoid (part of sanoid) easily replicates snapshots. The syntax is pretty straightforward:

syncoid <source> <destination> -r 
#-r means recursive and is optional

For remote locations specify a username@ before the ip/hostname, then a colon and the dataset name, for example:

syncoid root@10.0.0.1:sourceDataset localDataset -r

You can even have a remote source go to a different remote destination, which is pretty neat.

Other syncoid options of interest:

--debug  #for seeing everything happening, useful for logging
--exclude #Regular expression to exclude certain datasets
--src-bwlimit #Set an upload limit so you don't saturate your bandwidth
--quiet #don't output anything unless it's an error

Automate synchronization by placing the same syncoid command into a cronjob:

0 */4 * * * /usr/sbin/syncoid --exclude=bigdataset1 --source-bwlimit=1M --recursive pool/data root@192.168.1.100:pool/data
#if you don't want status emails when the cron job runs, add --quiet

NFS

Install the nfs-kernel-server package and specify your NFS exports in /etc/exports.

apt install nfs-kernel-server portmap

Example /etc/exports :

/mnt/example/DIR1 192.168.0.0/16(rw,sync,all_squash,anonuid=0,anongid=0)

Restart nfs-server after modifying your exports:

systemctl restart nfs-server

Samba

Install samba, configure /etc/samba/smb.conf, and add users.

apt install samba
systemctl enable smbd

/etc/samba/smb.conf syntax is fairly straightforward. See the samba documentation for more information. Example share configuration:

[exampleshare]
comment = Example share
path = /mnt/example
valid users = user1 user2
writable = yes

Add users to the system itself with the adduser command:

adduser user1

Add those same users to samba with the smbpasswd -a command. Example:

smbpasswd -a user1

Restart samba after making changes:

systemctl restart smbd

SMART monitoring

Taken from https://pve.proxmox.com/wiki/Disk_Health_Monitoring:

By default, smartmontools daemon smartd is active and enabled, and scans the disks under /dev/sdX and /dev/hdX every 30 minutes for errors and warnings, and sends an e-mail to root if it detects a problem. 

Edit the file /etc/smartd.conf to suit your needs. You can specify/exclude devices, smart attributes, etc there. See here for more information. Restart the smartd service after modifying.

UPS monitoring

apc-upsd was easiest for me to configure, so I went with it. Thanks to this blog for giving me the information to get started.

First, install apcupsd:

apt install apcupsd apcupsd-doc

As soon as it was installed my console kept getting spammed about IRQ issues. To stop these errors I stopped the apcupsd daemon:

 systemctl stop apcupsd

Now modify /etc/apcupsd/apcupssd.conf to suit your needs. The section I added for my CyberPower OR2200LCDRT2U was simply:

UPSTYPE usb
DEVICE

Then modify /etc/default/apcupsd to specify it’s configured:

#/etc/default/apcupsd
ISCONFIGURED=yes

After configuring, you can restart the apcupsd service

systemctl start apcupsd

To check the status of your UPS, you can run the apcaccess status command:

/sbin/apcaccess status

Log monitoring

Install Logwatch to monitor system events. Here is a good primer on all of Logwatch’s options.

apt install logwatch

Modify /usr/share/logwatch/default.conf/logwatch.conf to suit your needs. By default it runs daily (defined in /etc/cron.daily/00logwatch). I added the following lines for my config to filter out unwanted information:

Service = "-zz-disk_space"
Service = "-postfix"
Service = "vsmartd"
Service = "-zz-lm_sensors"

Manually run logwatch to get a preview of what you’ll see:

logwatch --range today --mailto 

Troubleshooting

ZFS-ZED not sending email

If ZED isn’t sending emails it’s likely due to an error in the config. For some reason default values still need to be uncommented for zed to work, even if left unaltered. Thanks to this post for the info.

Samba share access denied

If you get access denied when trying to write to a SMB share, double check the file permissions on the server level. Execute chmod / chown as appropriate. Example:

chown user1 -R /mnt/example/user1

zfs drive removal ‘part of active pool’ fix

Occasionally I will manually offline a disk in my ZFS pool for one reason or another. Annoyingly I will sometimes get this error when I try to online that same disk back into the pool:

cannot online /dev/sda: cannot relabel '/dev/sda': unable to read disk capacity

The fix, thankfully, is fairly simple. Simply run the following command (make double sure you’re doing it on the correct device!)

sudo wipefs -a <DEVICE>

After I ran that command ZFS automatically picked the disk back and resilvered it into the pool.

Thanks to this superuser.com discussion for the advice!

Recover files from ZFS snapshot of ProxMox VM

I recently needed to restore a specific file from one of my ProxMox VMs that had been deleted. I didn’t want to roll back the entire VM from a previous snapshot – I just wanted a single file from the snapshot. My snapshots are handled via ZFS using FreeNAS.

Since my VM was CentOS 7 it uses XFS, which made things a bit more difficult. I couldn’t find a way to crash-mount a read-only XFS snapshot – it simply resufed to mount, so I had to make everything read/write. Below is the process I used to recover my file:

On the FreeNAS server, find the snapshot you wish to clone:

sudo zfs list -t snapshot -o name -s creation -r DATASET_NAME

Next, clone the snapshot

sudo zfs clone SNAPSHOT_NAME CLONED_SNAPSHOT_NAME

Next, on a Linux box, use SSHFS to mount the snapshot:

mkdir Snapshot
sshfs -o allow_other user@freenas:/mnt/CLONED_SNAPSHOT_NAME Snapshot/

Now create a read/write loopback device following instructions found here:

sudo -i #easy lazy way to get past permissions issues
cd /path/to/Snapshot/folder/created/above
losetup -P -f VM_DISK_FILENAME.raw
losetup 
#Take note of output, it's likely set to /dev/loop0 unless you have other loopbacks

Note if your VM files are not in RAW format, extra steps will need to be taken in order to convert it to RAW format.

Now we have an SSH-mounted loopback device ready for mounting. Things are complicated if your VM uses LVM, which mine does (CentOS 7). Once the loopback device is set, lvscan should see the image’s logical volumes. Make the desired volume active

sudo lvscan
sudo lvchange -ay /dev/VG_NAME/LV_NAME

Now you can mount your volume:

mkdir Restore
mount /dev/VG_NAME/LV_NAME Restore/

Note: for XFS you must have read/write capability on the loopback device for this to work.

When you’re done, do your steps in reverse to unmount the snaspshot:

#Unmount snapshot
umount Restore
#Deactivate LVM
lvchange -an /dev/VG_NAME/LV_NAME
Remove loopback device
losetup -d /dev/loop0 #or whatever the loopback device was
#Unmount SSHfs mount to ZFS server
umount Snapshot

Finally, on the ZFS server, delete the snapshot:

sudo zfs destroy CLONED_SNAPSHOT_NAME

Troubleshooting

When I tried to mount the LVM partition at this point I got this error message:

mount: /dev/mapper/centos_plexlocal-root: can't read superblock

It ended up being because I was accidentally creating a read-only loopback device. I destroy the loopback device and re-created with write support and all was well.

FreeNAS ZFS tuning for SSDs

I wanted to optimize my all SSD storage array on my FreeNAS server but I had a hard time finding information in one place. After a lot of digging I pulled things from several places. This is what I came up with. It boiled down to two main settings

  • ashift
  • recordsize

Checking ashift on existing pools

zdb -U /data/zfs/zpool.cache | grep ashift

I read here a recommended setting of ashift=13, recordsize=8k for VM workloads on SSDs.

How to change recordsize:

This is easily done in the GUI or command line and can be changed on the fly.

zfs set recordize <value> <volume>

How to change ashift:

Backup your data and destroy the pool.

Modify the setting dictating minimum ashift setting as outlined here

sysctl vfs.zfs.min_auto_ashift=13

Re-create the pool.

Additional reading

http://open-zfs.org/wiki/Performance_tuning#Alignment_shift
https://www.reddit.com/r/zfs/comments/7pfutp/zfs_pool_planning_for_vm_storage/

ZFS delete oldest n snapshots

I came across a need to trim old ZFS snapshots. These are my quick and dirty notes on how I accomplished it.

Basic syntax taken from here:

 zfs list -H -t snapshot -o name -S creation -r <dataset name> | tail -10

You can omit the -r <dataset name> if you want to query snapshots over all your datasets. Change the tail number for the desired number of oldest snapshots.

You can pass this over to actually delete snapshots using the xargs command:

zfs list -H -t snapshot -o name -S creation -r <dataset name> | tail -10 | xargs -n 1 zfs  destroy

I came across an odd error message when trying to delete some old snapshots:

Can't delete snapshot: dataset busy

I discovered here that that means the snapshots have a hold on them. I read ZFS documentation to learn how to release the holds:

zfs release -r <tag name> <snapshot name>

After massaging these commands for a bit I was able to free up some needed space by removing ancient snapshots.

Change ZFS based NFS SR address in Xenserver

I recently acquired a shiny new set of SSDs to host my VMs. The problem is I needed to create a new ZFS array to accommodate them. I needed to figure out a way to migrate my VMs to the new array and then instruct Xenserver to use the new array instead of the old one.

Fortunately with a bit of research I learned this is fairly painless. Thanks to this discussion on citrix forums that got me pointed in the right direction. To change the server / IP address of an existing NFS storage repository in Xenserver you must do the following:

  • Shut down affected VMs
  • Shutdown any VMs using NFS SRs
  • Copy the NFS SRs (the directories containing the .vhd files) to the new NFS server
  • xe pbd-unplug uuid=<uuid of pbd pointing to the NFS SR>
  • xe pbd-destroy uuid=<uuid of pbd pointing to the NFS SR>
  • xe pbd-create host-uuid=<uuid of Xen Host> sr-uuid=<uuid of the NFS SR> device-config-server=<New NFS server name> device-config-serverpath=<NFS Share Name>
  • xe pbd-plug uuid=<uuid of the pbd created above>
  • Reboot the VMs using NFS SRs

In my case since my VMs were on an existing ZFS volume with snapshots I wanted to preserve, I used ZFS send and receive to transfer data from my old array to my SSD array. Bonus: I was able to do this while the VMs were still running to ensure minimal downtime. My ZFS copy procedure was as follows:

  • Create recursive snapshot of my VM dataset
    zfs snapshot -r storage/VMs@migrate
  • Start the initial data transfer (this took quite some time to finish)
    zfs send -R storage/VMs@migrate | zfs recv ssd/VMs
  • Do another incremental snapshot and transfer after initial huge transfer is complete (this took much less time to do)
    zfs snapshot -r storage/VMs@migrate2
    zfs send -R -i storage/VMs@migrate storage/VMs@migrate2 | zfs recv ssd/VMs
  • Shutdown all affected VMs and do one more ZFS snapshot & transfer to ensure consistent data:
    zfs snapshot -r storage/VMs@migrate3
    zfs send -R -i storage/VMs@migrate2 storage/VMs@migrate3 | zfs recv ssd/VMs

In the above examples my source dataset was storage/VMs and the destination dataset was ssd/VMs.

Once the data was all transferred to the new location it was time to tell Xenserver about it. I had enough VMs that it was worth my time to write a little script to do it. It’s quick and dirty but it did the job. Behold:

#!/bin/bash
#Author: Nicholas Jeppson
#A simple script to change a xenserver NFS storage repository address to a new location
#Modify NFS_SERVER, NFS_PATH and/or NFS_VERSION to match your environment. 
#Run this script on each xenserver host in your pool. Empty output means the transfer was successful.
#This script takes one argument - the name of the SR to be transferred.

SR_NAME="$1"

NFS_SERVER=10.0.0.1
NFS_PATH=/mnt/ssd/VMs/$SR_NAME
NFS_VERSION=4

#Use sed and awk to grab necessary UUIDs
HOST_UUID=$(xe host-list|egrep -B3 `hostname`$ | grep uuid | awk '{print $5}')
PBD_UUID=$(xe pbd-list|grep -A4 -B4 $SR_NAME | grep -B2 $HOST_UUID |grep -w '^uuid ( RO)' | awk '{print $5}')
SR_UUID=$(xe pbd-list|grep -A4 -B4 $SR_NAME | grep -A2 $HOST_UUID | grep 'sr-uuid' | awk '{print $4}')

#Unplug & destroy old NFS location, create new NFS location
xe pbd-unplug uuid=$PBD_UUID
xe pbd-destroy uuid=$PBD_UUID
NEW_PBD_UUID=$(xe pbd-create host-uuid=$HOST_UUID sr-uuid=$SR_UUID device-config-server=$NFS_SERVER device-config-serverpath=$NFS_PATH device-config-nfsversion=$NFS_VERSION)
xe pbd-plug uuid=$NEW_PBD_UUID

Download the script here (right click / save as)

You can run this script in a simple for loop with something like this:

for SR in <list of SR names separated by a space>; do bash <name of script saved from above> $SR; done

If you named the above script nfs-migrate.sh, and you had three SRs to change (blog1, blog2, blog3) then it would be:

for SR in blog1 blog2 blog3; do bash nfs-migrate.sh $SR; done

After I migrated the data and ran that script, my VMs booted up using the new SSD array. Success.

Create a RaidZ array with missing drive in FreeNAS

I came across a need to create a ZFS Raid-Z array with a missing drive. This is easy to do with mdadm but not as easy with ZFS. It is possible, though. The trick is to create an image file with dd, then map that image file as a loopback device. Once that’s done you can treat it as if it were a regular hard drive and add it to the array. Once added to the array you can take the loopback device offline and remove it from the array, then add an actual HDD later.

Create loopback device

Thanks to this site for the information.

First, use dd to create an image file. Change the seek parameter to whatever size disk you wish to emulate.

dd if=/dev/zero of=temp.img bs=1 count=1 seek=1024G

Next, initialize the loopback driver and create the loopback device (md0 in my case)

sudo losetup -a
sudo mdconfig -a -t vnode -f temp.img -u 0

List your loopback devices with the following command to verify your new loopback device:

sudo mdconfig -l

Create array using loopback device

You can now partition and add your loopback device as if it were a regular hard drive. Change volume name, array name, and device names as necessary for your environment.

sudo gpart add -t freebsd-zfs -l <volume name> md0
sudo zpool create <array name> raidz ada7p1 ada8p1 ada9p1 md0p1

Fail & Remove loopback device

Now that our new array is up and running properly we can fail out the loopback device. Make sure to modify the command to use your array name and loopback device/partition number.

sudo zpool offline <array_name> md0p1

Import new array into FreeNAS GUI

To get our new array in freenas we must export the array from the command line, then import it from the GUI.

sudo zpool export <array name>

Once the array has been exported, navigate to the FreeNAS GUI and go to Storage / Volumes / Import Volume.

You should now have your new array minus one drive ready to go in FreeNAS. You can now add a physical HDD when it becomes available (in my case, when it returns from RMA.)

FreeNAS unable to create jails fix

I recently got a shiny new FreeNAS Mini appliance. It’s the bee’s knees. Previously I was using a virtualized instance of FreeNAS that has served me admirably for two years now. During the migration I decided to start fresh with the jails configuration I had and deleted the entire jails dataset. This turned out to be a mistake. I suddenly found out that I couldn’t create any jails or plugins. The plugin download would hang for a long time and flash a brief message “Failed to download plugin.” Not helpful.

I tried changing the location of my jails in configuration to no avail. I even tried nuking my FreeNAS config entirely and starting from scratch. The error still happened! Somehow that configuration survived a factory restore.

I finally found this freenas forum entry that pointed me in the right direction. It suggested I use the warden command to delete the plugin jail template completely and re-install it. When I tried to I got this error:

 

[nicholas@freenas ~]$ sudo warden template delete pluginjail
ERROR: Not a ZFS volume: /mnt/storage/jails/.warden-template-pluginjail

It was still trying to install the plugin template in my non-existent dataset. I decided to try re-creating the missing dataset and then running the warden delete command again. Success!

[nicholas@freenas ~]$ sudo zfs create storage/jails/.warden-template-pluginjail
[nicholas@freenas ~]$ sudo warden template delete pluginjail

Once you delete the template jail via warden, you can re-create it in the right place after configuring the correct path in Jails / Configuration. Once you have the right place configured, issue the following:

warden template create -nick pluginjail -tar http://download.freenas.org/jails/9.3/x64/freenas-pluginjail-9.3-RELEASE.tgz

Plugins and jails work again! Success.

Improve FreeNAS NFS performance in Xenserver

My home lab consists of a virtualized instance of freenas, Citrix Xenserver, and various VMs. Recently I wanted to migrate some of my VMs to an NFS export from FreeNAS. To my dismay, the speed was abysmal (3 MB/second write speeds.) This tutorial will walk you through how to improve FreeNAS NFS performance in Xenserver by adding an log device (ZIL) to your ZFS pool.

After much research I realized the problem lies with ZFS behind the NFS export. Xenserver mounts the NFS share in such a way that it constantly wants to synchronize writes, which slows things down.

The solution: add a ZIL device. Since my freeNAS is virtualized, I chose the route of adding a virtual disk that is attached to an SSD. This process wasn’t straightforward.  If you have a virtual FreeNAS this is how to improve NFS performance:

  1. Add a disk in xenserver. Rule of thumb for size is half the amount of system RAM. I added 16GB ZIL disk to be safe.
  2. Add the following tunables in FreeNAS (to allow the OS to properly see xen hard drives)
    1. hint.ada.0.at, scbus100 (for the FreeNAS OS disk)
    2. hint.ada.1.at, scbus100 (for the newly added ZIL disk)
  3. Reboot FreeNAS
  4. In the FreeNAS GUI, click the ZFS Volume Manager, select your volume to expand from the dropdown, and select the device to be a LOG volume (ZIL)

That’s it! Once I added an SSD based ZIL device for my ZFS pool, NFS writes went from 3 MB/s to 60 MB/s. Awesome.

Verify backup integrity with rsync, sed, cat, and tee

Recently it became apparent that a large data transfer I did might have had some errors. I wanted to find an easy way to compare the source and destination to make sure that they were identical. My solution: rsync, sed, cat and tee

I have used rsync quite a bit but did not know about the –checksum flag until recently. When you run rsync with –checksum, it takes much longer, but it effectively does something similar to what a ZFS scrub does – it runs a checksum of every source file and compares it with the checksum of each destination file.  If there is a mismatch, rsync will overwrite the destination file with the source file to correct it.

In my situation I performed a large data migration from my old mdadam-based RAID array to my brand new ZFS array. During the transfer the disks were acting very strange, and at one point one of the disks even popped out of the array. The culprit turned out to be a faulty SATA controller. I bought a cheap 4 port SATA controller from Amazon for my new ZFS array. Do not do this! Spring the cash out for a better controller. The cheap ones, this one at least, only caused headache. I removed it and used the on-board SATA ports on my motherboard and the issues went  away.

All of those shennanigans made me wonder if there was corrupt data on my new ZFS array. A ZFS scrub repaired 15.5G of data! While I’m sure that fixed a lot of the issues, I realized there probably was still some corruption. This is how I verified it

rsync -Pahn --checksum /path/to/source /path/to/destination | tee migration.txt

-P shows progress, -a means archive, -h is for human readable measurements, and -n means dry run (don’t actually copy anything)

Tee is a cool utility that allows you to redirect output of a command both to a file and to standard output. This is useful if you want to see the verification take place in real time but also want to analyze it later.

After the comparison (which took a while!) I wanted to see the discrepancies. the -P flag lists each directory rsync checks as well as which files it detected. You can use sed in conjunction with cat to weed out the unwanted lines (directory listings) so that only the files with discrepancies are left.

 cat pictures.txt | sed '/\/$/d' | tee pictures-truncated.txt

The sed regex simply looks for any line ending in a / (directory listing) and removes that line. What is left is the files in question. You can combine the entire thing into one line like so

rsync -Pahn --checksum /path/to/source /path/to/destination | sed '/\/$/d' | tee migration.txt

In my case I wanted to compare discrepencies with rsync and make decisions on if I wanted to actually fix the issues. If you are 100% sure the source is OK to remove the destination completely, you can simply run

rsync -Pah --checksum --delete /path/to/source /path/to/destination