Category Archives: Virtualization

Posts about hypervisors and virtualization

Windows VM with GTX 1070 GPU passthrough in ProxMox 5

I started this blog four years ago to document my highly technical adventures – mainly so I could reproduce them later. One of my first articles dealt with GPU passthrough / virtualization. It was a complicated ordeal with Xen. Now that I’ve switched to KVM (ProxMox) I thought I’d give it another go. It’s still complicated but not nearly as much this time.

To get my Nvidia GTX 1070 GPU properly passed through to a Windows VM hosted by ProxMox 5 I simply followed this excellent guide written by sshaikh. I will summarize what I took from his guide to get my setup to work.

  1. Ensure VT-d is supported and enabled in the BIOS
  2. Enable IOMMU on the host
    1. append the following to the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub
      intel_iommu=on
    2. Save your changes by running
      update-grub
  3. Blacklist NVIDIA & Nouveau kernel modules so they don’t get loaded at boot
    1. echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
      echo "blacklist nvidia" >> /etc/modprobe.d/blacklist.conf
    2. Save your changes by running
      update-initramfs -u
  4. Add the following lines to /etc/modules
    vfio
    vfio_iommu_type1
    vfio_pci
    vfio_virqfd
  5. Determine the PCI address of your GPU
    1. Run
      lspci -v

      and look for your card. Mine was 01:00.0 & 01:00.1. You can omit the part after the decimal to include them both in one go – so in that case it would be 01:00

    2. Run lspci -n -s <PCI address> to obtain vendor IDs. Example :
      lspci -n -s 01:00
      01:00.0 0300: 10de:1b81 (rev a1)
      01:00.1 0403: 10de:10f0 (rev a1)
  6. Assign your GPU to vfio driver using the IDs obtained above. Example:
    echo "options vfio-pci ids=10de:1b81,10de:10f0" > /etc/modprobe.d/vfio.conf
  7. Reboot the host
  8. Create your Windows VM using the UEFI bios hardware option (not the deafoult seabios) but do not start it yet. Modify /etc/pve/qemu-server/<vmid>.conf and ensure the following are in the file. Create / modify existing entries as necessary.
    bios: ovmf
    machine: q35
    cpu: host,hidden=1
    numa: 1
  9. Install Windows, including VirtIO drivers. Be sure to enable Remote desktop.
  10. Pass through the GPU.
    1. Modify /etc/pve/qemu-server/<vmid>.conf and add
      hostpci0: <device address>,x-vga=on,pcie=1. Example

      hostpci0: 01:00,x-vga=on,pcie=1
  11. Profit.

Troubleshooting

Code 43

I received the dreaded code 43 error after installing CUDA drivers. The workaround was to add hidden=1 to the CPU option of the VM:

cpu: host,hidden=1

Blue screening when launching certain games

Heroes of the Storm and Starcraft II would consistently blue screen on me with the following error:

kmode_exception_not_handled

The fix as outlined here was to create /etc/modprobe.d/kvm.conf and add the parameter “options kvm ignore_msrs=1”

echo "options kvm ignore_msrs=1" > /etc/modprobe.d/kvm.conf

GPU optimization:

Give as many CPUs as the host (in my case 8) and then enable NUMA for the CPU. This appeared to make my GTX 1070 perform better in the VM – near native performance.

Proxmox VM migration failed found stale volume copy

Recently I had a few VMs on shared storage I couldn’t live migrate. The cryptic error messages made it sound like local LVM was required, even though in the GUI all I could see was shared storage for the VM. The errors I kept getting were like this one:

volume pve/vm-103-disk-1 already exists
command 'dd 'if=/dev/pve/vm-103-disk-1' 'bs=64k'' failed: got signal 13
send/receive failed, cleaning up snapshot(s)..
ERROR: Failed to sync data - command 'set -o ...' failed: exit code 255
aborting phase 1 - cleanup resources
ERROR: found stale volume copy 'local-lvm:vm-103-disk-1' on node 'nick-desktop'
ERROR: migration aborted (duration 00:00:01): Failed to sync data - command 'set -o pipefail ...' failed: exit code 255
TASK ERROR: migration aborted

After a ton of digging I found this forum post that had the solution:

Most likely there is some stale disk somewhere. Try to run:
# qm rescan –vmid 101

That indeed was the problem. I ran

qm rescan –vmid 103

on the node in question, then refreshed the management page. After doing that, a ‘phantom’ disk entry showed up for the VM. I deleted it, but then had to run another qm –rescan –vmid103 before it would migrate.

So to recap, run qm rescan –vmid (vmid#) once, then delete the stale disk that shows up, then run that same command again.

Migrate from Xenserver to Proxmox

I was dismayed to see Citrix’s recent announcement about Xenserver 7.3 removing several key features from the free version. Xenserver’s free features are the reason I switched over to them in the first place back in 2014. Xenserver has been rock solid; I haven’t had any complaints until now. Their removal of xenmotion and migration in the free version forced me to look elsewhere for my virtualization needs.

I’ve settled on ProxMox, which is KVM based. Their documentation is excellent and it has all the features I need – for free. I’m also in love with their web based management – no more Windows fat client!

Below are my notes on how I successfully migrated all my Xenserver VMs over to the ProxMox Virtual Environment (PVE).

  • Any changes to network interfaces, such as bringing them up, require a reboot of the host
  • If you have an existing ISO share, you can create a directory called  “template” in your ISO repository folder, then inside symlink “iso” back to your ISO folder. Proxmox looks inside template/iso for ISO images for whatever storage you configure.
  • Do not create your ProxMox host with ZFS unless you have tons of RAM. If you don’t have enough RAM you will run into huge CPU load times making the system unresponsive in cases of high disk load, such as VM copies / backups. More reading here.

Cluster of two:

ProxMox’s clustering is a bit different – better, in my opinion. No more master, slave dynamic – ever node is a master. Important reading: https://pve.proxmox.com/wiki/Proxmox_VE_4.x_Cluster

If you have two node cluster, like I do, it creates some problems, though. If one goes down, the other can’t do anything to the pool (create VM, backup) until it comes back up. In my situation I have one primary host that is up all the time and I bring the secondary host up only when I want to do maintenance on the first.

In that specific situation you can still designate a “master” of sorts by increasing the number of quorum votes it gets from 1 to 2.  That way when the secondary node is down, the primary node can still do cluster operations because the default number of votes to stay quorate is 2. See here for more reading on the subject.

On either host (they must both be up and in the cluster for this to work)

vi /etc/pve/corosync.conf

Find your primary server in the nodelist settings and change

quorum_votes: 2

Also find the quorum section and add expected_votes: 2

Make sure to increment config_version number (bottom of the file.) Now if your secondary is down you can still operate the primary.

Migrating VMs

I migrated my Xen VMs to KVM by creating VMs with identical specs in PVE, copying the VHD files from the Xen host to the new PVE host, running qemu-img to convert them to RAW format, and then using dd to copy the raw information over to corresponding empty VM  disks. Depending on the OS of the VM there was some after-copy tweaking I also had to do.

From shared storage

Grab the VHD file (quiesce any snapshots away first) of each xen VM and convert them to raw format

qemu-img convert <VHD_FILE_NAME>.vhd -O raw <RAW_FILE_NAME>.raw

Create a new VM with identical configuration, especially disk size. Go to the hardware tab and take note of the name of the disk. For example, one of mine was:

local-zfs:vm-100-disk-1,discard=on,size=40G

The interesting part is between local-zfs and discard=on, namely vm-100-disk-1. This is the name of the disk we want to overwrite with data from our Xenserver VM’s disk.

Next figure out the full path of this disk on your proxmox host

find / -name vm-100-disk-1*

The result in my case was /dev/zvol/rpool/data/vm-100-disk-1

Take the name and put it in the following command to complete the process:

dd if=<RAW_FILE_NAME>.raw of=/dev/zvol/rpool/data/vm-100-disk-1 bs=16M

Once that’s done you can delete your .vhd and .raw files.

From local / LVM storage

In case your Xen VMs are stored in LVM device format instead of a VHD file, get UUID of storage by doing xe vdi-list and finding the name of the hard disk from the VM you want. It’s helpful to rename the hard disks to something easy to spot. I chose the word migrate.

xe vdi-list|grep -B3 migrate
uuid ( RO) : a466ae1b-80c7-4ef2-91a3-5c1ba1f6fc2f
 name-label ( RW):  migrate

Once you have the UUID of the drive, you can use lvscan to find the full LVM device path of that disk:

lvscan|grep a466ae1b-80c7-4ef2-91a3-5c1ba1f6fc2f
 inactive '/dev/VG_XenStorage-1ada0a08-7e6d-a5b6-d0b4-515e251c0c75/VHD-a466ae1b-80c7-4ef2-91a3-5c1ba1f6fc2f' [10.03 GiB] inherit

Shut down the corresponding VM and reactivate its logical volume (xen deactivates LVMs if the VM is shut off:

lvchange -ay <full /dev/VG_XenStorage path discovered above>

Now that we have the full LVM path and the volume is active, we can use dd over SSH to transfer the image to our proxmox server:

sudo dd if=<full /dev/VG/Xenstorage path discovered above> | ssh <IP_OF_PROXMOX_SERVER> dd of=<LOCATION_ON_PROXMOX_THAT_HAS_ENOUGH_SPACE>/<NAME_OF_VDI_FILE>.vhd

then follow vhd -> raw -> dd to proxmox drive process described in the From Shared Storage section.

Post-Migration tweaks

For the most part Debian-based systems moved over perfectly without any needed tweaks; Some VMs changed interface names due to network device changes. eth0 turned into ens8. I had to modify /etc/network/interfaces to change eth0 to ens8 to get virtio networking working.

CentOS

All my CentOS VMs failed to boot after migration due to a lack of virtio disk drivers in the initial RAM disk. The fix is to change the disk hardware to IDE mode (they boot fine this way) and then modify the initrd of each affected host:

sudo dracut --add-drivers "virtio_pci virtio_blk virtio_scsi virtio_net virtio_ring virtio" -f -v /boot/initramfs-`uname -r`.img `uname -r`
sudo sh -c "echo 'add_drivers+=\" virtio_pci virtio_blk virtio_scsi virtio_net virtio_ring virtio \"' >> /etc/dracut.conf"
sudo shutdown -h now

Once that’s done you can detach the hard disk and re-attach it back as SCSI (virtio) mode. Don’t forget to modify the options and change the boot order from ide0 to scsi0

Arch Linux

One of my Arch VMs had UUID configured which complicated things. The root device UUID changes in KVM virtio vs IDE mode. The easiest way to fix it is to boot this VM into an Arch install CD. Mount the root partition and then run arch-chroot /mnt/sda1. Once in the chroot runpacman -Sy kernel to reinstall the kernel and generate appropriate kernel modules.

mount /dev/sda1 /mnt
arch-chroot /mnt
pacman -Sy kernel

Also make sure to modify /etc/fstab to reflect appropriate device id or UUID (xen used /dev/xvda1, kvm /dev/sda1)

Windows

Create your Windows VM using non-virtio drivers (default settings in PVE.) Obtain the latest windows virtio drivers here and extract them somewhere memorable. Switch everything but the disk over to Virtio in the VM’s hardware config and reboot the VM. Go into device manager and point to extracted driver location for each unknown device.

To get Virtio disk to work, add a new disk to the VM of any size and SCSI (virtio) type. Boot the Windows VM and install drivers for that drive. Then shut down, remove that second drive, detach the primary drive and change to virtio SCSI. It should then come up with full virtio drivers.

All hosts

KVM has a guest agent like xenserver does called qemu-agent. Turn it on in VM options and install qemu-guest-agent in your guest. This KVM a bit more insight into your host.

Determine which VMs need guest agent installed:

qm agent $id ping

If nothing is returned, it means qemu-agent is working. You can test all your VMs at once with this one-liner (change your starting and finishing VM IDs as appropriate)

for id in {100..114}; do echo $id; qm agent $id ping; done

This little one-liner will output the VM ID it’s trying to ping and will return any errors it finds. No errors means everything is working.

Disable support nag

PVE has a support model and will nag you at each login. If you don’t like this you can change it like so (the line number might be different depending on which version you’re running:

vi +850 /usr/share/pve-manager/js/pvemanagerlib.js

Modify the line if (data.status !== ‘Active’); change it to

if (false)

Troubleshooting

Remove a failed node

See here: https://pve.proxmox.com/wiki/Proxmox_VE_4.x_Cluster#Remove_a_cluster_node

systemctl stop pvestatd.service
systemctl stop pvedaemon.service
systemctl stop pve-cluster.service
rm -r /etc/corosync/*
rm -r /var/lib/pve-cluster/*
reboot

Quorum never establishes / takes forever

I had a really strange issue where I was able to establish quorum with a second node, but after a reboot quorum never happened again. I re-installed that second node and re-joined it several times but I never got past the “waiting for quorum….” stage.

After much research I came across this article which explained what was happening. Corosync uses multicast to establish cluster quorum. Many switches (including mine) have a feature called IGMP snooping, which, without an IGMP querier, essentially means multicast never happens. Sure enough, after logging into my switches and disabling IGMP snooping, quorum was instantly established. The article above says this is not recommended, but in my small home lab it hasn’t produced any ill effects. Your mileage may vary. You can also configure your cluster to use unicast instead.

USB Passthrough not working properly

With Xenserver I was able to pass through the USB controller of my host to the guest (a JMICRON USB to ATAATAPI bridge holding a 4 disk bay.) I ran into issues with PVE, though. Using the GUI to pass the USB device did not work. Manually adding PCI passthrough directives (hostpci0: 00:14.1) didn’t work. I finally found on a little nugget on the PCI Passthrough page about how you can simply pass the entire device and not the function like I had in Xenserver. So instead of doing hostpci0: 00:14.1, I simply did hostpci0: 00:14 . That  helped a little bit, but I was still unable to fully use these drives simultaneously.

My solution was eventually to abandon PCI passthrough altogether in favor of just passing individual disks to the guest as outlined here.

Find the ID of the desired disks by issuing ls -l /dev/disk/by-id. You only need to know the UUIDs of the disks, not the partitions. Then modify the KVM config of your desired host (mine was located at /etc/pve/qemu-server/101.conf) and a new line for each disk, adjusting scsi device numbers and UUIDs to match:

scsi5: /dev/disk/by-id/scsi-SATA_ST5000VN000-1H4_Z111111

With that direct disk access everything is working splendidly in my FreeNAS VM.

Change ZFS based NFS SR address in Xenserver

I recently acquired a shiny new set of SSDs to host my VMs. The problem is I needed to create a new ZFS array to accommodate them. I needed to figure out a way to migrate my VMs to the new array and then instruct Xenserver to use the new array instead of the old one.

Fortunately with a bit of research I learned this is fairly painless. Thanks to this discussion on citrix forums that got me pointed in the right direction. To change the server / IP address of an existing NFS storage repository in Xenserver you must do the following:

  • Shut down affected VMs
  • Shutdown any VMs using NFS SRs
  • Copy the NFS SRs (the directories containing the .vhd files) to the new NFS server
  • xe pbd-unplug uuid=<uuid of pbd pointing to the NFS SR>
  • xe pbd-destroy uuid=<uuid of pbd pointing to the NFS SR>
  • xe pbd-create host-uuid=<uuid of Xen Host> sr-uuid=<uuid of the NFS SR> device-config-server=<New NFS server name> device-config-serverpath=<NFS Share Name>
  • xe pbd-plug uuid=<uuid of the pbd created above>
  • Reboot the VMs using NFS SRs

In my case since my VMs were on an existing ZFS volume with snapshots I wanted to preserve, I used ZFS send and receive to transfer data from my old array to my SSD array. Bonus: I was able to do this while the VMs were still running to ensure minimal downtime. My ZFS copy procedure was as follows:

  • Create recursive snapshot of my VM dataset
    zfs snapshot -r storage/VMs@migrate
  • Start the initial data transfer (this took quite some time to finish)
    zfs send -R storage/VMs@migrate | zfs recv ssd/VMs
  • Do another incremental snapshot and transfer after initial huge transfer is complete (this took much less time to do)
    zfs snapshot -r storage/VMs@migrate2
    zfs send -R -i storage/VMs@migrate storage/VMs@migrate2 | zfs recv ssd/VMs
  • Shutdown all affected VMs and do one more ZFS snapshot & transfer to ensure consistent data:
    zfs snapshot -r storage/VMs@migrate3
    zfs send -R -i storage/VMs@migrate2 storage/VMs@migrate3 | zfs recv ssd/VMs

In the above examples my source dataset was storage/VMs and the destination dataset was ssd/VMs.

Once the data was all transferred to the new location it was time to tell Xenserver about it. I had enough VMs that it was worth my time to write a little script to do it. It’s quick and dirty but it did the job. Behold:

#!/bin/bash
#Author: Nicholas Jeppson
#A simple script to change a xenserver NFS storage repository address to a new location
#Modify NFS_SERVER, NFS_PATH and/or NFS_VERSION to match your environment. 
#Run this script on each xenserver host in your pool. Empty output means the transfer was successful.
#This script takes one argument - the name of the SR to be transferred.

SR_NAME="$1"

NFS_SERVER=10.0.0.1
NFS_PATH=/mnt/ssd/VMs/$SR_NAME
NFS_VERSION=4

#Use sed and awk to grab necessary UUIDs
HOST_UUID=$(xe host-list|egrep -B3 `hostname`$ | grep uuid | awk '{print $5}')
PBD_UUID=$(xe pbd-list|grep -A4 -B4 $SR_NAME | grep -B2 $HOST_UUID |grep -w '^uuid ( RO)' | awk '{print $5}')
SR_UUID=$(xe pbd-list|grep -A4 -B4 $SR_NAME | grep -A2 $HOST_UUID | grep 'sr-uuid' | awk '{print $4}')

#Unplug & destroy old NFS location, create new NFS location
xe pbd-unplug uuid=$PBD_UUID
xe pbd-destroy uuid=$PBD_UUID
NEW_PBD_UUID=$(xe pbd-create host-uuid=$HOST_UUID sr-uuid=$SR_UUID device-config-server=$NFS_SERVER device-config-serverpath=$NFS_PATH device-config-nfsversion=$NFS_VERSION)
xe pbd-plug uuid=$NEW_PBD_UUID

Download the script here (right click / save as)

You can run this script in a simple for loop with something like this:

for SR in <list of SR names separated by a space>; do bash <name of script saved from above> $SR; done

If you named the above script nfs-migrate.sh, and you had three SRs to change (blog1, blog2, blog3) then it would be:

for SR in blog1 blog2 blog3; do bash nfs-migrate.sh $SR; done

After I migrated the data and ran that script, my VMs booted up using the new SSD array. Success.

XAPI won’t start in Xenserver 7

I came home yesterday to discover that every last one of my VMs were unresponsive. It was most distressing. I couldn’t even SSH into my xenserver – it was unresponsive too. Its physical console had dropped into an emergency shell. A reboot allowed me to get a physical console again, but my networking and VMs would not start.

In trying to pick up the pieces and put everything back together I ran

systemctl --failed

which revealed several key services not running – namely openvswitch and xapi (very important services.) Manually starting them did nothing – they would silently fail and immediately quit working.

After banging my head against a wall for a bit (I really didn’t want to restore from backup) I stumbled across this post. It states in essence that xapi won’t start if the disk is full. I checked disk usage and it said I had a few gigs free, but thought I’d try the steps in the post anyway.

ls /var/log

revealed quite a lot of log files. I then decided to just delete all the .gz archived logs:

rm /var/log/*.gz

After doing this, xapi started. I restarted the hypervisor for good measure and everything came up – all back to normal as if nothing had happened.

It’s incredibly frustrating that Xenserver is designed to be a ticking time bomb with default configuration. If you don’t take care to manually delete old logs, or alternatively send logs to a remote log server, it will crash and burn. This is stupid. That being said, I was impressed that it recovered so gracefully once I freed up some disk space.

If you’re running xenserver, make sure you’re logging somewhere else – or put a cron job to delete old log files!

 

Transfer VMs between Xenserver pools

When Xenserver 7 came out I found myself unable to easily upgrade to it thanks to my custom RAID 1 build.  If I wanted Xenserver 7 I would have to blow the whole instance away and start from scratch. This posed a problem because I have a pool of 2 xenserver hosts. You cannot add a server with a higher xenserver version to a lower versioned pool; the pool master must always have the highest version of Xenserver installed. My decision to have an mdadm RAID 1 setup on my pool master ultimately turned into forced VM downtime for an upgrade despite having a pool of other xenserver hosts.

After transferring VMs to my secondary host and promoting it to pool master, I wiped my primary xenserver and installed 7. When it was up and running I essentially had two separate pools running. To transfer my VMs back to my primary server I had to resort to the command line.

Offline VM transfer

The xe vm-export and vm-import commands work with stdin/out and piping. This is how I accomplished transferring my VMs directly between two pools. Simply pipe xe vm-export commands with an ssh xe vm-import command like so:

xe vm-export uuid=<VM_UUID> filename= | ssh <other_server> xe vm-import filename=/dev/stdin

Note the lack of a filename – this instructs xenserver to pipe to standard output instead. Also note that transferring the VM scrambles the MAC addresses of its interfaces. If you want to keep the MAC address you’ll have to manually re-assign it after the copy is complete.

Minimal downtime

For the method above you will have to turn the VM off in order to transfer it.  I had some VMs that I didn’t want to stay down for the entire transfer. A way around this is to take a snapshot of the VM and then copy the snapshot to the other pool. Note that this method does not retain any changes made inside the VM that occurred after you took the snapshot. You will have to manually transfer any file changes that took place during the VM transfer (or be fine with losing them.)

In order to export a snapshot you must first convert it to a VM from a template (thanks to this site for outlining how.) The full procedure is as follows:

  1. Take a snapshot of the VM you want to move
    xe vm-snapshot uuid=<VM_UUID> new-name-label=<snapshotname>
  2. Convert the snashot template to a VM (the command xe snapshot-list is a handy way to obtain UUIDs of your snapshots)
    xe template-param-set is-a-template=false ha-always-run=false uuid=<UUID of snapshot>
  3. Transfer the template to the new pool
    xe vm-export uuid=<UUID of snapshot> filename= | ssh <other_server> xe vm-import filename=/dev/stdin
  4. Rename VM and/or modify interface MAC addresses as needed on the new host. Stop the VM on the old host and start it on the new one.

I used both methods above to successfully move my VMs from my older 6.5 pool to the newer 7 pool. Success.

PCI Passthrough in Xenserver 7 “Dundee”

I’ve recently upgraded to the latest version of Citrix Xenserver 7 (codenamed “Dundee”.) 7 is based on CentOS 7 and has a massive amount of changes under the hood. One such change was how they handle PCI Passthrough.

It took some time to figure PCI Passthrough out. 7 uses grub instead of extlinux for the bootloader. It appears to be grub2 but they don’t use the standard update-grub tool, rather you simply edit the config file and do nothing else.

After much searching I found this post which led me in the right direction. In Xenserver 7, for pci passthrough support you must do the following:

  • Prepare the VM for PCI passthrough (this part hasn’t changed)
    xe vm-param-set other-config:pci=0/0000:B:D.f uuid=<vm uuid>
  • Modify /boot/grub/grub.cfg and append the following to the end of the module2 line (if you boot from EFI the file to modify is /boot/efi/EFI/xenserver/grub.cfg)
     xen-pciback.hide=(B:D.f)
    
  • Reboot

You will now be able to pass through hardware to your virtual machines in Xenserver 7. Hooray.

Install multiple xenserver patches at once

I came across a need to install multiple patches manually (via SSH) on one of my xenservers. It’s quite tedious to do this manually so I found a way to here.)

Download all the patch .zip files to a directory your xenserver can access. Then, extract them all with this command:

find *.zip -exec unzip {} \;

Next, upload all the .xsupdates:

find *.xsupdate -exec xe patch-upload file-name={} \;

This spits out a bunch of UUIDs. Make note of these. You will also need to get your host-uuid by using the

xe host-list

command.

Lastly, a quick for loop applies the patches we want (replace the UUIDs with those of the patches uploaded earlier and the host-uuid with yours)

for file in c3520494-be00-4133-afb3-adf8ab5edb11 7fea2d85-7ce1-428c-a92f-57e37551d6f1 d9862b7f-9be6-4672-b9a8-4f52f776fd03 a424dfe5-8be8-4bd6-a49e-62620e369a43 e28bb0ae-e43f-46d9-9147-c7dc712508eb; do xe patch-apply uuid=$file host-uuid=46f8ef28-8ee1-44b5-967c-b8e48585094b; done

That did the trick for me. After applying the patches I came across this post which appears to have a much better script. Whatever works.

Xenserver NFS SR from FreeNAS VM hack

I have a Citrix xenserver 6.5 host which hosts a FreeNAS VM that exports an NFS share. I then have that same xenserver host use that NFS export as a SR for other VMs on that same server. It’s unusual, but it saves me from buying a separate server for VM storage.

The problem is if you reboot the hypervisor it will fail to connect to the NFS export (because the VM hosting it hasn’t booted yet.) Additionally it appears Xenserver does not play well at all with hung NFS mounts. If you try to shutdown or reboot your FreeNAS VM while Xenserver is still using its NFS export, things start to freeze. You will be unable to do anything to any of your VMs thanks to the hung NFS share. It’s a problem!

My hack around this mess is to have FreeNAS, not Xenserver, control starting and stopping these VMs.

First, create public/private key pair for ssh into xenserver

ssh-keygen

This will generate two files, a private key file and a public (.pub) file. Copy the contents of the .pub file into the xenserver’s authorized_keys file:

echo "PUT_RSA_PUBLIC_KEY_HERE" >> /root/.ssh/authorized_keys

Copy the private key file (same name but without .pub extension) somewhere on your FreeNAS VM.

Next, create NFS startup and shutdown scripts. Thanks to linuxcommando for some guidance with this.  Replace the -i argument with the path to your SSH private key file generated earlier. You will also need to know the PBD UUID of the NFS store. Discover this by issuing

xe pbd-list

Copy the UUID for use in the scripts.

vi nfs-startup.sh
#!/bin/bash
#NFS FreeNAS VM startup script

SSH_COMMAND="ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i <PRIVATE_KEY_LOCATION> -l root <ADDRESS_OF_XENSERVER>"

#Attach NFS drive first, then start up NFS-reliant VMs
$SSH_COMMAND xe pbd-plug uuid=<UUID_COPIED_FROM_ABOVE>

sleep 10

#Issue startup commands for each of your NFS-based VMs, repeat for each VM you have
$SSH_COMMAND xe vm-start vm="VM_NAME"
...
vi nfs-shutdown.sh
#!/bin/bash
#NFS FreeNAS VM shutdown script
#Shut down NFS-reliant VMs, detach NFS SR

#Re-establish networking to work around the fact that Network goes down before this script is executed within FreeNAS
/sbin/ifconfig -l | /usr/bin/xargs -n 1 -J % /sbin/ifconfig % up
SSH_COMMAND="ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i <PRIVATE_KEY_LOCATION> -l root <ADDRESS_OF_XENSERVER>"

#Issue shutdown commands for each of your VMs
$SSH_COMMAND xe vm-shutdown vm="VM_NAME"

sleep 60

$SSH_COMMAND xe pbd-unplug <UUID_OF_NFS_SR>

#Take the networking interfaces back down for shutdown
/sbin/ifconfig -l | /usr/bin/xargs -n 1 -J % /sbin/ifconfig % down

Don’t forget to mark them executable:

chmod +x nfs-startup.sh
chmod +x nfs-shutdown.sh

Now add the scripts as a startup task in FreeNAS  and shutdown task respectively by going to System / Init/Shutdown Scripts. For startup, Select Type: Script, Type: postinit and point it to your nfs-startup.sh script. For shutdown, select Type: Script and Type: Shutdown.

Success! Now whenever your FreeNAS VM is shut down or rebooted, things will be handled properly which will prevent your hypervisor from freezing.

 

Improve FreeNAS NFS performance in Xenserver

My home lab consists of a virtualized instance of freenas, Citrix Xenserver, and various VMs. Recently I wanted to migrate some of my VMs to an NFS export from FreeNAS. To my dismay, the speed was abysmal (3 MB/second write speeds.) This tutorial will walk you through how to improve FreeNAS NFS performance in Xenserver by adding an log device (ZIL) to your ZFS pool.

After much research I realized the problem lies with ZFS behind the NFS export. Xenserver mounts the NFS share in such a way that it constantly wants to synchronize writes, which slows things down.

The solution: add a ZIL device. Since my freeNAS is virtualized, I chose the route of adding a virtual disk that is attached to an SSD. This process wasn’t straightforward.  If you have a virtual FreeNAS this is how to improve NFS performance:

  1. Add a disk in xenserver. Rule of thumb for size is half the amount of system RAM. I added 16GB ZIL disk to be safe.
  2. Add the following tunables in FreeNAS (to allow the OS to properly see xen hard drives)
    1. hint.ada.0.at, scbus100 (for the FreeNAS OS disk)
    2. hint.ada.1.at, scbus100 (for the newly added ZIL disk)
  3. Reboot FreeNAS
  4. In the FreeNAS GUI, click the ZFS Volume Manager, select your volume to expand from the dropdown, and select the device to be a LOG volume (ZIL)

That’s it! Once I added an SSD based ZIL device for my ZFS pool, NFS writes went from 3 MB/s to 60 MB/s. Awesome.