Category Archives: Virtualization

Posts about hypervisors and virtualization

Proxmox 6 NVIDIA GPU passthrough fix

I upgraded to Proxmox 6.0 and, to my dismay, my Windows VM suddenly began throwing the dreaded Code 43 error. After much digging I finally found this post on the Proxmox forums which outlines what needs to happen.

In my case, all I needed to do was tweak my machine type. There is no GUI option to do this, so it had to be done on the command line:

qm set <VM_ID> -machine pc-q35-3.1

That was all it took!
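To confirm the change took effect, you can dump the VM's config afterwards:

qm config <VM_ID> | grep machine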

The forum also suggested a few other things in case that doesn't work. I didn't end up needing them but I'll put them here in case they're helpful:

Add args to your VM config file:

args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'

Add a few options to the CPU line:

cpu: host,hidden=1,flags=+pcid,hv-vendor-id=proxmox
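For reference, both of these tweaks go in the VM's config file on the hypervisor, located at /etc/pve/qemu-server/<VMID>.conf.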

With the above settings I also discovered there is no need to have x-vga=on anymore. This allows you to have both the regular VM console and your graphics card if you so desire.

Run startup / shutdown on every VM in a Proxmox HA group

I wanted to run a stop operation on all VMs in one of my HA groups in Proxmox and was frustrated to see there was no easy way to do so. I wrote a quick & dirty bash script that starts or stops all VMs within one or more HA groups.

#!/bin/bash
#Proxmox HA start/stop script
#Takes the operation (start / stop) as the first argument and one or more HA group names as additional arguments, then acts as requested.

if [[ "$1" != "start" && "$1" != "stop" ]]; then
    echo "Please provide desired state (start | stop)"
    exit 1
fi

if [ "$1" == "start" ]; then
    VM_STATE="started"
    OPERATION="Starting"
elif [ "$1" == "stop" ]; then
    VM_STATE="stopped"
    OPERATION="Stopping"
else exit 1 #should not ever get here
fi

#Loop through each argument except for the first
for group in "${@:2}"
do
    group_members=$(ha-manager config | grep -B1 "$group" | grep vm:)
    for VM in $group_members
    do
        echo "$OPERATION $VM in HA group $group"
        ha-manager set "$VM" --state "$VM_STATE"
    done
done
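Example usage, stopping everything in two hypothetical HA groups (assuming the script is saved as ha-set-state.sh):

./ha-set-state.sh stop ha_group1 ha_group2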

Proxmox suspend & resume scripts

Update 12/17/2019: Added logic to wait for the VM to be suspended before suspending the hypervisor

Update 12/8/2019: After switching VMs I needed to tweak the pair of scripts. I modified them so all the magic happens on the hypervisor; the VM simply needs to SSH into the hypervisor and call the script. The hypervisor now also needs public key SSH access to the VM to tell it to suspend. Here is the updated script #1, to be run on the VM:

#!/bin/sh
#ProxMox suspend script part 1 of 2
#To be run on the VM 
#All this does is call the suspend script on the hypervisor
#This could also just be a bash alias

####### Variables #########
HYPERVISOR=        #Name / IP of the hypervisor
SSH_USER=          #User to SSH into hypervisor as
HYPERVISOR_SCRIPT= #Path to part 2 of the script on the hypervisor

####### End Variables ######

#Execute server suspend script
ssh $SSH_USER@$HYPERVISOR "$HYPERVISOR_SCRIPT" &

Here is the updated script #2, to be run on the hypervisor:

#!/bin/bash
#ProxMox suspend script part 2 of 2
#Script to run on the hypervisor, it waits for VM to suspend and then suspends itself
#It relies on passwordless sudo configured on the VM as well as SSH keys to allow passwordless SSH access to the VM from the hypervisor
#It resumes the VM after it resumes itself
#Called from the VM

########### Variables ###############

VM=             #Name/IP of VM to SSH into
VM_SSH_USER=    #User to ssh into the vm with
VMID=           #VMID of VM you wish to suspend

########### End Variables############

#Tell guest VM to suspend
ssh $VM_SSH_USER@$VM "sudo systemctl suspend"

#Wait until guest VM is suspended, wait 5 seconds between attempts
while [ "$(qm status $VMID)" != "status: suspended" ]
do 
    echo "Waiting for VM to suspend"
    sleep 5 
done

#Suspend hypervisor
systemctl suspend

#Resume the VM after the hypervisor wakes back up
qm resume $VMID
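Since the VM-side half is now just a single SSH call, it could also be a simple bash alias on the VM (hypothetical host and path):

alias suspend-pc='ssh root@hypervisor /root/proxmox-suspend.sh &'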

I have a desktop running Proxmox. My GUI is handled via a virtual machine with physical hardware passed through to it. The challenge with this setup is getting suspend & resume to work properly. I got it to work by suspending the VM first, then the host; on resume, I power up the host first, then resume the VM. Doing anything else would cause hardware passthrough problems that would force me to reboot the VM.

I automated the suspend process by using two scripts: one for the VM, and one for the hypervisor. The first script runs on the VM. It sends an SSH command to the hypervisor (thanks to this post) instructing it to run the second half of the script, then initiates a suspend of the VM.

The second half of the script waits a few seconds to allow the VM to suspend itself, then instructs the hypervisor to also go into suspend. I had to split these into two scripts because once the VM is suspended, it can’t issue any more commands. Suspending the hypervisor must happen after the VM itself is suspended.

Here is script #1 (to be run on the VM). It assumes you have already set up a private/public key pair to allow for passwordless login into the hypervisor from the VM.

#!/bin/sh
#ProxMox suspend script part 1 of 2
#To be run on the VM so it suspends before the hypervisor does

####### Variables #########
HYPERVISOR=HYPERVISOR_NAME_OR_IP
SSH_USER=SSH_USER_ON_HYPERVISOR
HYPERVISOR_SCRIPT_LOCATION=NAME_AND_LOCATION_OF_PART2_OF_SCRIPT

####### End Variables ######

#Execute server suspend script, then suspend VM
ssh $SSH_USER@$HYPERVISOR $HYPERVISOR_SCRIPT_LOCATION &

#Suspend
systemctl suspend

Here is script #2 (which script #1 calls), to be run on the hypervisor

#!/bin/bash
#ProxMox suspend script part 2 of 2
#Script to run on the hypervisor, it waits for VM to suspend and then suspends itself
#It resumes the VM after it resumes itself

########### Variables ###############

#Specify VMid you wish to suspend
VMID=VMID_OF_VM_YOU_WANT_TO_SUSPEND

########### End Variables############

#Wait 5 seconds before doing anything to allow for VM to suspend
sleep 5

#Suspend hypervisor
systemctl suspend

#Resume the VM after the hypervisor wakes back up
qm resume $VMID

It works on my machine 🙂

Primary VGA passthrough in ProxMox

I recently decided to amplify my VFIO experience by experimenting with passing my primary display adapter to a VM in Proxmox. Previously I had just run tasksel on the Proxmox host itself to install a GUI. I wanted better separation between the server side of Proxmox and the client side. I also wanted to be able to distro-hop while maintaining the Proxmox backend.

Initially I tried following my guide for passing through a secondary graphics card but ran into a snag. It did not work with my primary card and kept outputting this error:

device vfio-pci,host=09:00.0,id=hostdev0,bus=pci.4,addr=0x0: Failed to mmap 0000:09:00.0 BAR 1. Performance may be slow

After much digging I finally found this post which explained I needed to unbind a few things for it to work properly:

echo 0 > /sys/class/vtconsole/vtcon0/bind
echo 0 > /sys/class/vtconsole/vtcon1/bind
echo efi-framebuffer.0 > /sys/bus/platform/drivers/efi-framebuffer/unbind

After more searching I found this post on reddit which had a nifty script for automating this whenever the VM is started. I tweaked it a bit to suit my needs.

Find your GPU's location by running lspci and looking for your adapter. Then find its vendor/device IDs by running lspci -n -s <GPU location discovered with lspci>. Lastly, VMID is the Proxmox ID for the VM you wish to start.
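For example, with the GPU located at 09:00.0, the ID lookup looks something like this (output abbreviated; these values match the script below):

lspci -n -s 09:00
09:00.0 0300: 10de:1c82
09:00.1 0403: 10de:0fb9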

#!/bin/sh
#Script to launch Linux desktop
#Adapted from https://www.reddit.com/r/VFIO/comments/abfjs8/cant_seem_to_get_vfio_working_with_qemu/?utm_medium=android_app&utm_source=share

GPU=09:00
GPU_ID="10de 1c82"
GPU_AUDIO="10de 0fb9"
VMID=116

# Remove the framebuffer and console
echo 0 > /sys/class/vtconsole/vtcon0/bind
echo 0 > /sys/class/vtconsole/vtcon1/bind
echo efi-framebuffer.0 > /sys/bus/platform/drivers/efi-framebuffer/unbind

# Unload the Kernel Modules that use the GPU
modprobe -r nvidia_drm
modprobe -r nvidia_modeset
modprobe -r nvidia
modprobe -r snd_hda_intel

# Load the vfio kernel module
modprobe vfio
modprobe vfio_iommu_type1
modprobe vfio-pci

#Assign card to vfio-pci
echo -n "${GPU_ID}" > /sys/bus/pci/drivers/vfio-pci/new_id
echo -n "${GPU_AUDIO}" > /sys/bus/pci/drivers/vfio-pci/new_id

#Start desktop
sudo qm start $VMID

#Wait here until the VM is turned off
while [ "$(qm status $VMID)" != "status: stopped" ] 
do
 sleep 5
done

#Reassign primary graphics card back to host
echo -n "0000:${GPU}.0" > /sys/bus/pci/drivers/vfio-pci/unbind
echo -n "0000:${GPU}.1" > /sys/bus/pci/drivers/vfio-pci/unbind
echo -n "${GPU_ID}" > /sys/bus/pci/drivers/vfio-pci/remove_id
echo -n "${GPU_AUDIO}" > /sys/bus/pci/drivers/vfio-pci/remove_id
rmmod vfio-pci
modprobe nvidia
modprobe nvidia_drm
modprobe nvidia_modeset
modprobe snd_hda_intel
sleep 1
echo -n "0000:${GPU}.0" > /sys/bus/pci/drivers/nvidia/bind
echo -n "0000:${GPU}.1" > /sys/bus/pci/drivers/snd_hda_intel/bind
sleep 1
echo efi-framebuffer.0 > /sys/bus/platform/drivers/efi-framebuffer/bind
echo 1 > /sys/class/vtconsole/vtcon0/bind
echo 1 > /sys/class/vtconsole/vtcon1/bind

With my primary adapter passed through I realized I also wanted other things passed through, mainly USB. I tried Proxmox's USB device passthrough options but they don't work well with USB audio (stuttering and choppy sound). I wanted to pass through my whole USB controller to the VM.

This didn't work as well as I had planned due to IOMMU groups. A great explanation of IOMMU groups can be found here. I had to figure out which of my USB controllers were in which IOMMU group to see if I could pass the whole thing through or not (some of them were in the same IOMMU group as SATA & network controllers, which I did not want to pass through to the VM).

Fortunately I was able to discover which USB controllers I could safely pass through: first by running lspci to see the device IDs, then running find to see which IOMMU group each device was in, then checking against lspci to see what other devices were in that group. The whole group comes over together when you pass it through to a VM.

First, determine the IDs of your USB controllers:

lspci | grep USB

01:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] Device 43ba (rev 02)
08:00.0 USB controller: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller (rev 03)
0a:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Device 145c
43:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Device 145c

Next, find which IOMMU groups these devices belong to:

find /sys/kernel/iommu_groups/ -type l|sort -h|grep '01:00.0\|08:00.0\|0a:00.3\|43:00.3'

/sys/kernel/iommu_groups/14/devices/0000:01:00.0
/sys/kernel/iommu_groups/15/devices/0000:08:00.0
/sys/kernel/iommu_groups/19/devices/0000:0a:00.3
/sys/kernel/iommu_groups/37/devices/0000:43:00.3

Then see what other devices use the same IOMMU groups (the group is the number after /sys/kernel/iommu_groups/):

find /sys/kernel/iommu_groups/ -type l|sort -h | grep '/14\|/15\|/19\|/37'

/sys/kernel/iommu_groups/14/devices/0000:01:00.0
/sys/kernel/iommu_groups/14/devices/0000:01:00.1
/sys/kernel/iommu_groups/14/devices/0000:01:00.2
/sys/kernel/iommu_groups/14/devices/0000:02:00.0
/sys/kernel/iommu_groups/14/devices/0000:02:04.0
/sys/kernel/iommu_groups/14/devices/0000:02:05.0
/sys/kernel/iommu_groups/14/devices/0000:02:06.0
/sys/kernel/iommu_groups/14/devices/0000:02:07.0
/sys/kernel/iommu_groups/14/devices/0000:04:00.0
/sys/kernel/iommu_groups/14/devices/0000:05:00.0
/sys/kernel/iommu_groups/14/devices/0000:06:00.0
/sys/kernel/iommu_groups/15/devices/0000:08:00.0
/sys/kernel/iommu_groups/19/devices/0000:0a:00.3
/sys/kernel/iommu_groups/37/devices/0000:43:00.3

As you can see, one of my USB controllers (01:00.0) has a whole bunch of other devices in its IOMMU group, so I don't want to use it lest I bring all those other things into the VM with it. The other three, though, are isolated in their groups and thus perfect for passthrough.

In my case I passed through 0a:00.3 & 43:00.3 as 08:00.0 is a PCI card I want passed through to my Windows VM. This passed through about 2/3 of the USB ports on my system to my guest VM.
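Passing the isolated controllers through can then be done with qm, something like this (the hostpci slot numbers are examples; 116 is the VM ID from the script above):

qm set 116 -hostpci0 0a:00.3
qm set 116 -hostpci1 43:00.3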

Proxmox first VM boot delay workaround

My home lab has an NFS server for storage and a Proxmox hypervisor connecting to it. If the power ever goes out for longer than my UPS can handle, startup is a bit of a mess. My Proxmox server boots up much faster than my NFS server, so no VMs start automatically (their storage is unavailable) and I have to go in and start everything manually.

I found this bug report from 2015 which frustratingly doesn't appear to have gotten any traction. Ideally I could just tell the first VM to wait 5 minutes before turning on, and then trigger all the other VMs to turn on once the first one is up, but the devs don't seem to want to address that issue. So, I got creative.

My solution was to alter the grub menu timeout before booting ProxMox. Simple but effective.

Edit /etc/default/grub and modify GRUB_TIMEOUT

#modify GRUB_TIMEOUT to your liking
GRUB_TIMEOUT=300

Then simply run update-grub

update-grub

Now my Proxmox server waits 5 minutes before even booting the OS, by which time the NAS should be up and running. No more manually turning on VMs after a power outage.

Fix Proxmox swapping issue

I recently had an issue with one of my Proxmox hosts where it would max out all swap and slow to a crawl despite having plenty of physical memory free. After digging and tweaking, I found this post which directed me to set the kernel swappiness setting to 0. More reading suggested I should set it to 1, which is what I did.

Append to /etc/sysctl.conf:

#Fix excessive swap usage
vm.swappiness = 1 

Apply settings with:

sysctl --system
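You can verify the running value afterwards with:

sysctl vm.swappiness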

This did the trick for me.

Gaming VM with graphics passthrough in Arch Linux

At one point I had KVM with GPU passthrough running in Arch Linux. I have since moved away from it back to Proxmox. Here are the notes I jotted down when I did this in Arch. Sorry, these are just rough notes; I didn't end up using Arch long enough to turn this into a polished article.


pacman -Sy qemu netctl ovmf virt-manager

When creating VM, make sure chipset is Q35

CPU model host-passthrough (write it in)

Create VirtIO SCSI controller and attach drives to it

NIC device model: virtio

---- Networking ----

Create bridge:

https://wiki.archlinux.org/index.php/Bridge_with_netctl

Copy /etc/netctl/examples/bridge to /etc/netctl/bridge

/etc/netctl/bridge
Description="Example Bridge connection"
Interface=br0
Connection=bridge
BindsToInterfaces=(enp4s0)
IP=dhcp

#Optional - give your system another IP for host-only networking
ExecUpPost="ip addr add 192.168.2.1/24 dev br0"

sudo netctl reenable bridge
sudo netctl restart bridge

In the VM, add another network interface and also assign it to br0. Manually specify an IP in the guest VM to match the subnet specified above in ExecUpPost.

https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF

Enable UEFI BIOS: https://wiki.archlinux.org/index.php/libvirt#UEFI_Support

sudo vim /etc/libvirt/qemu.conf

/etc/libvirt/qemu.conf
nvram = [
    "/usr/share/ovmf/x64/OVMF_CODE.fd:/usr/share/ovmf/x64/OVMF_VARS.fd"
]

sudo systemctl restart libvirtd

Edit VM hardware:

CLI: sudo virsh edit <vm name>

GUI: double-click on the VM, then click the second icon from the left (the little "i" bubble). Add the GPU this way.

Nvidia GPU: the following is needed, otherwise you get Code 43:

<features>
	<hyperv>
		...
		<vendor_id state='on' value='whatever'/>
		...
	</hyperv>
	...
	<kvm>
	<hidden state='on'/>
	</kvm>
</features>

Hot add CD:

sudo virsh attach-disk <VM_NAME> <ISO_LOCATION> hdb --type cdrom

Add second NIC: https://jamielinux.com/docs/libvirt-networking-handbook/bridged-network.html

sudo virsh edit win10

<interface type="bridge">
   <source bridge="br1"/>
</interface>

CPU configuration

Current Allocation 16

Topology / manually set CPU topology

1 socket, 16 cores, 1 thread

<cputune>
  <vcpupin vcpu='0' cpuset='16'/>
  <vcpupin vcpu='1' cpuset='17'/>
  <vcpupin vcpu='2' cpuset='18'/>
  <vcpupin vcpu='3' cpuset='19'/>
  <vcpupin vcpu='4' cpuset='20'/>
  <vcpupin vcpu='5' cpuset='21'/>
  <vcpupin vcpu='6' cpuset='22'/>
  <vcpupin vcpu='7' cpuset='23'/>
  <vcpupin vcpu='8' cpuset='24'/>
  <vcpupin vcpu='9' cpuset='25'/>
  <vcpupin vcpu='10' cpuset='26'/>
  <vcpupin vcpu='11' cpuset='27'/>
  <vcpupin vcpu='12' cpuset='28'/>
  <vcpupin vcpu='13' cpuset='29'/>
  <vcpupin vcpu='14' cpuset='30'/>
  <vcpupin vcpu='15' cpuset='31'/>
</cputune>

Running Windows 10 on Linux using KVM with VGA Passthrough

--machine q35 \
--host-device 4b:00.0 --host-device 4b:00.1 \

https://medium.com/@calerogers/gpu-virtualization-with-kvm-qemu-63ca98a6a172

Add USB ports. It doesn't seem to work if nothing's plugged into the port.

virsh edit win10

<hostdev mode='subsystem' type='usb' managed='yes'>
  <source>
    <address bus='3' device='2'/>
  </source>
  <address type='usb' bus='0' port='2'/>
</hostdev>
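The bus and device numbers come from lsusb on the host; for instance, a hypothetical line like this maps to bus='3' device='2':

Bus 003 Device 002: ID 046d:c31c Logitech, Inc. Keyboard K120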

Remove the Tablet input device to get a 4th USB passthrough option.

Troubleshooting

internal error: Unknown PCI header type '127'

https://forum.level1techs.com/t/trouble-passing-though-an-rx-580-to-an-ubuntu-desktop-vm/123376/3

Threadripper PCI Reset bug: https://www.reddit.com/r/Amd/comments/7gp1z7/threadripper_kvm_gpu_passthru_testers_needed/

Error 43: add the <hyperv> vendor_id and <kvm> <hidden state='on'/> features XML shown earlier in these notes.

Audio cuts out whenever microphone is used

I had a very odd issue where all sound disappeared in my Windows VM if the microphone was used. Even simply opening up audio properties and going to the Recording tab triggered this issue. Disabling / re-enabled Special Effects for the playback device brought it back until the microphone was accessed again.

I'm using a USB sound card passed through to the VM for audio. The problem stems from the VM's virtual USB controller: when it was set to USB3 the issue occurred; when set to USB2 the issue went away. Bizarre.

FreeNAS ZFS tuning for SSDs

I wanted to optimize the all-SSD storage array on my FreeNAS server but had a hard time finding information in one place. After a lot of digging I pulled things from several sources. This is what I came up with. It boiled down to two main settings:

  • ashift
  • recordsize

Checking ashift on existing pools

zdb -U /data/zfs/zpool.cache | grep ashift
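Each vdev reports its own ashift, so expect one line per vdev; the output looks something like:

            ashift: 12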

I read here that ashift=13 and recordsize=8k are recommended for VM workloads on SSDs.

How to change recordsize:

This is easily done in the GUI or command line and can be changed on the fly.

zfs set recordsize=<value> <volume>
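For example, to set an 8K recordsize on a hypothetical VM dataset:

zfs set recordsize=8k tank/vm-storage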

How to change ashift:

Backup your data and destroy the pool.

Modify the sysctl dictating the minimum ashift, as outlined here:

sysctl vfs.zfs.min_auto_ashift=13

Re-create the pool.

Note: this Lustre wiki actually recommends ashift=12 instead of 13.

UPDATE 2/26/20: You can also set the ashift at pool creation time as dictated here

zpool create POOL_NAME -o ashift=12 ...

Additional reading

http://open-zfs.org/wiki/Performance_tuning#Alignment_shift
https://www.reddit.com/r/zfs/comments/7pfutp/zfs_pool_planning_for_vm_storage/

Free up RAM after Proxmox live migration

I ran into an issue where, after migrating a bunch of VMs off of one of my hosts, the remaining VMs on it refused to turn on. Every time I tried, the command would hang for a while and eventually error out with this message:

TASK ERROR: start failed: command '/usr/bin/kvm -id <truncated>... ' failed: got timeout

I suspected this might be due to RAM usage and, sure enough, it was too high for a system that didn't have any VMs running on it. I found here that I could run a command to flush the cache:

echo 3 > /proc/sys/vm/drop_caches
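(Writing 3 drops both the page cache and the dentry/inode caches; writing 1 would drop only the page cache.)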

That caused the RAM usage to go down, but the VM still wouldn't start. I then saw that KSM sharing still had some memory in use, so I decided to restart the KSM sharing service:

sudo systemctl restart ksmtuned

After running that the VM started!

CPU Pinning in Proxmox

Proxmox uses QEMU, which doesn't implement CPU pinning by itself. If you want to limit a guest VM's operations to specific CPU cores on the host, you need to use taskset. It was a bit confusing to figure out, but fortunately I found this gist by ayufan which handles it beautifully.

Save the following into taskset.sh and edit VMID to the ID of the VM you wish to pin CPUs to. Make sure you have the “expect” package installed.

#!/bin/bash

set -eo pipefail

VMID=200

cpu_tasks() {
	expect <<EOF | sed -n 's/^.* CPU .*thread_id=\(.*\)$/\1/p' | tr -d '\r' || true
spawn qm monitor $VMID
expect ">"
send "info cpus\r"
expect ">"
EOF
}

VCPUS=($(cpu_tasks))
VCPU_COUNT="${#VCPUS[@]}"

if [[ $VCPU_COUNT -eq 0 ]]; then
	echo "* No VCPUS for VM$VMID"
	exit 1
fi

echo "* Detected ${#VCPUS[@]} VCPUs assigned to VM$VMID..."
echo "* Resetting cpu shield..."

for CPU_INDEX in "${!VCPUS[@]}"
do
	CPU_TASK="${VCPUS[$CPU_INDEX]}"
	echo "* Assigning $CPU_INDEX to $CPU_TASK..."
	taskset -pc "$CPU_INDEX" "$CPU_TASK"
done

Update 9/29/18: Fixed a missing done at the end. Also, if you want to offset which cores this script uses, you can do so by adding a bit of math to the taskset line, like so:

        taskset -pc "$((CPU_INDEX+16))" "$CPU_TASK"

The above adds 16 to each CPU index, so instead of starting on thread 0 it starts on thread 16.
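To confirm the pinning took effect, you can query any of the thread IDs the script prints:

taskset -pc <thread_id>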