Tag Archives: Threadripper

KVM with vga passthrough in arch linux

I’ve once again switched from Proxmox to Arch Linux for my desktop machine. Both use KVM so it’s really just a matter of using the different VM manager syntax (virt-manager vs qm.) I used my notes from my previous stint with Arch, my article on GPU Passthrough in Proxmox as well as a thorough reading of the Arch wiki’s PCI Passthrough article.

Enable IOMMU

Configure GRUB to pass the necessary IOMMU kernel parameters at boot. Append amd_iommu=on iommu=pt to the end of GRUB_CMDLINE_LINUX_DEFAULT (use intel_iommu=on instead of amd_iommu=on if you have an Intel CPU)

sudo vim /etc/default/grub
...
GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 amd_iommu=on iommu=pt"

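Regenerate the GRUB configuration. Arch does not ship Debian's update-grub wrapper by default, so call grub-mkconfig directly:

sudo grub-mkconfig -o /boot/grub/grub.cfg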
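
Once the host has been rebooted with these parameters, it is worth confirming that the IOMMU is actually active before going further. What follows is a minimal sketch, assuming the standard sysfs layout under /sys/kernel/iommu_groups. The function name and its optional root argument are illustrative and do not come from any existing tool.

```shell
# List every PCI device per IOMMU group.
# $1 (optional, an assumption for testing) overrides the sysfs root.
list_iommu_groups() {
    local root dev group
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue      # glob did not match: no groups exist
        group=${dev#"$root"/}          # strip the root prefix
        group=${group%%/*}             # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}

list_iommu_groups
```

No output means the kernel created no IOMMU groups, which usually points at the kernel parameters or at an IOMMU/SVM option that is disabled in the firmware setup.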

Reserve GPU for VFIO

Reserve the GPU you wish to pass through to a VM for use with the vfio kernel driver (so the host OS doesn’t interfere with it)

  1. Determine the PCI address of your GPU
    1. Run lspci -v and look for your card. Mine was 01:00.0 & 01:00.1 (the GPU and its audio function). You can omit the part after the dot to include them both in one go, so in that case it would be 01:00
    2. Run lspci -n -s <PCI address from above> to obtain the vendor:device IDs.
      Example :
      lspci -n -s 01:00
      01:00.0 0300: 10de:1b81 (rev a1)
      01:00.1 0403: 10de:10f0 (rev a1)
  2. Assign your GPU to vfio driver using the IDs obtained above.
    Example using above IDs:
    echo "options vfio-pci ids=10de:1b81,10de:10f0" | sudo tee -a /etc/modprobe.d/vfio.conf
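
The two steps above can be combined so the IDs never have to be copied by hand. This is a small sketch, assuming lspci -n output in the format shown in the example, where the third field is the vendor:device pair. The function name is made up for illustration.

```shell
# Read `lspci -n -s <address>` output on stdin and print a modprobe options line.
# Field 3 of each line is the vendor:device pair, e.g. 10de:1b81.
vfio_options_line() {
    awk '{ ids = ids (NR > 1 ? "," : "") $3 }
         END { print "options vfio-pci ids=" ids }'
}
```

A possible invocation would be lspci -n -s 01:00 | vfio_options_line | sudo tee /etc/modprobe.d/vfio.conf, with 01:00 replaced by your own address.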

Reboot the host to put the kernel / drivers into effect.
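
After the reboot, lspci -k shows which kernel driver claimed each device. The sketch below assumes the usual "Kernel driver in use:" line in that output and simply checks that every reported driver is vfio-pci. The function name is illustrative.

```shell
# Read `lspci -k` output on stdin. Succeed only if at least one device reports
# a driver and every reported driver is vfio-pci.
all_bound_to_vfio() {
    awk -F': ' '
        /Kernel driver in use:/ { seen++; if ($2 != "vfio-pci") bad++ }
        END { if (seen > 0 && bad == 0) exit 0; exit 1 }
    '
}
```

For example: lspci -k -s 01:00 | all_bound_to_vfio && echo "GPU is reserved for VFIO". If another driver shows up instead, the host driver grabbed the card before vfio-pci did.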

Configure virt-manager

Install virt-manager, dnsmasq & libvirt (the package is named libvirt; libvirtd is the service it provides):

sudo pacman -S libvirt virt-manager dnsmasq
sudo systemctl enable libvirtd
sudo systemctl start libvirtd

Configure Networking

Assuming you’re using NetworkManager for your connections, create a bridge (thanks to cyberciti.biz & the Arch wiki for information on how to do so). Replace the interface names with ones corresponding to your machine:

sudo nmcli connection add type bridge ifname br0 stp no
sudo nmcli connection add type bridge-slave ifname enp4s0 master br0 
sudo nmcli connection show
#Make note of the active connection name
sudo nmcli connection down "Wired connection 2" #from above
sudo nmcli connection up bridge-br0
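
Rather than reading the active connection name off the screen, it can be looked up by interface. This is a sketch that assumes nmcli's terse mode (-t) prints the requested NAME and DEVICE fields separated by a colon. The helper name is hypothetical, and connection names that themselves contain a colon are not handled.

```shell
# Read `nmcli -t -f NAME,DEVICE connection show --active` output on stdin and
# print the name of the connection that uses interface $1.
connection_for_device() {
    awk -F: -v dev="$1" '$NF == dev { sub(/:[^:]*$/, ""); print }'
}
```

Usage would look like nmcli -t -f NAME,DEVICE connection show --active | connection_for_device enp4s0, which on the setup above should print the wired connection to bring down.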

Create a second bridge bound to lo for host-only communication. Change the IP as desired:

sudo nmcli connection add type bridge ifname br99 stp no ip4 192.168.2.1/24
sudo nmcli connection add type bridge-slave ifname lo master br99 
sudo nmcli connection up bridge-br99

Configure VM

Initial configuration

When creating the passthrough VM, make sure chipset is Q35.

Set the CPU model to host-passthrough (type it in, there is no dropdown for it.)

When adding disks / other devices, set the device model to virtio

Add your GPU by going to Add Hardware and finding it under PCI Host Device.

Windows 10 specific tweaks

If your passthrough VM is going to be Windows-based, some tweaks are required to get the GPU to work properly within the VM.

Ignore MSRs (blue screen fix)

Later versions of Windows 10 instantly bluescreen with kmode_exception_not_handled unless KVM is told to ignore accesses to unhandled MSRs (model-specific registers). Add the kvm ignore_msrs=1 option in /etc/modprobe.d/kvm.conf to do so. Optionally add the report_ignored_msrs=0 option to squelch the flood of kernel messages logged every time an MSR is ignored.

#sudo does not apply to a >> redirection, so pipe through "sudo tee -a" (or run plain echo >> as root)
echo "options kvm ignore_msrs=1" | sudo tee -a /etc/modprobe.d/kvm.conf
#Optional - suppress kernel messages about ignored MSRs
echo "options kvm report_ignored_msrs=0" | sudo tee -a /etc/modprobe.d/kvm.conf

Reboot to make those changes take effect.
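
After the reboot, the live value of the option can be read back from sysfs. This is a minimal sketch, assuming the kvm module exposes its parameters under /sys/module/kvm/parameters and reports booleans as Y or N. The function name and the optional directory argument are illustrative.

```shell
# Succeed if the kvm module is currently ignoring unhandled MSR accesses.
# $1 (optional, an assumption for testing) overrides the parameters directory.
msr_ignore_enabled() {
    local file
    file="${1:-/sys/module/kvm/parameters}/ignore_msrs"
    [ -r "$file" ] && [ "$(cat "$file")" = "Y" ]
}
```

Running msr_ignore_enabled && echo "ignore_msrs is active" before booting the Windows VM saves a round of blue screens if the modprobe file was not picked up.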

NVIDIA Code 43 workaround

Use the virsh edit command to make some tweaks to the VM configuration. We need to hide the fact that this is a VM; otherwise the NVIDIA drivers will not load and the device will report Code 43. Add a vendor_id in the hyperv section, and create a kvm section enabling hidden state, which hides the KVM hypervisor signature that the drivers use to detect whether they’re in a VM.

sudo virsh edit <VM_NAME>

<features>
	<hyperv>
		...
		<vendor_id state='on' value='1234567890ab'/>
		...
	</hyperv>
	...
	<kvm>
		<hidden state='on'/>
	</kvm>
</features>

Optimize CPU

If you operate on a multi-core system such as my AMD Ryzen Threadripper, then you will want to optimize your CPU core configuration in the VM per the CPU Pinning section in the Arch Wiki.

Determine your CPU topology by running lscpu -e. The important things to look for are the CPU number and core number. On my box, it looks like this:

CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ MINMHZ
0 0 0 0 0:0:0:0 yes 3400.0000 2200.0000
1 0 0 1 1:1:1:0 yes 3400.0000 2200.0000
2 0 0 2 2:2:2:0 yes 3400.0000 2200.0000
3 0 0 3 3:3:3:0 yes 3400.0000 2200.0000
4 0 0 4 4:4:4:1 yes 3400.0000 2200.0000
5 0 0 5 5:5:5:1 yes 3400.0000 2200.0000
6 0 0 6 6:6:6:1 yes 3400.0000 2200.0000
7 0 0 7 7:7:7:1 yes 3400.0000 2200.0000
8 0 0 8 8:8:8:2 yes 3400.0000 2200.0000
9 0 0 9 9:9:9:2 yes 3400.0000 2200.0000
10 0 0 10 10:10:10:2 yes 3400.0000 2200.0000
11 0 0 11 11:11:11:2 yes 3400.0000 2200.0000
12 0 0 12 12:12:12:3 yes 3400.0000 2200.0000
13 0 0 13 13:13:13:3 yes 3400.0000 2200.0000
14 0 0 14 14:14:14:3 yes 3400.0000 2200.0000
15 0 0 15 15:15:15:3 yes 3400.0000 2200.0000
16 0 0 0 0:0:0:0 yes 3400.0000 2200.0000
17 0 0 1 1:1:1:0 yes 3400.0000 2200.0000
18 0 0 2 2:2:2:0 yes 3400.0000 2200.0000
19 0 0 3 3:3:3:0 yes 3400.0000 2200.0000
20 0 0 4 4:4:4:1 yes 3400.0000 2200.0000
21 0 0 5 5:5:5:1 yes 3400.0000 2200.0000
22 0 0 6 6:6:6:1 yes 3400.0000 2200.0000
23 0 0 7 7:7:7:1 yes 3400.0000 2200.0000
24 0 0 8 8:8:8:2 yes 3400.0000 2200.0000
25 0 0 9 9:9:9:2 yes 3400.0000 2200.0000
26 0 0 10 10:10:10:2 yes 3400.0000 2200.0000
27 0 0 11 11:11:11:2 yes 3400.0000 2200.0000
28 0 0 12 12:12:12:3 yes 3400.0000 2200.0000
29 0 0 13 13:13:13:3 yes 3400.0000 2200.0000
30 0 0 14 14:14:14:3 yes 3400.0000 2200.0000
31 0 0 15 15:15:15:3 yes 3400.0000 2200.0000

From the above output I see my CPU core 0 is shared by CPUs 0 & 16, meaning CPU 0 and CPU 16 (as seen by the Linux kernel) are hyperthreaded to the same physical CPU core.
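
Spotting the pairs by eye gets tedious on a 32-thread machine. The sketch below groups logical CPUs by physical core, assuming the comma-separated output of lscpu -p=CPU,CORE (header lines start with #). The function name is illustrative.

```shell
# Read `lscpu -p=CPU,CORE` output on stdin and print the logical CPUs that
# share each physical core, in order of first appearance.
siblings_by_core() {
    awk -F, '
        /^#/ { next }                                   # skip header comments
        {
            if ($2 in cpus) cpus[$2] = cpus[$2] " " $1
            else { order[++n] = $2; cpus[$2] = $1 }
        }
        END { for (i = 1; i <= n; i++) printf "core %s: %s\n", order[i], cpus[order[i]] }
    '
}
```

With the topology listed above, lscpu -p=CPU,CORE | siblings_by_core would begin with a line pairing CPUs 0 and 16 under core 0.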

Especially for gaming, you want to keep all threads on the same CPU cores (for multithreading) and the same CPU die (on my Threadripper, CPUs 0-7 reside on one physical die, and CPUs 8-15 reside on the other, within the same socket).

In my case I want to dedicate one CPU die to my VM with its accompanying hyperthreads (CPUs 0-7 & hyperthreads 16-23). You can accomplish this using the virsh edit command and creating a cputune section (make sure the vcpu count matches the number of CPUs you’re pinning). Also edit the cpu element so its topology reads 1 socket, 1 die, 8 cores with 2 threads.

sudo virsh edit <VM_NAME>

<domain type='kvm'>
  ...
  <vcpu placement='static'>16</vcpu>  
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='16'/>
    <vcpupin vcpu='2' cpuset='1'/>
    <vcpupin vcpu='3' cpuset='17'/>
    <vcpupin vcpu='4' cpuset='2'/>
    <vcpupin vcpu='5' cpuset='18'/>
    <vcpupin vcpu='6' cpuset='3'/>
    <vcpupin vcpu='7' cpuset='19'/>
    <vcpupin vcpu='8' cpuset='4'/>
    <vcpupin vcpu='9' cpuset='20'/>
    <vcpupin vcpu='10' cpuset='5'/>
    <vcpupin vcpu='11' cpuset='21'/>
    <vcpupin vcpu='12' cpuset='6'/>
    <vcpupin vcpu='13' cpuset='22'/>
    <vcpupin vcpu='14' cpuset='7'/>
    <vcpupin vcpu='15' cpuset='23'/>
  </cputune>
  ...
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' dies='1' cores='8' threads='2'/>
  </cpu>
  ...
</domain>

I later found that I had been doing the vcpupins wrong: apparently the better way is to list each primary CPU thread followed immediately by its hyperthread. I had been listing all the main CPUs first, then their hyperthreads. Once I changed to the every-other configuration, the stuttering disappeared.
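
Typing sixteen vcpupin lines by hand invites exactly that kind of ordering mistake, so they can be generated instead. This is a sketch under the assumption that sibling threads sit at a fixed offset from their core (16 on the machine above). The function name and argument order are made up for illustration.

```shell
# Print <vcpupin> elements that place each core and its hyperthread sibling on
# consecutive vCPUs.
#   $1 = first host core, $2 = number of cores, $3 = sibling thread offset
gen_vcpupin() {
    local first cores offset i
    first="$1"; cores="$2"; offset="$3"; i=0
    while [ "$i" -lt "$cores" ]; do
        printf "    <vcpupin vcpu='%d' cpuset='%d'/>\n" "$((2 * i))" "$((first + i))"
        printf "    <vcpupin vcpu='%d' cpuset='%d'/>\n" "$((2 * i + 1))" "$((first + i + offset))"
        i=$((i + 1))
    done
}
```

gen_vcpupin 0 8 16 reproduces the cputune body shown above, ready to paste into virsh edit.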

Update 6/28/20: Additional tuning since I was having some stuttering and framerate issues

Dedicate CPUs to the VM (the host will not schedule onto them): append the isolcpus, nohz_full & rcu_nocbs kernel parameters to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub

...
GRUB_CMDLINE_LINUX_DEFAULT="... isolcpus=0-7,16-23 nohz_full=0-7,16-23 rcu_nocbs=0-7,16-23"
...

Update grub:

sudo grub-mkconfig -o /boot/grub/grub.cfg

Reboot, then check if it worked:

cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-linux root=/dev/mapper/arch-root rw loglevel=3 amd_iommu=on iommu=pt isolcpus=0-7,16-23 nohz_full=0-7,16-23 rcu_nocbs=0-7,16-23
taskset -cp 1
pid 1's current affinity list: 8-15,24-31
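
The same check can be scripted so that a typo in one of the three parameters does not go unnoticed. This is a minimal sketch that assumes all three parameters are meant to carry the same CPU list, as in the example above. The function name and the optional file argument are illustrative.

```shell
# Succeed if isolcpus, nohz_full and rcu_nocbs are all set to CPU list $1.
# $2 (optional, an assumption for testing) is the file to read instead of /proc/cmdline.
isolation_active() {
    local list file param
    list="$1"; file="${2:-/proc/cmdline}"
    for param in isolcpus nohz_full rcu_nocbs; do
        tr ' ' '\n' < "$file" | grep -qxF "${param}=${list}" || return 1
    done
}
```

For the configuration above: isolation_active 0-7,16-23 && echo "CPUs are isolated".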

You can still run selected programs on the isolated CPUs by assigning them manually with the taskset command:

chrt -r 1 taskset -c <cores to use> <name of program/process>

Change CPU frequency setting to use performance mode:

sudo pacman -S cpupower
sudo cpupower frequency-set -g performance

Profit

I’m very pleased with my current setup. It works well!

Threadripper / Epyc processor core optimization

I had a pet project (folding@home) where I wanted to maximize computing power. I became frustrated with default CPU scheduling of my folding@home threads. Ideal performance would keep similar threads on the same CPU, but the threads were jumping all over the place, which was impacting performance.

Step one was to figure out which threads belonged to which physical cores. I found on this site that you can use cat to find out what your “sibling threads” are:

cat /sys/devices/system/cpu/cpu{0..15}/topology/thread_siblings_list

The above command is for my Threadripper & Epyc systems, which each have 16 cores hyperthreaded to 32 threads. Adjust the {0..15} range to match your number of cores (core 0 being the first core). This was my output:

cat /sys/devices/system/cpu/cpu{0..15}/topology/thread_siblings_list

0,16
1,17
2,18
3,19
4,20
5,21
6,22
7,23
8,24
9,25
10,26
11,27
12,28
13,29
14,30
15,31
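
The distance between the two numbers on each line is the sibling offset. Here is a small sketch for extracting it from one such line, assuming the kernel prints either a comma-separated pair ("0,16") or, on machines that number siblings consecutively, a range ("0-1"). The function name is illustrative.

```shell
# Print the distance between the two sibling threads named in one line of
# thread_siblings_list. Accepts both "0,16" and the range form "0-1".
sibling_offset() {
    local first second
    first="${1%%[,-]*}"        # everything before the first separator
    second="${1#*[,-]}"        # everything after the first separator
    second="${second%%[,-]*}"  # drop any further entries
    echo "$((second - first))"
}
```

For example, sibling_offset "$(cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list)" prints 16 on the systems described here.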

Now that I know the sibling threads are offset by 16, I can use this information to optimize my folding@home VMs. I modified my CPU pinning script to take this into consideration. The script ensures that each VM is pinned to only use sibling threads (ensuring they all stay on the same physical CPU.)

This script should be used with caution. It pins processes to specific CPUs, which limits the kernel scheduler’s ability to move things around if needed. If configured badly this can cause the machine to lock up or VMs to be terminated.

I saw some impressive results spinning up four separate 8 core VMs and pinning them to sibling cores using this script. It almost doubled the rate at which I completed folding@home work units.

And now, the script:

#!/bin/bash
#Properly assign CPU cores to their respective die for EPYC/Threadripper systems
#Based on how hyperthreads are done in these systems
#cat /sys/devices/system/cpu/cpu{0..15}/topology/thread_siblings_list

#The script takes two arguments - the ID of the Proxmox VM to modify, and the core to begin the VM on
#If running this against multiple VMs, make sure to increment this second number by half of the cores of the previous VM
#For example, if I have one 8 core VM and I run this script specifying 0 for the offset, if I spin up a second VM, the second argument would be 4
#this would ensure the second VM starts on core 4 (the 5th core) and assigns sibling cores to match

set -eo pipefail

#take First argument as which VMID to pin CPU cores to, the second argument is which core to start pinning to
VMID=$1
OFFSET=$2

#Determine offset for sibling threads
SIBLING_THREAD_OFFSET=$(cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list| sed 's/,/ /g' | awk '{print $2}')

#Function to determine number of CPU cores a VM has
cpu_tasks() {
	expect <<EOF | sed -n 's/^.* CPU .*thread_id=\(.*\)$/\1/p' | tr -d '\r' || true
spawn qm monitor $VMID
expect ">"
send "info cpus\r"
expect ">"
EOF
}

#Only act if VMID & OFFSET are set
if [[ -z $VMID  || -z $OFFSET ]]
then
	echo "Usage: cpupin.sh <VMID> <OFFSET>"
	exit 1
else
	#Get PIDs of each CPU core for VM, count number of VM cores, and get even/odd PIDs for assignment
	VCPUS=($(cpu_tasks))
	VCPU_COUNT="${#VCPUS[@]}"
	VCPU_EVEN_THREADS=($(for EVEN_THREAD in "${VCPUS[@]}"; do echo $EVEN_THREAD; done | awk '!(NR%2)'))
	VCPU_ODD_THREADS=($(for ODD_THREAD in "${VCPUS[@]}"; do echo $ODD_THREAD; done | awk '(NR%2)'))

	if [[ $VCPU_COUNT -eq 0 ]]; then
		echo "* No VCPUS for VM$VMID"
		exit 1
	fi

	echo "* Detected ${#VCPUS[@]} vCPU threads assigned to VM$VMID..."
	echo "* Pinning vCPU threads..."

	#Start at offset CPU number, assign odd numbered PIDs to their own CPU thread, then increment CPU core number
	#0-3 if offset is 0, 4-7 if offset is 4, etc
	ODD_CPU_INDEX=$OFFSET
	for PID in "${VCPU_ODD_THREADS[@]}"
	do
		echo "* Assigning ODD thread $ODD_CPU_INDEX to $PID..."
		taskset -pc "$ODD_CPU_INDEX" "$PID"
		((ODD_CPU_INDEX+=1))
	done

	#Start at offset + sibling thread offset, assign even number PIDs to their own CPU thread, then increment CPU core number
	#16-19 if offset is 0,	20-23 if offset is 4, etc
	EVEN_CPU_INDEX=$(($OFFSET + $SIBLING_THREAD_OFFSET))
	for PID in "${VCPU_EVEN_THREADS[@]}"
	do
		echo "* Assigning EVEN thread $EVEN_CPU_INDEX to $PID..."
		taskset -pc "$EVEN_CPU_INDEX" "$PID"
		((EVEN_CPU_INDEX+=1))
	done
fi
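
When several equally sized VMs are pinned with the script, the second argument for each one follows from the rule in its header comments: advance by half the vCPU count of the previous VM. This is a sketch of that arithmetic. The function name is hypothetical, and it assumes every VM has the same even number of vCPUs.

```shell
# Print the OFFSET argument for each of N equally sized VMs.
#   $1 = vCPUs per VM, $2 = number of VMs
# Each VM occupies (vCPUs / 2) physical cores, so offsets advance by that amount.
vm_offsets() {
    local vcpus count i
    vcpus="$1"; count="$2"; i=0
    while [ "$i" -lt "$count" ]; do
        echo "$((i * vcpus / 2))"
        i=$((i + 1))
    done
}
```

For four 8 core VMs this gives 0, 4, 8 and 12, so with made-up VM IDs 101 to 104 the calls would be cpupin.sh 101 0, cpupin.sh 102 4, cpupin.sh 103 8 and cpupin.sh 104 12.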

VGA Passthrough with Threadripper

An unfortunate bug exists for the AMD Threadripper family of CPUs which causes VGA Passthrough not to work properly. Fortunately some very clever people have implemented a workaround to allow proper VGA passthrough until a proper Linux Kernel patch can be accepted and implemented. See here for the whole story.

Right now my Threadripper 1950X successfully has GPU passthrough thanks to HyenaCheeseHeads’ “java hack” applet. I went this route because I really didn’t want to try and recompile my Proxmox kernel to get passthrough to work. Per the description “It is a small program that runs as any user with read/write access to sysfs (this small guide assumes “root”). The program monitors any PCIe device that is connected to VFIO-PCI when the program starts, if the device disconnects due to the issues described in this post then the program tries to re-connect the device by rewriting the bridge configuration.” Instructions taken from the above Reddit post.

  • Go to https://pastebin.com/iYg3Dngs and hit “Download” (the MD5 sum is supposed to be 91914b021b890d778f4055bcc5f41002)
  • Rename the downloaded file to “ZenBridgeBaconRecovery.java” and put it in a new folder somewhere
  • Go to the folder in a terminal and type “javac ZenBridgeBaconRecovery.java”, this should take a short while and then complete with no errors. You may need to install the Java 8 JDK to get the javac command (use your distribution’s software manager)
  • In the same folder type “sudo java ZenBridgeBaconRecovery”
  • Make sure that the PCIe device that you intend to passthru is listed as monitored with a bridge
  • Now start your VM
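
Since the program runs as root, the checksum mentioned in the first step is worth verifying before compiling. This is a minimal sketch, assuming GNU md5sum is available. The function name is illustrative.

```shell
# Succeed only if the MD5 sum of file $1 equals the expected hex digest $2.
verify_md5() {
    local actual
    actual="$(md5sum "$1" | awk '{ print $1 }')"
    [ "$actual" = "$2" ]
}
```

For example: verify_md5 ZenBridgeBaconRecovery.java 91914b021b890d778f4055bcc5f41002 && javac ZenBridgeBaconRecovery.java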

In my case (Debian Stretch, Proxmox) I needed to install openjdk-8-jdk-headless:

sudo apt install openjdk-8-jdk-headless
javac ZenBridgeBaconRecovery.java

Next I have a little script on startup to spawn this as root in a detached tmux session, so I don’t have to remember to run it (If you try to start your VM before running this, it will hose passthrough on your system until you reboot it.) Be sure to change the script to point to wherever you compiled ZenBridgeBaconRecovery

#!/bin/bash
cd /home/nicholas  #change me to suit your needs
sudo java ZenBridgeBaconRecovery

And here is the command I use to run on startup:

tmux new -d '/home/nicholas/passthrough.sh'

Again, be sure to modify the above to point to the path of wherever you saved the above script.

So far this works pretty well for me. I hate having to run a Java process as root, but it’s better than recompiling my kernel.


Update 6/27/2018: I’ve created a systemd service unit to run ZenBridgeBaconRecovery at boot. Here is my file, placed in /etc/systemd/system/zenbridge.service (change WorkingDirectory to match the location of your compiled ZenBridgeBaconRecovery class, then run systemctl daemon-reload and systemctl enable zenbridge.service):

[Unit]
Description=Zen Bridge Bacon Recovery
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/home/nicholas
ExecStart=/usr/bin/java ZenBridgeBaconRecovery
# Restart may also be always, on-abort, etc. (systemd only accepts comments on their own line)
Restart=on-failure

[Install]
WantedBy=multi-user.target

Update 8/18/2018 Finally solved for everyone!

Per an update on the Reddit thread, motherboard manufacturers have finally put out BIOS updates that resolve the PCI passthrough problems. I updated my X399 Taichi to the latest version of its UEFI BIOS (3.20) and indeed PCI passthrough worked without any more wonky workarounds!