Category Archives: Virtualization

Posts about hypervisors and virtualization

windows 10 KERNEL_SECURITY_CHECK_FAILED in qemu vm

I upgraded to a shiny new AMD Ryzen 3rd gen processor (Threadripper 3960X). After doing so I could not boot up my Windows 10 gaming VM (it uses VFIO / PCI passthrough for the video card). The message I kept getting as it tried to boot was:

KERNEL_SECURITY_CHECK_FAILED

After reading this reddit thread and this one, it turns out it’s a combination of a few things:

  • Running a Linux kernel newer than 5.4
  • Running QEMU 5
  • Using a 3rd gen AMD Ryzen CPU
  • Using host-passthrough CPU mode

The problem comes from a new speculative execution protection feature in the Ryzen Gen 3 CPUs – stibp. QEMU doesn’t know how to handle it properly, thus the bluescreens.

There are two ways to fix it:

  • Change the CPU mode from host-passthrough to a named model such as EPYC
  • Add CPU parameters to your virtual machine’s XML file instructing it not to use the stibp CPU feature

Since I have some software that checks the CPU model and refuses to work if it’s not a desktop-class chip (GeForce Experience), I opted for route #2.

First, check the qemu logs to see which CPU parameters your VM was using (pick a boot from a time when it worked). Replace ‘win10’ with the name of your VM.

sudo cat /var/log/libvirt/qemu/win10.log | grep "\-cpu"

In my case, it was:

-cpu host,migratable=on,topoext=on,kvmclock=on,hv-time,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff,hv-vendor-id=1234567890ab,kvm=off \

Copy everything after -cpu and before the last backslash. Then edit your VM’s XML file (change the last argument to the name of your VM):

sudo virsh edit win10

Scroll down to the qemu:commandline section at the bottom (if it doesn’t exist, create it right above the last line, </domain>). Paste in the information obtained from the log above (ignoring the qemu:commandline lines if they already exist). In my case it looked like this:

  <qemu:commandline>
    <qemu:arg value='-cpu'/>
    <qemu:arg value='host,topoext=on,kvmclock=on,hv-time,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff,hv-vendor-id=1234567890ab,kvm=off,-amd-stibp'/>
  </qemu:commandline>

What you’re doing is copying the CPU arguments you found in the log into the qemu:commandline section, with one twist – adding -amd-stibp, which instructs qemu to remove that CPU flag.
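One caveat: libvirt only honors a qemu:commandline block if the QEMU XML namespace is declared on the domain element. If virsh strips your changes when you save, check that the opening tag of the file looks something like this:

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>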

This did the trick for me!

Threadripper / Epyc processor core optimization

I had a pet project (folding@home) where I wanted to maximize computing power. I became frustrated with default CPU scheduling of my folding@home threads. Ideal performance would keep similar threads on the same CPU, but the threads were jumping all over the place, which was impacting performance.

Step one was to figure out which threads belonged to which physical cores. I found on this site that you can use cat to find out what your “sibling threads” are:

cat /sys/devices/system/cpu/cpu{0..15}/topology/thread_siblings_list

The above command is for my Threadripper & Epyc systems, each of which has 16 physical cores hyperthreaded into 32 logical cores. Adjust the {0..15} range to match your number of physical cores (core 0 being the first core). This was my output:

0,16
1,17
2,18
3,19
4,20
5,21
6,22
7,23
8,24
9,25
10,26
11,27
12,28
13,29
14,30
15,31

Now that I know the sibling threads are offset by 16, I can use this information to optimize my folding@home VMs. I modified my CPU pinning script to take this into consideration. The script ensures that each VM is pinned to only use sibling threads (ensuring they all stay on the same physical CPU.)

This script should be used with caution. It pins processes to specific CPUs, which limits the kernel scheduler’s ability to move things around if needed. If configured badly this can cause the machine to lock up or VMs to be terminated.

I saw some impressive results spinning up four separate 8 core VMs and pinning them to sibling cores using this script. It almost doubled the rate at which I completed folding@home work units.

And now, the script:

#!/bin/bash
#Properly assign CPU cores to their respective die for EPYC/Threadripper systems
#Based on how hyperthreads are done in these systems
#cat /sys/devices/system/cpu/cpu{0..15}/topology/thread_siblings_list

#The script takes two arguments - the ID of the Proxmox VM to modify, and the core to begin the VM on
#If running this against multiple VMs, make sure to increment this second number by half of the cores of the previous VM
#For example, if I have one 8 core VM and I run this script specifying 0 for the offset, if I spin up a second VM, the second argument would be 4
#this would ensure the second VM starts on core 4 (the 5th core) and assigns sibling cores to match

set -eo pipefail

#take First argument as which VMID to pin CPU cores to, the second argument is which core to start pinning to
VMID=$1
OFFSET=$2

#Determine offset for sibling threads
SIBLING_THREAD_OFFSET=$(cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list| sed 's/,/ /g' | awk '{print $2}')

#Function to determine number of CPU cores a VM has
cpu_tasks() {
	expect <<EOF | sed -n 's/^.* CPU .*thread_id=\(.*\)$/\1/p' | tr -d '\r' || true
spawn qm monitor $VMID
expect ">"
send "info cpus\r"
expect ">"
EOF
}

#Only act if VMID & OFFSET are set
if [[ -z $VMID  || -z $OFFSET ]]
then
	echo "Usage: cpupin.sh <VMID> <OFFSET>"
	exit 1
else
	#Get PIDs of each CPU core for VM, count number of VM cores, and get even/odd PIDs for assignment
	VCPUS=($(cpu_tasks))
	VCPU_COUNT="${#VCPUS[@]}"
	VCPU_EVEN_THREADS=($(for EVEN_THREAD in "${VCPUS[@]}"; do echo $EVEN_THREAD; done | awk '!(NR%2)'))
	VCPU_ODD_THREADS=($(for ODD_THREAD in "${VCPUS[@]}"; do echo $ODD_THREAD; done | awk '(NR%2)'))

	if [[ $VCPU_COUNT -eq 0 ]]; then
		echo "* No VCPUS for VM$VMID"
		exit 1
	fi

	echo "* Detected ${#VCPUS[@]} vCPUs assigned to VM$VMID..."
	echo "* Pinning vCPU threads to host cores..."

	#Start at offset CPU number, assign odd numbered PIDs to their own CPU thread, then increment CPU core number
	#0-3 if offset is 0, 4-7 if offset is 4, etc
	ODD_CPU_INDEX=$OFFSET
	for PID in "${VCPU_ODD_THREADS[@]}"
	do
		echo "* Assigning ODD thread $ODD_CPU_INDEX to $PID..."
		taskset -pc "$ODD_CPU_INDEX" "$PID"
		((ODD_CPU_INDEX+=1))
	done

	#Start at offset + CPU count, assign even number PIDs to their own CPU thread, then increment CPU core number
	#16-19 if offset is 0,	20-23 if offset is 4, etc
	EVEN_CPU_INDEX=$(($OFFSET + $SIBLING_THREAD_OFFSET))
	for PID in "${VCPU_EVEN_THREADS[@]}"
	do
		echo "* Assigning EVEN thread $EVEN_CPU_INDEX to $PID..."
		taskset -pc "$EVEN_CPU_INDEX" "$PID"
		((EVEN_CPU_INDEX+=1))
	done
fi
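As a hypothetical example (the VM IDs are made up), pinning two 8-core Proxmox VMs so they land on separate sets of sibling cores would look like this:

./cpupin.sh 101 0   #VMID 101 gets host cores 0-3 plus siblings 16-19
./cpupin.sh 102 4   #VMID 102 gets host cores 4-7 plus siblings 20-23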

create podman services with podman-compose

Podman is Red Hat’s daemonless replacement for Docker (the CLI is largely compatible). I really liked docker-compose functionality; fortunately there is a podman-compose project which is more or less the same thing.

I now have a setup where each podman container is controlled by a systemd service, set to run on startup, with version controlled podman-compose files.

First, I installed podman-compose:

sudo curl -o /usr/local/bin/podman-compose https://raw.githubusercontent.com/containers/podman-compose/devel/podman_compose.py
sudo chmod +x /usr/local/bin/podman-compose

I then created podman-compose files (the syntax is identical to docker-compose) for each container. Here is one example (jackett.yml):

---
version: "2"
services:
  jackett:
    image: linuxserver/jackett
    container_name: jackett
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/Boise
    volumes:
      - /mnt/storage/Docker/Jackett/config:/config
      - /mnt/storage/Docker/Jackett/downloads:/downloads
    ports:
      - 9117:9117
    restart: unless-stopped

I then created a corresponding systemd unit file for each container:

#/etc/systemd/system/jackett.service
[Unit]
Description=Jackett
After=network.target

[Service]
Restart=always

# Compose up
ExecStart=/usr/local/bin/podman-compose -f /home/nicholas/podman/jackett.yml up

# Compose down, remove containers and volumes
ExecStop=/usr/local/bin/podman-compose -f /home/nicholas/podman/jackett.yml down -v

[Install]
WantedBy=multi-user.target

I then do a systemctl daemon-reload, and enable the service for startup:

sudo systemctl daemon-reload
sudo systemctl enable jackett
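To launch the container right away instead of waiting for the next boot, start the service as well:

sudo systemctl start jackett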

Success.

Why not create a single podman-compose file for all my services instead of creating individual services for each container? I wanted to be able to clearly see the log output for each container with journalctl -f -u <service name>. If you lump all your services into a single compose file, the output from every container gets jumbled into that single service log. Separating each container into its own service is much cleaner.

Podman no internet in container fix

I’ve started experimenting with CentOS 8 & Podman (Red Hat’s replacement for Docker). I ran into an issue where one of my containers needed internet access but could not connect. After some digging I found this site, which explains why:

I had to configure the firewall on the podman host to allow for IP masquerade:

sudo firewall-cmd --zone=public --add-masquerade --permanent

After running the above command, my container had internet access!
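One note in case it doesn’t take effect immediately for you: --permanent only writes the rule to the saved configuration, so you may also need to reload the firewall (or run the same command once more without --permanent) to apply it to the running config:

sudo firewall-cmd --reload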

JQ select specific value from array

I had some AWS EC2 JSON output that I needed to parse. I wanted to grab a specific value from an array, which proved to be tricky for a JSON noob like me. I finally found this site, which was very helpful: https://garthkerr.com/search-json-array-jq/. In my case I wanted the value of a specific AWS EC2 tag.

The trick is to drill down to the Tags[] array and then pipe that to a select statement. If your tag keys have dots in them (as mine did), make sure to quote the key name. Then append .Value to the end of the select statement. This is my query:

jq -r '.Reservations[].Instances[].Tags[] | select (.Key == "EC2.Tag.Name").Value' jsonfile.json

The above query walks through all the tags (an array of Key/Value objects), selects the one whose Key is “EC2.Tag.Name”, and returns its Value.
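To make it concrete, here is a tiny made-up sample with the same structure as the EC2 output, plus the query run against it:

cat <<'EOF' > jsonfile.json
{
  "Reservations": [
    { "Instances": [
        { "Tags": [
            { "Key": "Name",         "Value": "web01" },
            { "Key": "EC2.Tag.Name", "Value": "my-tagged-instance" }
        ] }
    ] }
  ]
}
EOF

jq -r '.Reservations[].Instances[].Tags[] | select (.Key == "EC2.Tag.Name").Value' jsonfile.json
#prints: my-tagged-instance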

migrate between amd & intel in proxmox

I recently acquired an Intel-based server and plugged it into my AMD-based Proxmox cluster. I ran into an issue transferring VMs from the AMD to the Intel boxes (the other direction worked fine): after a few moments, every VM that moved from AMD to Intel would kernel panic.

Fortunately I found here that the fix is to add a few custom CPU flags to your VMs. Once I did this they could move back and forth freely (assuming they have the kvm64 CPU type assigned to them – host obviously won’t work):

qm set <VMID> --args "-cpu 'kvm64,+ssse3,+sse4.1,+sse4.2,+x2apic'"
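If a VM is still set to the host CPU type, you can switch it to kvm64 first (replace <VMID> with your VM’s ID):

qm set <VMID> --cpu kvm64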

Proxmox HA management script

I was a bit frustrated at the lack of certain functions in Proxmox. I wanted an easy way to tag a VM and manage the tagged VMs as a group. My solution was to create HA groups for VMs with various functions. I can then manage each group as a whole, telling its members to migrate storage or power off & on.

I wrote a script to facilitate this. Right now it only covers powering on, powering off, and migrating the location of the primary disk, but more is to come.

Here’s what I have so far:

#!/bin/bash
#Proxmox HA management script
#Migrates storage, starts, or stops Proxmox HA groups based on the name and function passed to it.
#Usage: manage-HA-group.sh <start|stop|migrate> <ha-group-name> [local|remote]

#Change to the name of your local storage (for migrating from remote to local storage)
LOCAL_STORAGE_NAME="pve-1TB"

function get_vm_name() {
    #Determine the name of the VMID passed to this function
    VM_NAME=$(qm config "$1" | grep '^name:' | awk '{print $2}')
}

function get_group_VMIDs() {
    #Get a list of VMIDs based on the name of the HA group passed to this function
    group_VMIDs=$(ha-manager config | grep -B1 "$1" | grep vm: | sed 's/vm://g')
}

function group_power_state() {
    #Loop through members of HA group passed to this function
    for group in "$1" 
    do
        get_group_VMIDs "$group"
        for VM in $group_VMIDs
        do
            get_vm_name "$VM"
            echo "$OPERATION $VM_NAME in HA group $group"
            ha-manager set $VM --state $VM_STATE
        done
    done
}

function group_migrate() {
    #This function migrates the VM's first disk (scsi0) to the specified location (local/remote)
    #TODO String to determine all disk IDs:  qm config 115 | grep '^scsi[0-9]:' | tr -d ':' | awk '{print $1}'
    disk="scsi0"    

    #Loop through each VM in specified group name (second argument passed on CLI)
    for group in "$2" 
    do
        get_group_VMIDs "$group"
        for VM in $group_VMIDs
        do
            #Determine the names of each VM in the HA group
            get_vm_name "$VM"

            #Set storage location based on argument
            if [[ "$3" == "remote" ]]; then
                storage="$VM_NAME"
            else
                storage="$LOCAL_STORAGE_NAME"
            fi

            #Move primary disk to desired location
            echo "Migrating $VM_NAME to "$3" storage"
            qm move_disk $VM $disk $storage --delete=1

        done
    done
}

case "$1" in 
    start)
        VM_STATE="started"
        OPERATION="Starting"
        group_power_state "$2" 
        ;;
    stop)
        VM_STATE="stopped"
        OPERATION="Stopping"
        group_power_state "$2"
        ;;
    migrate)
        case "$3" in
            local|remote)
                group_migrate "$@"
                ;;
            *)
                echo "Usage: manage-HA-group.sh migrate <ha-group-name> <local|remote>"
                ;;
        esac        
    ;;
    *)
        echo "Usage: manage-HA-group.sh <start|stop|migrate> <ha-group-name> [local|remote]"
        exit 1
        ;;
esac
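Usage looks like this (with a hypothetical HA group named folding):

./manage-HA-group.sh stop folding
./manage-HA-group.sh migrate folding local
./manage-HA-group.sh start folding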

proxmox 6 NVIDIA GPU passthrough fix

I upgraded to Proxmox 6.0 and to my dismay my Windows VM suddenly began receiving the dreaded Code 43 error. After much digging I finally found this post on the Proxmox forums, which outlines what needs to happen.

In my case, all I needed to do was tweak my machine type. There is no GUI option to do this, so it had to be done in the command line:

qm set <VM_ID> -machine pc-q35-3.1

That was all it took!

The forum also suggested a few other things if that didn’t work. I didn’t end up needing them, but I’ll put them here in case they’re helpful:

Add args to your VM config file:

args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'

Add a few options to the CPU line:

cpu: host,hidden=1,flags=+pcid,hv-vendor-id=proxmox
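For reference, both of those lines go in the VM’s config file, which on a standard Proxmox install lives at /etc/pve/qemu-server/<VMID>.conf:

nano /etc/pve/qemu-server/<VMID>.conf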

With the above settings I also discovered there is no need to have x-vga=on anymore. This allows you to have both the regular VM console and your graphics card if you so desire.

Run startup / shutdown on every VM in Proxmox HA group

I wanted to run a stop operation on all VMs in one of my Proxmox HA groups and was frustrated to see there was no easy way to do so. I wrote a quick & dirty bash script that lets me start & stop all VMs within an HA group.

#!/bin/bash
#Proxmox HA start/stop script
#Takes first argument of the operation to do (start / stop) and any additional arguments for which HA group(s) to do it on, then acts as requested.

if [[ "$1" != "start" && "$1" != "stop" ]]; then
    echo "Please provide desired state (start | stop)"
    exit 1
fi

if [ "$1" == "start" ]; then
    VM_STATE="started"
    OPERATION="Starting"
elif [ "$1" == "stop" ]; then
    VM_STATE="stopped"
    OPERATION="Stopping"
else exit 1 #should not ever get here
fi

#Loop through each argument except for the first
for group in "${@:2}"
do
    group_members=$(ha-manager config | grep -B1 "$group" | grep vm:)
    for VM in $group_members
    do
        echo "$OPERATION $VM in HA group $group"
        ha-manager set $VM --state $VM_STATE
    done
done
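Usage, assuming the script is saved as ha-group-power.sh (the name is arbitrary) and with hypothetical group names:

./ha-group-power.sh stop folding lab
./ha-group-power.sh start folding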

proxmox suspend & resume scripts

Update 12/17/2019: Added logic to wait for the VM to be suspended before suspending the hypervisor

Update 12/8/2019: After switching VMs I needed to tweak the pair of scripts. I modified them so that all the magic happens on the hypervisor; the VM simply needs to SSH into the hypervisor and call the script. The hypervisor now also needs passwordless SSH access (via public key) to the VM so it can tell it to suspend.

#!/bin/sh
#ProxMox suspend script part 1 of 2
#To be run on the VM 
#All this does is call the suspend script on the hypervisor
#This could also just be a bash alias

####### Variables #########
HYPERVISOR=        #Name / IP of the hypervisor
SSH_USER=          #User to SSH into hypervisor as
HYPERVISOR_SCRIPT= #Path to part 2 of the script on the hypervisor

####### End Variables ######

#Execute server suspend script
ssh $SSH_USER@$HYPERVISOR "$HYPERVISOR_SCRIPT" &

And here is the updated script that runs on the hypervisor:

#!/bin/bash
#ProxMox suspend script part 2 of 2
#Script to run on the hypervisor, it waits for VM to suspend and then suspends itself
#It relies on passwordless sudo configured on the VM as well as SSH keys to allow passwordless SSH access to the VM from the hypervisor
#It resumes the VM after it resumes itself
#Called from the VM

########### Variables ###############

VM=             #Name/IP of VM to SSH into
VM_SSH_USER=    #User to ssh into the vm with
VMID=           #VMID of VM you wish to suspend

########### End Variables############

#Tell guest VM to suspend
ssh $VM_SSH_USER@$VM "sudo systemctl suspend"

#Wait until guest VM is suspended, wait 5 seconds between attempts
while [ "$(qm status $VMID)" != "status: suspended" ]
do 
    echo "Waiting for VM to suspend"
    sleep 5 
done

#Suspend hypervisor
systemctl suspend

#Resume after shutdown
qm resume $VMID

I have a desktop running Proxmox. My GUI is handled via a virtual machine with physical hardware passed through to it. The challenge with this setup is getting suspend & resume to work properly. I got it to work by suspending the VM first, then the host; on resume, I power up the host first, then resume the VM. Doing anything else would cause hardware passthrough problems that would force me to reboot the VM.

I automated the suspend process with two scripts: one for the VM, and one for the hypervisor. The first script runs on the VM. It issues an SSH command to the hypervisor (thanks to this post) instructing it to run the second half of the script, then initiates a suspend of the VM.

The second half of the script waits a few seconds to allow the VM to suspend itself, then puts the hypervisor into suspend as well. I had to split this into two scripts because once the VM is suspended, it can’t issue any more commands; suspending the hypervisor has to happen after the VM itself is suspended.
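Both halves rely on passwordless SSH between the VM and the hypervisor. If you haven’t set that up yet, the standard OpenSSH tooling covers it (the user and host names below are placeholders):

#On the VM, so it can reach the hypervisor:
ssh-keygen -t ed25519
ssh-copy-id SSH_USER@HYPERVISOR_NAME_OR_IP

#On the hypervisor, so the updated scripts can reach back into the VM:
ssh-keygen -t ed25519
ssh-copy-id VM_SSH_USER@VM_NAME_OR_IP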

Here is script #1 (to be run on the VM). It assumes you have already set up a private/public key pair to allow for passwordless login into the hypervisor from the VM.

#!/bin/sh
#ProxMox suspend script part 1 of 2
#To be run on the VM so it suspends before the hypervisor does

####### Variables #########
HYPERVISOR=HYPERVISOR_NAME_OR_IP
SSH_USER=SSH_USER_ON_HYPERVISOR
HYPERVISOR_SCRIPT_LOCATION=NAME_AND_LOCATION_OF_PART2_OF_SCRIPT

####### End Variables ######

#Execute server suspend script, then suspend VM
ssh $SSH_USER@$HYPERVISOR  $HYPERVISOR_SCRIPT_LOCATION &

#Suspend
systemctl suspend

Here is script #2 (which script #1 calls), to be run on the hypervisor

#!/bin/bash
#ProxMox suspend script part 2 of 2
#Script to run on the hypervisor, it waits for VM to suspend and then suspends itself
#It resumes the VM after it resumes itself

########### Variables ###############

#Specify VMid you wish to suspend
VMID=VMID_OF_VM_YOU_WANT_TO_SUSPEND

########### End Variables############

#Wait 5 seconds before doing anything to allow for VM to suspend
sleep 5

#Suspend hypervisor
systemctl suspend

#Resume after shutdown
qm resume $VMID

It works on my machine 🙂