Tag Archives: ProxMox

ProxMox VMs reboot when switch is rebooted

I came across an interesting situation where if I rebooted my Ubiquiti UniFi Switch 24 for a firmware upgrade, all my VM hosts would reboot themselves. It turned out to be due to my having enabled HA in ProxMox. The hosts would temporarily lose connectivity to each other and begin to fence themselves off from the cluster. This caused HA to kill the VMs on those hosts. Then once connectivity was restored everything would eventually come back up.

The proper way to fix this would be to have multiple paths for each host to talk to each other, so if one switch goes down the cluster is still able to communicate. In my case, where I only have one switch, the “poor man’s fix” was to simply disable HA altogether during the switch reboot, as outlined here. Then, once the switch is back up, re-enable HA.

On each node, stop the pve-ha-lrm service. Once it’s stopped on all hosts, stop the pve-ha-crm service. Then reboot your switch.

After the switch is back up, start pve-ha-lrm on each node first, then pve-ha-crm (if it doesn’t auto start itself) to re-enable HA.
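For reference, here's a minimal sketch of the commands involved (pve-ha-lrm and pve-ha-crm are the standard ProxMox HA services; run these on every node):

#Before rebooting the switch, on each node:
systemctl stop pve-ha-lrm
#Once pve-ha-lrm is stopped on all nodes, on each node:
systemctl stop pve-ha-crm

#After the switch is back up, reverse the order on each node:
systemctl start pve-ha-lrm
systemctl start pve-ha-crm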

Using ProxMox as a NAS

Lately I’ve been very unhappy with the latest FreeBSD randomly rebooting during disk resilvering. I simply cannot tolerate random reboots of my fileserver. This fact, combined with the migration of OpenZFS to the ZFS on Linux code base, means it’s time for me to move from a FreeBSD-based ZFS NAS to a Linux-based one.

Sadly there aren’t many options in this space yet. I wanted something where basic tasks were taken care of, like what FreeNAS does, but that also supports ZFS. The solution I settled on was ProxMox, which is primarily a hypervisor but also has ZFS support.

The biggest drawback of ProxMox vs FreeNAS is the GUI. There are some disk-related GUI options in ProxMox, but mostly it’s VM focused. Thus, I had to configure my required services via CLI.

Following are the settings I used when I configured my NAS to run ProxMox.

Repo setup

If you don’t want to pay for a ProxMox subscription, change the PVE enterprise repository to the free (no-subscription) version by modifying /etc/apt/sources.list.d/pve-enterprise.list to the following:

deb http://download.proxmox.com/debian/pve buster pve-no-subscription

Then run apt update && apt upgrade.

Email alerts

Postfix configuration

Edit /etc/postfix/main.cf and tweak your mail server config as needed (relayhost). Restart postfix after editing:

systemctl restart postfix
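For reference, the relayhost line in /etc/postfix/main.cf looks something like the following (the hostname and port are placeholders for whatever smarthost you use):

#/etc/postfix/main.cf (excerpt)
relayhost = [smtp.example.com]:587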

Forward mail for root to your own email

Edit /etc/aliases and add an alias for root to forward to your desired e-mail address. Add this line:

root: YOUR_EMAIL_ADDRESS

Afterward run:

newaliases

ZFS configuration

Pool Import

Import the pool using the zpool import -f command (-f forces the import even though the pool was last active on a different system):

zpool import -f POOL_NAME

By default they’re imported into the main root directory (/). If you want to have them go to /mnt, use the zfs set mountpoint command:

zfs set mountpoint=/mnt/POOL_NAME POOL_NAME
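As a concrete (hypothetical) example, for a pool named tank that you want mounted under /mnt:

zpool import -f tank
zfs set mountpoint=/mnt/tank tank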

Monitoring

Install and configure zfs-zed

apt install zfs-zed

Modify /etc/zfs/zed.d/zed.rc and uncomment ZED_EMAIL_ADDR, ZED_EMAIL_PROG, and ZED_EMAIL_OPTS. Edit them to suit your needs (default values work fine, they just need to be uncommented.) Optionally uncomment ZED_NOTIFY_VERBOSE and change to 1 if you want more verbose notices like what FreeNAS does (scrub notifications, for example.)
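For illustration, the uncommented lines in /etc/zfs/zed.d/zed.rc end up looking roughly like this (the exact defaults can vary slightly between ZFS versions):

ZED_EMAIL_ADDR="root"
ZED_EMAIL_PROG="mail"
ZED_EMAIL_OPTS="-s '@SUBJECT@' @ADDRESS@"
ZED_NOTIFY_VERBOSE=1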

After modifying /etc/zfs/zed.d/zed.rc, restart zed:

systemctl restart zfs-zed

Scrubbing

By default ProxMox scrubs each of your pools on the second Sunday of every month. This cron job is defined in /etc/cron.d/zfsutils-linux. Modify it to your liking.
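If you'd rather run scrubs on your own schedule, you can add a plain cron entry instead; for example, a weekly Sunday 2 AM scrub of a hypothetical pool named tank:

#Example addition to /etc/cron.d/zfsutils-linux (pool name is a placeholder)
0 2 * * 0 root /sbin/zpool scrub tank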

Snapshot & Replication

There are many different snapshot & replication scripts out there. I landed on Sanoid. Thanks to SvennD for helping me grasp how to get it working.

Install sanoid :

#Install necessary packages
apt install debhelper libcapture-tiny-perl libconfig-inifiles-perl pv lzop mbuffer git
# Clone repo, build deb, install
git clone https://github.com/jimsalterjrs/sanoid.git
cd sanoid
ln -s packages/debian . 
dpkg-buildpackage -uc -us 
apt install ../sanoid_*_all.deb 

Snapshots

Edit /etc/sanoid/sanoid.conf with a backup and retention schedule for each of your datasets. Example taken from sanoid documentation:

[data/home]
	use_template = production
[data/images]
	use_template = production
	recursive = yes
	process_children_only = yes
[data/images/win7]
	hourly = 4

#############################
# templates below this line #
#############################

[template_production]
        frequently = 0
        hourly = 36
        daily = 30
        monthly = 3
        yearly = 0
        autosnap = yes
        autoprune = yes

Once sanoid.conf is to your liking, create a cron job to launch sanoid every hour (sanoid determines whether any action is needed when executed.)

crontab -e
#Add this line, save and exit
0 * * * * /usr/sbin/sanoid --cron

Replication

syncoid (part of sanoid) easily replicates snapshots. The syntax is pretty straightforward:

syncoid <source> <destination> -r 
#-r means recursive and is optional

For remote locations specify a username@ before the ip/hostname, then a colon and the dataset name, for example:

syncoid root@10.0.0.1:sourceDataset localDataset -r

You can even have a remote source go to a different remote destination, which is pretty neat.
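For example (IP addresses and dataset names are placeholders):

syncoid root@10.0.0.1:sourceDataset root@10.0.0.2:destinationDataset -r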

Other syncoid options of interest:

--debug  #for seeing everything happening, useful for logging
--exclude #Regular expression to exclude certain datasets
--source-bwlimit #Set an upload limit so you don't saturate your bandwidth
--quiet #don't output anything unless it's an error

Automate synchronization by placing the same syncoid command into a cronjob:

0 */4 * * * /usr/sbin/syncoid --exclude=bigdataset1 --source-bwlimit=1M --recursive pool/data root@192.168.1.100:pool/data
#if you don't want status emails when the cron job runs, add --quiet

NFS

Install the nfs-kernel-server package and specify your NFS exports in /etc/exports.

apt install nfs-kernel-server portmap

Example /etc/exports :

/mnt/example/DIR1 192.168.0.0/16(rw,sync,all_squash,anonuid=0,anongid=0)

Restart nfs-server after modifying your exports:

systemctl restart nfs-server
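Alternatively, you can re-read /etc/exports without restarting the service:

exportfs -ra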

Samba

Install samba, configure /etc/samba/smb.conf, and add users.

apt install samba
systemctl enable smbd

/etc/samba/smb.conf syntax is fairly straightforward. See the samba documentation for more information. Example share configuration:

[exampleshare]
comment = Example share
path = /mnt/example
valid users = user1 user2
writable = yes

Add users to the system itself with the adduser command:

adduser user1

Add those same users to samba with the smbpasswd -a command. Example:

smbpasswd -a user1

Restart samba after making changes:

systemctl restart smbd

SMART monitoring

Taken from https://pve.proxmox.com/wiki/Disk_Health_Monitoring:

By default, smartmontools daemon smartd is active and enabled, and scans the disks under /dev/sdX and /dev/hdX every 30 minutes for errors and warnings, and sends an e-mail to root if it detects a problem. 

Edit the file /etc/smartd.conf to suit your needs. You can specify/exclude devices, smart attributes, etc there. See here for more information. Restart the smartd service after modifying.
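As a rough example, you could replace the default DEVICESCAN line with explicit per-device entries like these (device names are placeholders):

#/etc/smartd.conf (example entries)
#Monitor all SMART attributes on these disks and e-mail root on problems
/dev/sda -a -m root
/dev/sdb -a -m root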

UPS monitoring

apcupsd was easiest for me to configure, so I went with it. Thanks to this blog for giving me the information to get started.

First, install apcupsd:

apt install apcupsd apcupsd-doc

As soon as it was installed my console kept getting spammed about IRQ issues. To stop these errors I stopped the apcupsd daemon:

 systemctl stop apcupsd

Now modify /etc/apcupsd/apcupsd.conf to suit your needs. The section I added for my CyberPower OR2200LCDRT2U was simply:

UPSTYPE usb
DEVICE

Then modify /etc/default/apcupsd to specify it’s configured:

#/etc/default/apcupsd
ISCONFIGURED=yes

After configuring, start the apcupsd service again:

systemctl start apcupsd

To check the status of your UPS, you can run the apcaccess status command:

/sbin/apcaccess status

Log monitoring

Install Logwatch to monitor system events. Here is a good primer on all of Logwatch’s options.

apt install logwatch

Modify /usr/share/logwatch/default.conf/logwatch.conf to suit your needs. By default it runs daily (defined in /etc/cron.daily/00logwatch). I added the following lines for my config to filter out unwanted information:

Service = "-zz-disk_space"
Service = "-postfix"
Service = "-smartd"
Service = "-zz-lm_sensors"

Manually run logwatch to get a preview of what you’ll see:

logwatch --range today --mailto YOUR_EMAIL_ADDRESS

Troubleshooting

ZFS-ZED not sending email

If ZED isn’t sending emails it’s likely due to an error in the config. For some reason default values still need to be uncommented for zed to work, even if left unaltered. Thanks to this post for the info.

Samba share access denied

If you get access denied when trying to write to a SMB share, double check the file permissions on the server level. Execute chmod / chown as appropriate. Example:

chown user1 -R /mnt/example/user1

Proxmox first VM boot delay workaround

My home lab has an NFS server for storage and a proxmox hypervisor connecting to it. If the power ever goes out for more than my UPS can handle, startup is a bit of a mess. My ProxMox server boots up much faster than my NFS server, so the result is no VMs start automatically (due to storage being unavailable) and I have to manually go in and start everything.

I found this bug report from 2015 which, frustratingly, doesn’t appear to have gained any traction. Ideally I could just tell the first VM to wait 5 minutes before turning on, and then trigger all the other VMs to turn on once the first one is up, but the devs don’t seem to want to address that issue. So, I got creative.

My solution was to alter the grub menu timeout before booting ProxMox. Simple but effective.

Edit /etc/default/grub and modify GRUB_TIMEOUT

#modify GRUB_TIMEOUT to your liking
GRUB_TIMEOUT=300

Then simply run update-grub

update-grub

Now my proxmox server waits 5 minutes before even booting the OS, by which time the NAS should be up and running. No more manual turning on of VMs after a power outage.

Fix Proxmox swapping issue

I recently had an issue with one of my Proxmox hosts where it would max out all swap and slow to a crawl despite having plenty of physical memory free. After digging and tweaking, I found this post which directed me to set the kernel swappiness setting to 0. More reading suggested I should set it to 1, which is what I did.

Append to /etc/sysctl.conf:

#Fix excessive swap usage
vm.swappiness = 1 

Apply settings with:

sysctl --system
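You can verify the new value took effect with:

sysctl vm.swappiness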

This did the trick for me.

Recover files from ZFS snapshot of ProxMox VM

I recently needed to restore a specific file from one of my ProxMox VMs that had been deleted. I didn’t want to roll back the entire VM from a previous snapshot – I just wanted a single file from the snapshot. My snapshots are handled via ZFS using FreeNAS.

Since my VM was CentOS 7 it uses XFS, which made things a bit more difficult. I couldn’t find a way to mount the (crash-consistent) XFS snapshot read-only – it simply refused to mount, so I had to make everything read/write. Below is the process I used to recover my file:

On the FreeNAS server, find the snapshot you wish to clone:

sudo zfs list -t snapshot -o name -s creation -r DATASET_NAME

Next, clone the snapshot

sudo zfs clone SNAPSHOT_NAME CLONED_SNAPSHOT_NAME

Next, on a Linux box, use SSHFS to mount the snapshot:

mkdir Snapshot
sshfs -o allow_other user@freenas:/mnt/CLONED_SNAPSHOT_NAME Snapshot/

Now create a read/write loopback device following instructions found here:

sudo -i #easy lazy way to get past permissions issues
cd /path/to/Snapshot/folder/created/above
losetup -P -f VM_DISK_FILENAME.raw
losetup 
#Take note of output, it's likely set to /dev/loop0 unless you have other loopbacks

Note that if your VM disk files are not in raw format, extra steps will be needed to convert them to raw first.
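For example, a qcow2 disk image could be converted with qemu-img (filenames are placeholders):

qemu-img convert -O raw VM_DISK_FILENAME.qcow2 VM_DISK_FILENAME.raw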

Now we have an SSH-mounted loopback device ready for mounting. Things are more complicated if your VM uses LVM, which mine does (CentOS 7). Once the loopback device is set, lvscan should see the image’s logical volumes. Make the desired volume active:

sudo lvscan
sudo lvchange -ay /dev/VG_NAME/LV_NAME

Now you can mount your volume:

mkdir Restore
mount /dev/VG_NAME/LV_NAME Restore/

Note: for XFS you must have read/write capability on the loopback device for this to work.

When you’re done, do your steps in reverse to unmount the snapshot:

#Unmount snapshot
umount Restore
#Deactivate LVM
lvchange -an /dev/VG_NAME/LV_NAME
#Remove loopback device
losetup -d /dev/loop0 #or whatever the loopback device was
#Unmount SSHfs mount to ZFS server
umount Snapshot

Finally, on the ZFS server, delete the snapshot:

sudo zfs destroy CLONED_SNAPSHOT_NAME

Troubleshooting

When I tried to mount the LVM partition at this point I got this error message:

mount: /dev/mapper/centos_plexlocal-root: can't read superblock

It ended up being because I was accidentally creating a read-only loopback device. I destroyed the loopback device and re-created it with write support, and all was well.

Free up RAM after Proxmox live migration

I ran into an issue where, after migrating a bunch of VMs off of one of my hosts, the remaining VMs on it refused to turn on. Every time I tried, the command would hang for a while and eventually error out with this message:

TASK ERROR: start failed: command '/usr/bin/kvm -id <truncated>... ' failed: got timeout

I suspected this might be due to RAM use, and sure enough the usage was too high for a system that didn’t have any VMs running on it. I found here that I could run a command to flush the cache:

echo 3 > /proc/sys/vm/drop_caches

That caused the RAM usage to go down, but the symptom of the VM not starting remained. I then saw that KSM sharing still had some memory in it, so I decided to restart the KSM sharing service:

sudo systemctl restart ksmtuned

After running that the VM started!

CPU Pinning in Proxmox

Proxmox uses QEMU, which doesn’t implement CPU pinning by itself. If you want to limit a guest VM’s operations to specific CPU cores on the host, you need to use taskset. It was a bit confusing to figure out, but fortunately I found this gist by ayufan which handles it beautifully.

Save the following into taskset.sh and edit VMID to the ID of the VM you wish to pin CPUs to. Make sure you have the “expect” package installed.

#!/bin/bash

set -eo pipefail

VMID=200

cpu_tasks() {
	expect <<EOF | sed -n 's/^.* CPU .*thread_id=\(.*\)$/\1/p' | tr -d '\r' || true
spawn qm monitor $VMID
expect ">"
send "info cpus\r"
expect ">"
EOF
}

VCPUS=($(cpu_tasks))
VCPU_COUNT="${#VCPUS[@]}"

if [[ $VCPU_COUNT -eq 0 ]]; then
	echo "* No VCPUS for VM$VMID"
	exit 1
fi

echo "* Detected ${#VCPUS[@]} assigned to VM$VMID..."
echo "* Resetting cpu shield..."

for CPU_INDEX in "${!VCPUS[@]}"
do
	CPU_TASK="${VCPUS[$CPU_INDEX]}"
	echo "* Assigning $CPU_INDEX to $CPU_TASK..."
	taskset -pc "$CPU_INDEX" "$CPU_TASK"
done

Update 9/29/18: Fixed missing done at the end. Also, if you want to offset which cores this script uses, you can do so by modifying the $CPU_INDEX variable to do a bit of math, like so:

        taskset -pc "$[CPU_INDEX+16]" "$CPU_TASK"

The above adds 16 to each CPU index, so instead of starting on thread 0 it starts on thread 16.

Run ProxMox in local mode when part of a quorum

When moving my desktop (which I had joined to my ProxMox cluster) to a new environment, I found I couldn’t start any VMs because there was no quorum. I couldn’t even get it to run “pvecm expected 1” because corosync wouldn’t start. I would see these errors in the log:

Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.

I found on this forum post that you can tell ProxMox to run in local mode (not trying to be part of a quorum):

sudo systemctl stop pve-cluster
sudo /usr/bin/pmxcfs -l

Success! This caused my node to run in local mode, which let me run my Windows gaming VM in another location.
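When the node is back on its normal network, you can return it to cluster mode by stopping the locally-running pmxcfs and starting the cluster service again (a sketch; one way to do it):

sudo killall pmxcfs
sudo systemctl start pve-cluster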

VGA Passthrough with Threadripper

An unfortunate bug exists for the AMD Threadripper family of CPUs which causes VGA passthrough not to work properly. Fortunately some very clever people have implemented a workaround to allow proper VGA passthrough until a proper Linux kernel patch can be accepted and implemented. See here for the whole story.

Right now my Threadripper 1950X successfully has GPU passthrough thanks to HyenaCheeseHeads’ “java hack” applet. I went this route because I really didn’t want to try and recompile my ProxMox kernel to get passthrough to work. Per the description: “It is a small program that runs as any user with read/write access to sysfs (this small guide assumes “root”). The program monitors any PCIe device that is connected to VFIO-PCI when the program starts, if the device disconnects due to the issues described in this post then the program tries to re-connect the device by rewriting the bridge configuration.” Instructions taken from the above Reddit post.

  • Go to https://pastebin.com/iYg3Dngs and hit “Download” (the MD5 sum is supposed to be 91914b021b890d778f4055bcc5f41002)
  • Rename the downloaded file to “ZenBridgeBaconRecovery.java” and put it in a new folder somewhere
  • Go to the folder in a terminal and type “javac ZenBridgeBaconRecovery.java”, this should take a short while and then complete with no errors. You may need to install the Java 8 JDK to get the javac command (use your distribution’s software manager)
  • In the same folder type “sudo java ZenBridgeBaconRecovery”
  • Make sure that the PCIe device that you intend to passthru is listed as monitored with a bridge
  • Now start your VM

In my case (Debian Stretch, ProxMox) I needed to install openjdk-8-jdk-headless:

sudo apt install openjdk-8-jdk-headless
javac ZenBridgeBaconRecovery.java

Next, I have a little script that runs on startup to spawn this as root in a detached tmux session, so I don’t have to remember to run it. (If you try to start your VM before running this, it will hose passthrough on your system until you reboot.) Be sure to change the script to point to wherever you compiled ZenBridgeBaconRecovery:

#!/bin/bash
cd /home/nicholas  #change me to suit your needs
sudo java ZenBridgeBaconRecovery

And here is the command I use to run on startup:

tmux new -d '/home/nicholas/passthrough.sh'

Again, be sure to modify the above to point to the path of wherever you saved the above script.

So far this works pretty well for me. I hate having to run a java process as sudo, but it’s better than recompiling my kernel.


Update 6/27/2018: I’ve created a systemd service script for the ZenBridgeBaconRecovery file to run at boot. Here is my file, placed in /etc/systemd/system/zenbridge.service (change the working directory to match the location of your ZenBridgeBaconRecovery java file, and don’t forget to run systemctl daemon-reload):

[Unit] 
Description=Zen Bridge Bacon Recovery 
After=network.target 

[Service] 
Type=simple 
User=root 
WorkingDirectory=/home/nicholas 
ExecStart=/usr/bin/java ZenBridgeBaconRecovery 
# or always, on-abort, etc.
Restart=on-failure

[Install] 
WantedBy=multi-user.target 

Update 8/18/2018: Finally solved for everyone!

Per an update on the reddit thread, motherboard manufacturers have finally put out BIOS updates that resolve the PCI passthrough problems. I updated my X399 Taichi to the latest version of its UEFI BIOS (3.20) and indeed PCI passthrough worked without any more wonky workarounds!

Update /etc/hosts with current IP for ProxMox

ProxMox virtual environment is a really nice package for managing KVM and container virtualization. One quirk about it is you need to have an entry in /etc/hosts that points to your system’s IP address, not 127.0.0.1 or 127.0.1.1. I wrote a little script to grab the IP of your specified interface and add it to /etc/hosts automatically for you. You may download it here or see below:

#!/bin/bash
#A simple script to update /etc/hosts with your current IP address for use with ProxMox virtual environment
#Author: Nicholas Jeppson
#Date: 4/25/2018

###Edit these variables to your environment###
INTERFACE="enp4s0" #the interface that has the IP you want to update hosts for
DNS_SUFFIX="" #optional domain suffix to append to the hostname in /etc/hosts, e.g. ".example.com"
###End variables section###

#Variables you shouldn't have to change
IP=$(ip addr show $INTERFACE |egrep 'inet '| awk '{print $2}'| cut -d '/' -f1)
HOSTNAME=$(hostname)

#Use sed to add IP to first line in /etc/hosts
sed -i "1s/^/$IP $HOSTNAME $HOSTNAME$DNS_SUFFIX\n/" /etc/hosts
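
Assuming you save the script as update-hosts.sh (a name I made up for this example), make it executable and run it as root once the interface has its address:

chmod +x update-hosts.sh
sudo ./update-hosts.sh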