migrate between amd & intel in proxmox

I recently acquired an Intel based server and plugged it into my AMD-based Proxmox cluster. I ran into an issue transferring from AMD to Intel boxes (the other direction worked fine.) After a few moments, every VM that moved from AMD to Intel would kernel panic.

Fortunately I found here that the fix is to add a few custom CPU flags to your VMs. Once I did this they could move back and forth freely (assuming they had the kvm64 CPU assigned to them – host obviously won’t work.)

qm set *VMID* --args "-cpu 'kvm64,+ssse3,+sse4.1,+sse4.2,+x2apic'"

Fix proxmox iommu operation not permitted

In trying to passthrough some LSI SAS cards to a VM I kept receiving this error:

kvm: -device vfio-pci,host=0000:03:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0,rombar=0: vfio 0000:03:00.0: failed to setup container for group 7: Failed to set iommu for container: Operation not permitted

I found on this post that the fix is to add a line to /etc/modprobe.d/vfio.conf with the following:

options vfio_iommu_type1 allow_unsafe_interrupts=1

then reboot the host. It worked in my case!

use zdb to remove pesky device from zfs pool

I had the following problem with a device in my pool:

config:

        NAME                                            STATE     READ WRITE CKSUM
        storage                                         DEGRADED     0     0     0
          mirror-0                                      ONLINE       0     0     0
            WORKING_DISK_1  ONLINE       0     0     0
            WORKING_DISK_2    ONLINE       0   0     0
          mirror-1                                      DEGRADED     0     0     0
            WORKING_DISK_3  ONLINE       0     0     0
            replacing-1                                 DEGRADED     0     0     0
              PROBLEM_DISK  FAULTED      6     1     0  too many errors

However when I tried to replace the drive I got this message:

no such device in pool

I found here that you can use zdb to obtain the GUID of the problem device and replace it that way:

root@nas:~# zdb -l PROBLEM_DISK
failed to unpack label 0
------------------------------------
LABEL 1
------------------------------------
    version: 5000
    name: 'storage'
    state: 0
    txg: 5675107
    pool_guid: 8785893899843624400
    errata: 0
    hostname: 'nas'
    top_guid: 9425730683443378041
    guid: 3449631978925631053
    vdev_children: 2
    vdev_tree:
        type: 'mirror'
        id: 1
        guid: 9425730683443378041
        metaslab_array: 41
        metaslab_shift: 35
        ashift: 12
        asize: 4000782221312
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 17168510556101954329
            path: 'WORKING_DISK_3'
            devid: 'WORKING_DISK_3_ID'
            phys_path: 'pci-0000:00:1f.2-ata-2'
            whole_disk: 1
            DTL: 14700
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
    ----->  guid: 3449631978925631053
            path: 'PROBLEM_DISK'
            devid: 'PROBLEM_DISK_ID'
            phys_path: 'pci-0000:00:1f.2-ata-4'
            whole_disk: 1
            DTL: 14699
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 1 2 3 

I used the guid of the problem disk, and all was well:

zpool replace storage 3449631978925631053 NEW_WORKING_DISK

worked instead of complaining the device I was trying to replace didn’t exist.

Add static route in CentOS7

I recently began a project of segmenting my LAN into various VLANs. One issue that cropped up had me banging my head against the wall for days. I had a particular VM that would use OpenVPN to a private VPN provider. I had that same system sending things to a file share via transmission-daemon.

Pre-subnet move everything worked, but once I moved my file server to a different subnet suddenly this VM could not access it while on the VPN. Transmission would hang for some time before finally saying

transmission-daemon.service: Failed with result 'timeout'.

The problem was since my file server was on a different subnet, it was trying to route traffic to it via the default gateway, which in this case was the VPN provider. I had to add a specific route to tell the server to use my LAN network instead of the VPN network in order to restore connectivity to the file server (thanks to this site for the primer.)

I had to create a file /etc/sysconfig/network-scripts/route-eth0 and give it the following line:

192.168.2.0/24 via 192.168.1.1 dev eth0

This instructed my VM to get to the 192.168.2 network via the 192.168.1.1 gateway on eth0. Restart the network service (or reboot) and success!

Dell LSI SAS2008 2TB drive fix

I just recently got a $40 external SAS adapter for my new storage server. The plan was to create a DAS device from my old NAS chassis and have it be driven by my new storage server (new to me anyway – a Dell PowerEdge R610.) I ordered what was listed simply as “Dell SAS External Dual Ports PCI-E 6GB/S Host Bus Server Adapter 12DNW 342-0910 Consumer Electronics” from Amazon for $40 to accomplish this goal.

When I plugged everything in, to my dismay none of my disks with greater than 2TB capacity showed up. Well, they sort of showed up – they all reported capacities of exactly 2TB. I was clearly running into some sort of firmware issue.

lspci revealed this card uses the LSI SAS2008 chipset, which from what I’ve read is capable of drives greater than 2TB in size. I later found the model number of my card – Dell PERC H200E – which proved to be quite vital information. After hours of digging around in unholy corners of the internet I finally arrived on this Dell Support page. It had exactly what I was hoping for:

ENHANCEMENTS:
– Added support for SAS HDDs larger than 2TB

To flash this I chose to create a bootable dos ISO as per the instructions here. First, download the Windows installer, open with your archive program of choice and extract to the folder you’re going to build your ISO from. Then follow the instructions linked to above of downloading a freeDOS ISO, extracting it to the same folder you extracted the firmware to, then running the command to build your ISO (adjust as needed)

mkisofs -o <ISO_OUTPUT_LOCATION -q -l -N -boot-info-table -iso-level 4 -no-emul-boot -b isolinux/isolinux.bin -publisher "FreeDOS - www.freedos.org" -A "FreeDOS beta9 Distribution" -V FDOS_BETA9 -v .

I got so far and yet tripped at the finish line. If you simply run flash.bat you’ll be greeted with a message saying no compatible adapters were found. Fortunately that’s a LIE. My savior was this writeup on how to flash certain versions of these cards to IT mode. I didn’t care about IT mode (my card is not a RAID card) but it had the information I needed. Here are the magic commands!

sas2flsh -listall

#Use the number in the first column to get the SAS Address for the card.
sas2flsh -c 0 -list
#Write down the SAS Address and continue to the next steps.
sas2flsh -o -f 6GBPSAS.FW
sas2flsh -o -sasadd 5xxxxxxxxxxxxxxx (replace this address with the one you wrote down in the first steps).

Reboot, and finally, after hours of banging my head on the wall… success!!!

These 4 drives were only being reported as 2TB before

I didn’t end up using it but in my internet travels I came across this. Broadcom offers a neat utility called the LSI pre-boot USB tool that I didn’t end up using: https://www.broadcom.com/support/knowledgebase/1211161499804/lsi-pre-boot-usb-tool-download

Proxmox HA management script

I was a bit frustrated at the lack of certain functions of ProxMox. I wanted an easy way to tag a VM and manage that tag as a group. My solution was to create HA groups for VMs with various functions. I can then manage the group and tell them to migrate storage or turn off & on.

I wrote a script to facilitate this. Right now it only covers powering on, powering off, and migrating the location of the primary disk, but more is to come.

Here’s what I have so far:

#!/bin/bash
#Proxmox HA management script
#Migrates storage, starts, or stops Proxmox HA groups based on the name and function passed to it.
#Usage: manage-HA-group.sh <start|stop|migrate> <ha-group-name> [local|remote]

#Change to the name of your local storage (for migrating from remote to local storage)
LOCAL_STORAGE_NAME="pve-1TB"

function get_vm_name() {
    #Determine the name of the VMID passed to this function
    VM_NAME=$(qm config "$1" | grep '^name:' | awk '{print $2}')
}

function get_group_VMIDs() {
    #Get a list of VMIDs based on the name of the HA group passed to this function
    group_VMIDs=$(ha-manager config | grep -B1 "$1" | grep vm: | sed 's/vm://g')
}

function group_power_state() {
    #Loop through members of HA group passed to this function
    for group in "$1" 
    do
        get_group_VMIDs "$group"
        for VM in $group_VMIDs
        do
            get_vm_name "$VM"
            echo "$OPERATION $VM_NAME in HA group $group"
            ha-manager set $VM --state $VM_STATE
        done
    done
}

function group_migrate() {
    #This function migrates the VM's first disk (scsi0) to the specified location (local/remote)
    #TODO String to determine all disk IDs:  qm config 115 | grep '^scsi[0-9]:' | tr -d ':' | awk '{print $1}'
    disk="scsi0"    

    #Loop through each VM in specified group name (second argument passed on CLI)
    for group in "$2" 
    do
        get_group_VMIDs "$group"
        for VM in $group_VMIDs
        do
            #Determine the names of each VM in the HA group
            get_vm_name "$VM"

            #Set storage location based on argument
            if [[ "$3" == "remote" ]]; then
                storage="$VM_NAME"
            else
                storage="$LOCAL_STORAGE_NAME"
            fi

            #Move primary disk to desired location
            echo "Migrating $VM_NAME to "$3" storage"
            qm move_disk $VM $disk $storage --delete=1

        done
    done
}

case "$1" in 
    start)
        VM_STATE="started"
        OPERATION="Starting"
        group_power_state "$2" 
        ;;
    stop)
        VM_STATE="stopped"
        OPERATION="Stopping"
        group_power_state "$2"
        ;;
    migrate)
        case "$3" in
            local|remote)
                group_migrate "$@"
                ;;
            *)
                echo "Usage: manage-HA-group.sh migrate <ha-group-name> <local|remote>"
                ;;
        esac        
    ;;
    *)
        echo "Usage: manage-HA-group.sh <start|stop|migrate> <ha-group-name> [local|remote]"
        exit 1
        ;;
esac

Java IDRAC 6 Ubuntu 18.04 setup

I recently acquired a Dell PowerEdge R610 and had a hard time getting its iDRAC to work properly on my ElementaryOS setup (Ubuntu 18.04 derivative.) I had two problems: Connection failed error and keyboard not working.

Connection Failed

After much searching I finally found this post:

The post explains the problem is with the security settings of Java 8+ preventing the connection. I didn’t know where my security file was so I first ran a quick find command to find it:

sudo find / -name java.security

In my case it was located in /etc/java-11-openjdk/security/java.security

The last step was to remove RC4 from the list of blacklisted ciphers, as this is the cause of the problem.

sudo vim /etc/java-11-openjdk/security/java.security

#change jdk.tls.disabledAlgorithms=SSLv3, RC4, DES, MD5withRSA, DH keySize < 1024, \ to be:
jdk.tls.disabledAlgorithms=SSLv3, DES, MD5withRSA, DH keySize < 1024, \

Save and exit, and iDRAC will now load!

Except now…

Keyboard doesn’t work

My system was defaulting to using JRE 11, which apparently causes the keyboard to not function at all. I found on this reddit post that you really need an older version of Java. To do so on Ubuntu 18.04 you need to install it along with the icedtea plugin and run update-alternatives

sudo apt install openjdk-8-jre icedtea-8-plugin

Edit /etc/java-8-openjdk/security/java.security and remove the restriction on the RC4 algorhythm. Then configure the system to run java 8:

sudo update-alternatives --config java
#select java 8

Lastly, configure the icedtea plugin to run Java 8 instead of 11, because for some reason this plugin ignores the system java settings. Launch the IcedTea Web Control panel (find it in your system menu) and then Navigate to JVM settings. Enter /usr/ in the section “Set JVM for IcedTea-Web – working best with OpenJDK” section. Then hit Apply / OK

Phew. FINALLY you should be able to use iDRAC 6 on your modern Ubuntu system.