
Managing Windows hosts with Ansible

I spun my wheels for a while trying to get Ansible to manage Windows hosts. Here are my notes on how I finally got Ansible (on a Linux host) to connect to a Windows host over an HTTPS WinRM connection, using Kerberos for authentication. This article was of great help.

Ansible Hosts file

[all:vars]
ansible_user=<user>
ansible_password=<password>
ansible_connection=winrm
ansible_winrm_transport=kerberos

Packages to install (CentOS 7)

sudo yum install gcc python2-pip
sudo pip install kerberos requests_kerberos pywinrm certifi

Playbook syntax

Modules involving Windows hosts have a win_ prefix.

Troubleshooting

Code 500

WinRMTransportError: (u'http', u'Bad HTTP response returned from server. Code 500')

I was using -m ping for testing instead of -m win_ping. Make sure you’re using the win_ping module and not the regular ping module.

Certificate validation failed

"msg": "kerberos: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:579)"

I had a self-signed CA certificate on the box Ansible was trying to connect to. Python doesn’t appear to trust the system’s certificate trust chain by default. Ansible has a configuration directive

ansible_winrm_ca_trust_path

but even with that pointing to my system trust store it wouldn’t work. I then found this gem on the WinRM page for Ansible:

The CA chain can contain a single or multiple issuer certificates and each entry is contained on a new line. To then use the custom CA chain as part of the validation process, set ansible_winrm_ca_trust_path to the path of the file. If this variable is not set, the default CA chain is used instead which is located in the install path of the Python package certifi.

Challenge #1: I didn’t have certifi installed.

sudo pip install certifi

Challenge #2: I needed to know where certifi’s default trust store was located, which I discovered after reading the project’s GitHub page:

python -c "import certifi; print(certifi.where())"

In my case the location was ‘/usr/lib/python2.7/site-packages/certifi/cacert.pem’. I then symlinked my system trust to that location (backing up existing trust first)

sudo mv /usr/lib/python2.7/site-packages/certifi/cacert.pem /usr/lib/python2.7/site-packages/certifi/cacert.pem.old
sudo ln -s /etc/pki/tls/cert.pem /usr/lib/python2.7/site-packages/certifi/cacert.pem

Et voila! No more trust issues.
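
The backup-and-symlink step above can be wrapped in a small helper so it is harder to get wrong. This is only a sketch: link_trust_store is a hypothetical name, and both paths are whatever applies on your system (the ones in the comments come from my CentOS 7 box). It needs to run from a root shell if the bundle lives in a system directory.

```shell
# Replace a bundled CA file with a symlink to the system trust store,
# keeping the original as <bundle>.old. Both paths are supplied by the caller.
link_trust_store() {
    bundle="$1"        # e.g. the path printed by certifi.where()
    system_store="$2"  # e.g. /etc/pki/tls/cert.pem on CentOS 7

    # Refuse to continue if the system store is missing, so a typo
    # does not leave a dangling symlink behind.
    [ -f "$system_store" ] || { echo "missing: $system_store" >&2; return 1; }

    # Already linked: nothing to do, and the existing backup is left alone.
    [ -L "$bundle" ] && return 0

    mv "$bundle" "$bundle.old" && ln -s "$system_store" "$bundle"
}
```

Example: link_trust_store /usr/lib/python2.7/site-packages/certifi/cacert.pem /etc/pki/tls/cert.pem. Running it a second time is harmless because an existing symlink is detected first.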

Ansible Tower

Note: If you’re running Ansible Tower, you have to work with their own bundled version of python instead of the system version. For version 3.2 it was located here:

/var/lib/awx/venv/ansible/lib/python2.7/site-packages/requests/cacert.pem

I fixed it by doing this:

sudo mv /var/lib/awx/venv/ansible/lib/python2.7/site-packages/requests/cacert.pem /var/lib/awx/venv/ansible/lib/python2.7/site-packages/requests/cacert.pem.old
sudo ln -s /etc/pki/tls/cert.pem /var/lib/awx/venv/ansible/lib/python2.7/site-packages/requests/cacert.pem

This resolved the trust issues.

VGA Passthrough with Threadripper

An unfortunate bug exists for the AMD Threadripper family of CPUs which causes VGA passthrough not to work properly. Fortunately some very clever people have implemented a workaround to allow proper VGA passthrough until a proper Linux kernel patch can be accepted and implemented. See here for the whole story.

Right now my Threadripper 1950X successfully has GPU passthrough thanks to HyenaCheeseHeads’ “java hack” applet. I went this route because I really didn’t want to try to recompile my ProxMox kernel to get passthrough to work. Per the description “It is a small program that runs as any user with read/write access to sysfs (this small guide assumes “root”). The program monitors any PCIe device that is connected to VFIO-PCI when the program starts, if the device disconnects due to the issues described in this post then the program tries to re-connect the device by rewriting the bridge configuration.” Instructions taken from the above Reddit post.

  • Go to https://pastebin.com/iYg3Dngs and hit “Download” (the MD5 sum is supposed to be 91914b021b890d778f4055bcc5f41002)
  • Rename the downloaded file to “ZenBridgeBaconRecovery.java” and put it in a new folder somewhere
  • Go to the folder in a terminal and type “javac ZenBridgeBaconRecovery.java”, this should take a short while and then complete with no errors. You may need to install the Java 8 JDK to get the javac command (use your distribution’s software manager)
  • In the same folder type “sudo java ZenBridgeBaconRecovery”
  • Make sure that the PCIe device that you intend to passthru is listed as monitored with a bridge
  • Now start your VM
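
Since the first step quotes an expected MD5 sum, it seems worth checking the download before compiling it and running it as root. A minimal sketch, assuming md5sum from coreutils is installed; verify_md5 is a hypothetical helper name and not part of the Reddit instructions:

```shell
# Succeeds only if the file's MD5 sum matches the expected hex digest.
verify_md5() {
    file="$1"
    expected="$2"
    # md5sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(md5sum "$file" | awk '{ print $1 }')
    [ "$actual" = "$expected" ]
}
```

Example: verify_md5 ZenBridgeBaconRecovery.java 91914b021b890d778f4055bcc5f41002 && javac ZenBridgeBaconRecovery.java. If the paste has been edited since the sum was published, the check fails and nothing gets compiled.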

In my case (Debian Stretch, ProxMox) I needed to install openjdk-8-jdk-headless

sudo apt install openjdk-8-jdk-headless
javac ZenBridgeBaconRecovery.java

Next I have a little script on startup to spawn this as root in a detached tmux session, so I don’t have to remember to run it (If you try to start your VM before running this, it will hose passthrough on your system until you reboot it.) Be sure to change the script to point to wherever you compiled ZenBridgeBaconRecovery

#!/bin/bash
cd /home/nicholas  #change me to suit your needs
sudo java ZenBridgeBaconRecovery

And here is the command I use to run on startup:

tmux new -d '/home/nicholas/passthrough.sh'

Again, be sure to modify the above to point to the path of wherever you saved the above script.

So far this works pretty well for me. I hate having to run a java process as sudo, but it’s better than recompiling my kernel.


Update 6/27/2018: I’ve created a systemd service unit so ZenBridgeBaconRecovery runs at boot. Here is my file, placed in /etc/systemd/system/zenbridge.service (change WorkingDirectory to match the location of your compiled ZenBridgeBaconRecovery class, don’t forget to run systemctl daemon-reload, and enable the unit with systemctl enable zenbridge.service):

[Unit]
Description=Zen Bridge Bacon Recovery
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/home/nicholas
ExecStart=/usr/bin/java ZenBridgeBaconRecovery
# Restart= also accepts always, on-abort, etc. (systemd does not allow trailing comments on a setting line)
Restart=on-failure

[Install]
WantedBy=multi-user.target

Update 8/18/2018 Finally solved for everyone!

Per an update on the Reddit thread, motherboard manufacturers have finally put out BIOS updates that resolve the PCI passthrough problems. I updated my X399 Taichi to the latest version of its UEFI BIOS (3.20) and indeed PCI passthrough worked without any more wonky workarounds!

Update /etc/hosts with current IP for ProxMox

ProxMox virtual environment is a really nice package for managing KVM and container virtualization. One quirk about it is that you need to have an entry in /etc/hosts that points to your system’s IP address, not 127.0.0.1 or 127.0.1.1. I wrote a little script to grab the IP of your specified interface and add it to /etc/hosts automatically for you. You may download it here or see below:

#!/bin/bash
#A simple script to update /etc/hosts with your current IP address for use with ProxMox virtual environment
#Author: Nicholas Jeppson
#Date: 4/25/2018

###Edit these variables to your environment###
INTERFACE="enp4s0" #the interface that has the IP you want to update hosts for
DNS_SUFFIX=""
###End variables section###

#Variables you shouldn't have to change
IP=$(ip addr show "$INTERFACE" | grep 'inet ' | awk '{print $2}' | cut -d '/' -f1)
HOSTNAME=$(hostname)

#Use sed to add IP to first line in /etc/hosts
sed -i "1s/^/$IP $HOSTNAME $HOSTNAME$DNS_SUFFIX\n/" /etc/hosts
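
One thing to keep in mind is that the sed line prepends a new entry every time the script runs, so repeated runs pile up stale lines. Below is a sketch of a variant that can be re-run safely. It is not the script above: update_hosts_entry is a hypothetical helper, and the rule it uses to spot old entries (second field equals the hostname, address not in 127.x) is an assumption about how your /etc/hosts is laid out.

```shell
# Prepend "IP host host+suffix" to a hosts file, dropping any earlier
# non-loopback entry for the same hostname so repeated runs do not pile up.
update_hosts_entry() {
    ip="$1"
    host="$2"
    suffix="$3"       # may be empty, like DNS_SUFFIX in the script
    hosts_file="$4"
    tmp=$(mktemp) || return 1

    {
        # New entry goes first, as in the original sed command.
        printf '%s %s %s%s\n' "$ip" "$host" "$host" "$suffix"
        # Keep every line except old non-loopback entries for this host.
        awk -v h="$host" '!($2 == h && $1 !~ /^127\./)' "$hosts_file"
    } > "$tmp"

    cat "$tmp" > "$hosts_file"   # overwrite in place to keep owner and mode
    rm -f "$tmp"
}
```

Inside the script it would be called as update_hosts_entry "$IP" "$HOSTNAME" "$DNS_SUFFIX" /etc/hosts in place of the sed line. Existing 127.0.0.1 and 127.0.1.1 lines are left alone, just as the original script leaves them.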

Use grep, awk, and cut to display only your IP address

I needed a quick way to determine my IP address for a script. If you run the ip addr show command it outputs a lot of information I don’t need. I settled on using grep, awk, and cut to get the information I want:

ip addr show <interface name> |egrep 'inet '| awk '{print $2}'| cut -d '/' -f1

The result is a clean IP address. Beautiful. Thanks to this site for insight into how to use cut.
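
For what it’s worth, awk can do the filtering and the splitting on its own. A small sketch, assuming the usual ip addr show layout where the IPv4 line starts with “inet” followed by address/prefix; extract_ipv4 is a made-up function name:

```shell
# Read `ip addr show` output on stdin and print each IPv4 address
# without its /prefix length.
extract_ipv4() {
    # "inet " (with the trailing space) matches IPv4 lines but not "inet6".
    awk '/inet / { split($2, parts, "/"); print parts[1] }'
}
```

Usage: ip addr show <interface name> | extract_ipv4. If the interface has more than one IPv4 address, each is printed on its own line, which is also what the grep/awk/cut pipeline does.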

Windows VM with GTX 1070 GPU passthrough in ProxMox 5

I started this blog four years ago to document my highly technical adventures – mainly so I could reproduce them later. One of my first articles dealt with GPU passthrough / virtualization. It was a complicated ordeal with Xen. Now that I’ve switched to KVM (ProxMox) I thought I’d give it another go. It’s still complicated but not nearly as much this time.

To get my Nvidia GTX 1070 GPU properly passed through to a Windows VM hosted by ProxMox 5 I simply followed this excellent guide written by sshaikh. I will summarize what I took from his guide to get my setup to work.

  1. Ensure VT-d is supported and enabled in the BIOS
  2. Enable IOMMU on the host
    1. append the following to the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub
      intel_iommu=on
    2. Save your changes by running
      update-grub
  3. Blacklist NVIDIA & Nouveau kernel modules so they don’t get loaded at boot
    1. echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
      echo "blacklist nvidia" >> /etc/modprobe.d/blacklist.conf
    2. Save your changes by running
      update-initramfs -u
  4. Add the following lines to /etc/modules
    vfio
    vfio_iommu_type1
    vfio_pci
    vfio_virqfd
  5. Determine the PCI address of your GPU
    1. Run
      lspci -v

      and look for your card. Mine was 01:00.0 & 01:00.1. You can omit the part after the dot to include them both in one go – so in that case it would be 01:00

    2. Run lspci -n -s <PCI address> to obtain vendor IDs. Example :
      lspci -n -s 01:00
      01:00.0 0300: 10de:1b81 (rev a1)
      01:00.1 0403: 10de:10f0 (rev a1)
  6. Assign your GPU to vfio driver using the IDs obtained above. Example:
    echo "options vfio-pci ids=10de:1b81,10de:10f0" > /etc/modprobe.d/vfio.conf
  7. Reboot the host
  8. Create your Windows VM using the UEFI BIOS hardware option (not the default SeaBIOS) but do not start it yet. Modify /etc/pve/qemu-server/<vmid>.conf and ensure the following are in the file. Create / modify existing entries as necessary.
    bios: ovmf
    machine: q35
    cpu: host,hidden=1
    numa: 1
  9. Install Windows, including VirtIO drivers. Be sure to enable Remote desktop.
  10. Pass through the GPU.
    1. Modify /etc/pve/qemu-server/<vmid>.conf and add
      hostpci0: <device address>,x-vga=on,pcie=1. Example

      hostpci0: 01:00,x-vga=on,pcie=1
  11. Profit.
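
Steps 5 and 6 can be chained so the IDs don’t have to be copied by hand. A rough sketch, assuming lspci -n prints the vendor:device pair as the third field, as in the example output in step 5; vfio_options_line is a hypothetical helper name:

```shell
# Turn `lspci -n -s <PCI address>` output (on stdin) into the
# "options vfio-pci ids=..." line used in /etc/modprobe.d/vfio.conf.
vfio_options_line() {
    awk '{
             sep = (NR > 1) ? "," : ""   # comma between IDs, none before the first
             ids = ids sep $3            # third field is vendor:device
         }
         END { print "options vfio-pci ids=" ids }'
}
```

Usage: lspci -n -s 01:00 | vfio_options_line > /etc/modprobe.d/vfio.conf. Run it without the redirect first to check that the line looks right for your card.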

Troubleshooting

Code 43

I received the dreaded code 43 error after installing CUDA drivers. The workaround was to add hidden=1 to the CPU option of the VM:

cpu: host,hidden=1

Blue screening when launching certain games

Heroes of the Storm and Starcraft II would consistently blue screen on me with the following error:

kmode_exception_not_handled

The fix as outlined here was to create /etc/modprobe.d/kvm.conf and add the parameter “options kvm ignore_msrs=1”

echo "options kvm ignore_msrs=1" > /etc/modprobe.d/kvm.conf

Update 4/9/18: Blue screening happens to Windows 10 1803 as well with the error

System Thread Exception Not Handled

The fix for this is the same – ignore_msrs=1

GPU optimization:

Give the VM as many CPUs as the host has (in my case 8) and then enable NUMA for the CPU. This appeared to make my GTX 1070 perform better in the VM – near native performance.

ZFS delete oldest n snapshots

I came across a need to trim old ZFS snapshots. These are my quick and dirty notes on how I accomplished it.

Basic syntax taken from here:

 zfs list -H -t snapshot -o name -S creation -r <dataset name> | tail -10

You can omit the -r <dataset name> if you want to query snapshots over all your datasets. Change the tail number for the desired number of oldest snapshots.

You can pass this over to actually delete snapshots using the xargs command:

zfs list -H -t snapshot -o name -S creation -r <dataset name> | tail -10 | xargs -n 1 zfs  destroy
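
Because zfs destroy is not reversible, a dry run is a sensible first step. The sketch below only prints the commands it would run; plan_destroy_oldest is a hypothetical helper, and it assumes the names arrive newest first, which is what -S creation produces.

```shell
# Read snapshot names on stdin (newest first) and print a
# `zfs destroy` command for each of the oldest N. Nothing is deleted.
plan_destroy_oldest() {
    count="$1"
    tail -n "$count" | while IFS= read -r snap; do
        printf 'zfs destroy %s\n' "$snap"
    done
}
```

Usage: zfs list -H -t snapshot -o name -S creation -r <dataset name> | plan_destroy_oldest 10. Once the list looks right, pipe the output to sh to actually run the commands.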

I came across an odd error message when trying to delete some old snapshots:

Can't delete snapshot: dataset busy

I discovered here that this means the snapshots have a hold on them. I read the ZFS documentation to learn how to release the holds:

zfs release -r <tag name> <snapshot name>

After massaging these commands for a bit I was able to free up some needed space by removing ancient snapshots.

Proxmox VM migration failed found stale volume copy

Recently I had a few VMs on shared storage I couldn’t live migrate. The cryptic error messages made it sound like local LVM was required, even though in the GUI all I could see was shared storage for the VM. The errors I kept getting were like this one:

volume pve/vm-103-disk-1 already exists
command 'dd 'if=/dev/pve/vm-103-disk-1' 'bs=64k'' failed: got signal 13
send/receive failed, cleaning up snapshot(s)..
ERROR: Failed to sync data - command 'set -o ...' failed: exit code 255
aborting phase 1 - cleanup resources
ERROR: found stale volume copy 'local-lvm:vm-103-disk-1' on node 'nick-desktop'
ERROR: migration aborted (duration 00:00:01): Failed to sync data - command 'set -o pipefail ...' failed: exit code 255
TASK ERROR: migration aborted

After a ton of digging I found this forum post that had the solution:

Most likely there is some stale disk somewhere. Try to run:
# qm rescan --vmid 101

That indeed was the problem. I ran

qm rescan --vmid 103

on the node in question, then refreshed the management page. After doing that, a ‘phantom’ disk entry showed up for the VM. I deleted it, but then had to run qm rescan --vmid 103 again before it would migrate.

So to recap, run qm rescan --vmid <vmid> once, then delete the stale disk that shows up, then run that same command again.

Fix WordPress PHP change was reverted error

Since WordPress 4.9 I’ve had a peculiar issue when trying to edit theme files using the web GUI. Whenever I tried to save changes I would get this error message:

Unable to communicate back with site to check for fatal errors, so the PHP change was reverted. You will need to upload your PHP file change by some other means, such as by using SFTP.

After following this long thread I saw the suggestion to install and use the Health Check plugin to get more information into why this is happening. In my case I kept getting this error message:

The loopback request to your site failed, this may prevent WP_Cron from working, along with theme and plugin editors. Error encountered: (0) cURL error 28: Connection timed out after 10001 milliseconds

I researched what a loopback request is in this case. It’s the webserver reaching out to its own site’s url to talk to itself. My webserver was being denied internet access, which included its own URL, so it couldn’t complete the loopback request.

One solution, mentioned here, is to edit the hosts file on your webserver to point to 127.0.0.1 for the URL of your site. My solution was to open up the firewall to allow my server to connect to its URL. I then ran into a different problem:

The loopback request to your site failed, this may prevent WP_Cron from working, along with theme and plugin editors. Error encountered: (0) cURL error 60: Peer's Certificate issuer is not recognized.

After digging for a while I found this site which explains how to edit php.ini to point to an acceptable certificate list. To fix this on my CentOS 7 machine I edited /etc/php.ini and added this line (you could also add it to /etc/php.d/curl.ini)

curl.cainfo="/etc/pki/tls/cert.pem"

This caused php’s curl module to use the same certificate trust store that the underlying OS uses.

Then restart php-fpm if you’re using it:

sudo systemctl restart php-fpm

Success! Loopback connections now work properly.


Update 7/16/2018: I still had a WordPress site that was giving me certificate grief despite the above fix. After MUCH frustration I finally found this post where André Gayle points out that WordPress ships with its own certificate bundle, independent of even curl’s CA bundle! It’s located in the wp-includes/certificates folder of your WordPress directory.

My solution to this extremely frustrating problem was to remove their bundle and symlink to my own (CentOS 7 box – adjust your paths to match where your WordPress install and certificate trust store are located)

sudo mv /var/www/html/wordpress/wp-includes/certificates/ca-bundle.crt /var/www/html/wordpress/wp-includes/certificates/ca-bundle.crt.old
sudo ln -s /etc/pki/tls/cert.pem /var/www/html/wordpress/wp-includes/certificates/ca-bundle.crt

FINALLY no more loopback errors in the Health Check plugin, and thus the ability to edit theme files in the editor.

Site to Site VPN between OPNsense & OpenWRT with Tinc

I’m a real glutton for punishment. I decided to upgrade my parents’ router to OpenWRT. The upgrade went smoothly except for one thing: The VPN I had established between my firewall and theirs.

This was a big enough headache that I even ended up switching my firewall from pfSense to OPNsense (Something I had been contemplating doing for a while anyway) hoping it would make things easier. It didn’t. In the end I abandoned OpenVPN entirely and instead went with Tinc.

Tinc is cool because it’s full mesh peer-to-peer instead of the traditional client / server model. If your equipment supports it, I’d definitely choose it over OpenVPN, especially if multiple sites are involved. A basic rundown of its configuration can be found here.

I used this site as a reference for how to set up tinc.  Essentially you decide on a network name, create private & public keys for each host, and configure each host to connect to each other via a config file & folder structure.

Tinc general configuration

On each device, create an /etc/tinc/<network name>/hosts directory structure and generate a 4096-bit key pair:

mkdir -p /etc/tinc/<network name>/hosts
tincd -n <network name> -K 4096

To configure tinc we need some additional configuration files inside the /etc/tinc/<network name> directory

  • tinc.conf
  • tinc-up (script for bringing up the interface)
  • hosts/<hostname> (one for each location)

tinc.conf can be as simple as this:

Name = <name of host>
ConnectTo = <name of other host> 
#Add each host with an additional ConnectTo line
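
With more than a couple of sites, writing those ConnectTo lines by hand gets tedious. Here is a small sketch that prints a tinc.conf for a host and any number of peers. make_tinc_conf and the host names in the example are made up for illustration; note that tinc only accepts alphanumeric characters and underscores in names.

```shell
# Print a minimal tinc.conf: first argument is this host's name,
# every further argument becomes a ConnectTo line.
make_tinc_conf() {
    printf 'Name = %s\n' "$1"
    shift
    for peer in "$@"; do
        printf 'ConnectTo = %s\n' "$peer"
    done
}
```

Usage: make_tinc_conf home parents cabin > /etc/tinc/<network name>/tinc.conf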

There needs to be a corresponding file in the hosts directory for each host. Example host file:

Address = <External IP of host>
Subnet = <Subnet other host will share>
#Add more subnets with additional Subnet lines

The host file also needs the host’s public key. Append it to the end of the file:

cat /etc/tinc/<network name>/rsa_key.pub >> /etc/tinc/<network name>/hosts/<host name>

It’s easiest to generate the host files on each respective host, then copy them to all the other hosts.
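
Putting the pieces of a host file together can also be scripted. The following is a sketch under the layout described above (Address, one or more Subnet lines, then the public key). make_tinc_host_file is a hypothetical helper, and the address and subnet in the usage line are placeholder values.

```shell
# Print a tinc host file: Address line, one Subnet line per extra
# argument, a blank line, then the contents of the public key file.
make_tinc_host_file() {
    address="$1"
    pubkey_file="$2"
    shift 2
    printf 'Address = %s\n' "$address"
    for subnet in "$@"; do
        printf 'Subnet = %s\n' "$subnet"
    done
    printf '\n'
    cat "$pubkey_file"
}
```

Usage: make_tinc_host_file 203.0.113.10 /etc/tinc/<network name>/rsa_key.pub 192.168.1.0/24 > /etc/tinc/<network name>/hosts/<host name>. The resulting file is the one to copy to every other host.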

The last step is to create the tinc-up script

#!/bin/sh
ubus -t 15 wait_for network.interface.$INTERFACE
ip link set $INTERFACE up
ip addr add 172.16.0.1/24 dev $INTERFACE

Modify the IP used on each host so they don’t overlap. The private network here is what’s used for inter-host communication.

Make the script executable:

chmod 755 /etc/tinc/<network name>/tinc-up

OPNSense specific configuration

I got this working through an enhanced tinc package for OPNsense located here.  I will copypasta the content from that site here for easier reference:

Installation

The version might change, adjust it if fetch fails

fetch https://raw.githubusercontent.com/EugenMayer/tinc-opnsense/master/dist/os-tincdcustom-latest.txz
pkg install os-tincdcustom-latest.txz

1. your network

  1. copy the /usr/local/etc/tinc/example folder to /usr/local/etc/tinc/yournetwork
  2. enter yournetwork into /usr/local/etc/tinc/nets.boot to let this network be started on boot
  3. create keypairs by running tincd -n <yournetwork> -K

2. your network configuration and tun device

  1. Edit /usr/local/etc/tinc/yournetwork/tinc.conf set the server you want to connect to and how this server is to be named
  2. Edit /usr/local/etc/tinc/yournetwork/tinc-up and adjust the network/netbitmask

3. finally the host configuration

  1. enter the /usr/local/etc/tinc/yournetwork/hosts folder and rename the files according to what you have chosen for yourservername and theotherservername – they must match!
  2. enter the public key of “this server” (found under /usr/local/etc/tinc/yournetwork/) into the corresponding thisservername file and adjust the subnet (or subnets) this server offers
  3. enter the public key of the “other server” into the corresponding theotherservername file and adjust the subnet (or subnets) the other server offers

4. OPNsense Interface/Gateway/Route/FW configuration

Please see this answer for a brief description

  • You need to create a Gateway, which is configured to go through tinc0 with “dynamic” (do not enter an IP on Gateway field)
  • You need to add a route to <remote subnet> through this gateway
  • Add your tinc0 interface in the Interface section. You can configure an IPv4 address or not; it does not matter. If you do, use the address configured in tinc-up. Doing this enables you to create FW rules for the Tinc interface – which we will need.
  • Add a firewall rule on the Tinc interface to allow communication to local & remote subnets
    • Alternatively, add a single rule for the Tinc interface to allow any/any access (lazy, less secure)
  • Don’t forget to create a firewall rule allowing access from the internet to the port you’ve configured tinc to run on.

OpenWRT specific configuration

OpenWRT follows the general tinc configuration exactly. Make the appropriate folders and config files in /etc/tinc/<network name>/ and then test your configuration:

tincd -n <network name>

Once connection is established and working:

Create an interface for your VPN (Network / Interfaces / Add new interface) and select your tinc network name from the list.

Next bridge your VPN to the LAN by going to Network / Firewall and editing your LAN zone. Select your VPN interface created earlier from the list and hit save & apply.

Run on startup

I could not find clear documentation on getting this to work on startup. There is a startup script for tinc but it doesn’t appear to launch my tinc config. I ended up modifying /etc/init.d/tinc and adding these lines to the start() and stop() functions. You could also just write your own simple init script to accomplish this.

start() {
...
/usr/sbin/tincd -n <network name>
}

stop() {
...
kill `pidof tincd`
}

Troubleshooting

Tincdcustom service won’t start in OPNSense

Starting from the GUI just does nothing, starting from CLI reveals this unhelpful error:

configctl tincdcustom status

Error (1)

From the OPNSense docs I determined which command I can run to see exactly why. The command is located in this configd configuration file: /usr/local/opnsense/service/conf/actions.d/actions_tincdcustom.conf

command:/usr/local/etc/rc.d/tincdcustom start

Doing that command manually revealed what the problem was:

/usr/local/etc/rc.d/tincdcustom start

Please create /usr/local/etc/tinc/nets.boot.

I had skipped step 1.2 of the tincdcustom installation guide:

enter yournetwork into /usr/local/etc/tinc/nets.boot to let this network be started on boot

Once I added a single word – the name of the network I want to start on bootup – to /usr/local/etc/tinc/nets.boot – the daemon started and worked properly.

Running Tinc in verbose mode

Coming from the tinc documentation, I ran tinc in verbose mode on both of my hosts to troubleshoot why a connection wasn’t happening. It was very helpful.

tincd -n netname -d5 -D

Edit 9/9/2018

I had issues with my tinc startup script not working on the OpenWRT side. I found a post here stating that you should add this line to the top of your tinc-up script:

ubus -t 15 wait_for network.interface.$INTERFACE

This solved the startup issue for me.