Tag Archives: SSH

Saltstack gitfs ‘Failed to authenticate SSH session: Callback returned error’ fix

I lost several days of productivity with this one. I wanted to connect my CentOS 7 salt master's salt & pillar data to a gitfs backend. I configured /etc/salt/master per the docs but kept getting this error message:

Error occurred fetching gitfs remote 'git@github.com:<owner>/<repo>': Failed to authenticate SSH session: Callback returned error

I eventually discovered this bit of info that pointed me in the right direction: it was likely an issue with the SSH key I was using. I followed the steps to generate a new key, but this time I received the error message "You're using an RSA key with SHA-1, which is no longer allowed. Please use a newer client or a different key type."

The issue stemmed from the fact that GitHub tightened their security requirements for SSH keys. More digging revealed that the pygit2 python module that comes with CentOS 7 is old and does not support the newer key algorithms. I eventually found a fix – use pip to install a compatible version of pygit2. The latest version that works on CentOS 7 is 1.6.1. Simply installing it wasn't enough, though – you must also remove the system-installed pygit2 yum package.

Steps to fix

  1. Remove system supplied pygit2 version
    sudo yum remove python3-pygit2
  2. Install version 1.6.1 of pygit2 via pip. Sudo must be used to ensure global paths are updated.
    sudo python3 -m pip install pygit2==1.6.1 -U
  3. Restart the salt master
    sudo systemctl restart salt-master
  4. Review /var/log/salt/master for errors.
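
As a quick sanity check you can confirm which pygit2 version Python and Salt actually see after the swap (output will obviously vary by system):

python3 -c "import pygit2; print(pygit2.__version__)"
salt --versions-report | grep -i pygit2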

Troubleshooting

Monitor /var/log/salt/master for errors. I occasionally ran into errors such as this one:

2024-03-15 13:01:45,957 [salt.utils.gitfs :878 ][WARNING ][31763] gitfs_global_lock is enabled and update lockfile /var/cache/salt/master/gitfs/5b5f257b5dc909390cd0dfab5b6722334c9bc541912da272389f39cf5b80602e/.git/update.lk is present for gitfs remote ‘git@github.com:<owner>/<repo>’. Process 31793 obtained the lock

The solution was to remove the file and restart the salt master.
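
In command form that looks like the following; the long hash directory is unique to each gitfs remote, so yours will differ:

sudo rm /var/cache/salt/master/gitfs/<REMOTE_HASH>/.git/update.lk
sudo systemctl restart salt-master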

Restore files from remote borg repository disk image

My off-site backup involves sending borgbackup archives of VM images to a remote Synology server. I recently needed to restore a single file from one of the VM images stored within this borg backup repository on the remote server. My connection to this server is not very fast, so I didn't want to wait for the entire image file to download just to mount it locally.

My solution was to mount the remote borgbackup repository on my local machine over SSH so I could poke around for and copy out the specific file I wanted. This requires the borgbackup binary to be present on the remote machine. Since it's a Synology, I simply copied the standalone borg binary over.

The restore process was complicated by the fact that the VM disk image is owned by root, so in order to access the file I needed to mount the remote repository as root.

This is the process:

  1. Set BORG_REMOTE_PATH
    1. export BORG_REMOTE_PATH=<PATH_TO_BORG_BINARY_ON_REMOTE_SYSTEM>
  2. (Arch Linux): install python-llfuse
  3. Mount repository over SSH:
    1. borg mount <USER>@<REMOTE_SYSTEM>:<PATH_TO_REMOTE_BORGBACKUP_REPOSITORY>::<BACKUP_NAME> <MOUNT_FOLDER>
  4. Follow disk image mounting process
    1. losetup -Pr -f <PATH_TO_MOUNTED_BORGBACKUP>/<FILENAME_OF_VM_IMAGE>
    2. mount -o ro /dev/loop0p2 /mnt/loop0/
  5. Reverse the process to unmount when done:
    1. umount /mnt/loop0
    2. losetup -d /dev/loop0
    3. borg umount <MOUNT_FOLDER>
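
Put together with made-up names (user, host, paths and archive name are all placeholders), the whole restore looked roughly like this, run as root so the root-owned VM image is readable:

export BORG_REMOTE_PATH=/usr/local/bin/borg
borg mount backupuser@synology:/volume1/borg/vm-backups::vm-images-2024-03-01 /mnt/borg
losetup -Pr -f /mnt/borg/vm-disk.img
mount -o ro /dev/loop0p2 /mnt/loop0/
# copy out the file you need, then unwind everything
umount /mnt/loop0
losetup -d /dev/loop0
borg umount /mnt/borg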

Success! I was able to restore an individual file within a raw VM image backup on a remote Borgbackup repository using this method.

proxmox suspend & resume scripts

Update 12/17/2019: Added logic to wait for the VM to be suspended before suspending the hypervisor

Update 12/8/2019: After switching VMs I needed to tweak the pair of scripts. I modified it to make all the magic happen on the hypervisor; the VM simply needs to SSH into the hypervisor and call the script. The hypervisor now also needs access to SSH via public key to the VM to tell it to suspend.
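
Setting up that key access in both directions is the usual ssh-copy-id routine, roughly (hostnames and users are placeholders):

# on the VM: let it reach the hypervisor without a password
ssh-keygen -t ed25519
ssh-copy-id root@hypervisor
# on the hypervisor: let it reach back into the VM
ssh-keygen -t ed25519
ssh-copy-id user@vm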

#!/bin/sh
#ProxMox suspend script part 1 of 2
#To be run on the VM 
#All this does is call the suspend script on the hypervisor
#This could also just be a bash alias

####### Variables #########
HYPERVISOR=        #Name / IP of the hypervisor
SSH_USER=          #User to SSH into hypervisor as
HYPERVISOR_SCRIPT= #Path to part 2 of the script on the hypervisor

####### End Variables ######

#Execute server suspend script
ssh $SSH_USER@$HYPERVISOR "$HYPERVISOR_SCRIPT" &

#!/bin/bash
#ProxMox suspend script part 2 of 2
#Script to run on the hypervisor, it waits for VM to suspend and then suspends itself
#It relies on passwordless sudo configured on the VM as well as SSH keys to allow passwordless SSH access to the VM from the hypervisor
#It resumes the VM after it resumes itself
#Called from the VM

########### Variables ###############

VM=             #Name/IP of VM to SSH into
VM_SSH_USER=    #User to ssh into the vm with
VMID=           #VMID of VM you wish to suspend

########### End Variables############

#Tell guest VM to suspend
ssh $VM_SSH_USER@$VM "sudo systemctl suspend"

#Wait until guest VM is suspended, wait 5 seconds between attempts
while [ "$(qm status $VMID)" != "status: suspended" ]
do 
    echo "Waiting for VM to suspend"
    sleep 5 
done

#Suspend hypervisor
systemctl suspend

#Resume the guest VM once the hypervisor wakes back up
qm resume $VMID

I have a desktop running ProxMox. My GUI is handled via a virtual machine with physical hardware passed through to it. The challenge with this setup is getting suspend & resume to work properly. I got it to work by suspending the VM first, then the host; on resume, I power up the host first, then resume the VM. Doing anything else would cause hardware passthrough problems that would force me to reboot the VM.

I automated the suspend process by using two scripts: one for the VM, and one for the hypervisor. The first script is run on the VM. It sends an SSH command to the hypervisor (thanks to this post) instructing it to run the second half of the script, then initiates a suspend of the VM itself.

The second half of the script waits a few seconds to allow the VM to suspend itself, then instructs the hypervisor to also go into suspend. I had to split these into two scripts because once the VM is suspended, it can’t issue any more commands. Suspending the hypervisor must happen after the VM itself is suspended.

Here is script #1 (to be run on the VM) It assumes you have already set up a private/public key pair to allow for passwordless login into the hypervisor from the VM.

#!/bin/sh
#ProxMox suspend script part 1 of 2
#To be run on the VM so it suspends before the hypervisor does

####### Variables #########
HYPERVISOR=HYPERVISOR_NAME_OR_IP
SSH_USER=SSH_USER_ON_HYPERVISOR
HYPERVISOR_SCRIPT_LOCATION=NAME_AND_LOCATION_OF_PART2_OF_SCRIPT

####### End Variables ######

#Execute server suspend script, then suspend VM
ssh $SSH_USER@$HYPERVISOR  $HYPERVISOR_SCRIPT_LOCATION &

#Suspend
systemctl suspend

Here is script #2 (which script #1 calls), to be run on the hypervisor

#!/bin/bash
#ProxMox suspend script part 2 of 2
#Script to run on the hypervisor, it waits for VM to suspend and then suspends itself
#It resumes the VM after it resumes itself

########### Variables ###############

#Specify VMid you wish to suspend
VMID=VMID_OF_VM_YOU_WANT_TO_SUSPEND

########### End Variables############

#Wait 5 seconds before doing anything to allow for VM to suspend
sleep 5

#Suspend hypervisor
systemctl suspend

#Resume the guest VM once the hypervisor wakes back up
qm resume $VMID

It works on my machine 🙂

Transfer linode VM over ssh

I love Linode for their straightforward pricing. I can use them for temporary infrastructure and not have to worry about getting overcharged. When it comes time to transfer infrastructure back, the process is fairly straightforward. In my case I wanted to keep a disk image of my Linode VM for future use.

The Linode documentation is very good. I used their copy an image over SSH article combined with their rescue and rebuild article, sprinkled with a bit of gzip compression and pv, to grab my Linode image locally, complete with a progress bar.

First, boot your Linode into rescue mode via dashboard / Linodes / <name of your linode>, then click on the Rescue tab and map your drives as needed.

Launch the console (top right) to get into the rescue shell. In my case I wanted to SSH into my Linode to grab the image, so I set a root password and started the SSH service:

passwd
/etc/init.d/ssh start

Then on your end, pipe ssh, gzip, pv and dd together to grab the compressed disk image with progress monitoring:

ssh root@<LINODE_IP> "dd if=/dev/sda | gzip -1 -" | pv | dd of=linode-image.gz
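
Restoring the image later works the same way in reverse: boot the target Linode into rescue mode, start SSH as above, and push the decompressed image back over the wire, something like:

pv linode-image.gz | gunzip | ssh root@<LINODE_IP> "dd of=/dev/sda"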

Success.

Setup remote git repository with SSH & GIT

I wanted to set up a simple git repository to synchronize my bash scripts between a couple hosts, no fancy github or gitlab software required. These are my notes on how I got it working. Thanks to this site for the information.

On the remote host (server)

mkdir GIT_PROJECT_DIR.git
cd GIT_PROJECT_DIR.git
git init --bare

On the local hosts (client)

Create a git repository and add files to it:

cd GIT_FOLDER
git init
git add *
git commit -m "Initial commit"
git remote add origin USER@REMOTE_HOST:GIT_PROJECT_DIR.git
git push origin master
git branch --set-upstream-to=origin/master
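
On any other host that should have the scripts, cloning the repository is all that's needed; after that an ordinary git pull / git push keeps everything in sync:

git clone USER@REMOTE_HOST:GIT_PROJECT_DIR.git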

Accept multiple SSH RSA keys with ssh-keyscan

I came across a new machine that needed to connect to many SSH hosts via ansible. I had a problem where ansible was prompting me for each host, asking whether I wanted to accept its RSA key. As I had dozens of hosts I didn't want to type yes for every single one; furthermore the yes command didn't appear to work. I needed a way to automatically accept all SSH RSA keys from a list of server names. I know you can disable host key checking entirely, but I didn't want to do that.

I eventually found this site which suggested a small for loop, which did the trick beautifully. I modified it to suit my needs.

This little two-liner takes a file (in my case, my ansible hosts file), runs ssh-keyscan against each host listed in it, and adds the results to the ~/.ssh/known_hosts file. The end result is an automated way to accept many SSH keys.

SERVER_LIST=$(cat /etc/ansible/hosts)
for host in $SERVER_LIST; do ssh-keyscan -H $host >> ~/.ssh/known_hosts; done
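
If your hosts file contains group headers, comments, or blank lines, you may want to filter those out first so ssh-keyscan only sees hostnames; something like:

grep -vE '^\s*(\[|#|$)' /etc/ansible/hosts | while read -r host; do
    ssh-keyscan -H "$host" >> ~/.ssh/known_hosts
done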

Migrate from Xenserver to Proxmox

I was dismayed to see Citrix’s recent announcement about Xenserver 7.3 removing several key features from the free version. Xenserver’s free features are the reason I switched over to them in the first place back in 2014. Xenserver has been rock solid; I haven’t had any complaints until now. Their removal of xenmotion and migration in the free version forced me to look elsewhere for my virtualization needs.

I’ve settled on ProxMox, which is KVM based. Their documentation is excellent and it has all the features I need – for free. I’m also in love with their web based management – no more Windows fat client!

Below are my notes on how I successfully migrated all my Xenserver VMs over to the ProxMox Virtual Environment (PVE).

  • Any changes to network interfaces, such as bringing them up, require a reboot of the host
  • If you have an existing ISO share, you can create a directory called "template" in your ISO repository folder, then inside it create a symlink named "iso" pointing back to your ISO folder. Proxmox looks inside template/iso for ISO images for whatever storage you configure.
  • Do not create your ProxMox host with ZFS unless you have tons of RAM. If you don't have enough RAM you will run into huge CPU load, making the system unresponsive during periods of heavy disk activity such as VM copies / backups. More reading here.

Cluster of two:

ProxMox's clustering is a bit different – better, in my opinion. No more master/slave dynamic – every node is a master. Important reading: https://pve.proxmox.com/wiki/Proxmox_VE_4.x_Cluster

If you have a two-node cluster, like I do, that creates some problems, though. If one node goes down, the other can't do anything to the pool (create a VM, run a backup) until it comes back up. In my situation I have one primary host that is up all the time and I bring the secondary host up only when I want to do maintenance on the first.

In that specific situation you can still designate a “master” of sorts by increasing the number of quorum votes it gets from 1 to 2.  That way when the secondary node is down, the primary node can still do cluster operations because the default number of votes to stay quorate is 2. See here for more reading on the subject.

On either host (they must both be up and in the cluster for this to work)

vi /etc/pve/corosync.conf

Find your primary server in the nodelist settings and change

quorum_votes: 2

Also find the quorum section and add expected_votes: 2

Make sure to increment the config_version number (bottom of the file). Now if your secondary is down you can still operate the primary.
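
For reference, the relevant parts of corosync.conf end up looking roughly like this (node names, IDs and version number are made up for illustration, and your file will have more settings than shown here):

nodelist {
  node {
    nodeid: 1
    quorum_votes: 2    # bumped from 1 on the always-on primary
    ring0_addr: pve-primary
  }
  node {
    nodeid: 2
    quorum_votes: 1
    ring0_addr: pve-secondary
  }
}

quorum {
  provider: corosync_votequorum
  expected_votes: 2
}

totem {
  ...
  config_version: 4    # increment this every time you edit the file
}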

Migrating VMs

I migrated my Xen VMs to KVM by creating VMs with identical specs in PVE, copying the VHD files from the Xen host to the new PVE host, running qemu-img to convert them to RAW format, and then using dd to copy the raw data over to the corresponding empty VM disks. Depending on the OS of the VM there was some after-copy tweaking I also had to do.

From shared storage

Grab the VHD file of each Xen VM (quiesce any snapshots away first) and convert it to raw format:

qemu-img convert <VHD_FILE_NAME>.vhd -O raw <RAW_FILE_NAME>.raw

Create a new VM with identical configuration, especially disk size. Go to the hardware tab and take note of the name of the disk. For example, one of mine was:

local-zfs:vm-100-disk-1,discard=on,size=40G

The interesting part is between local-zfs and discard=on, namely vm-100-disk-1. This is the name of the disk we want to overwrite with data from our Xenserver VM’s disk.

Next, figure out the full path of this disk on your Proxmox host:

find / -name "vm-100-disk-1*"

The result in my case was /dev/zvol/rpool/data/vm-100-disk-1

Take the name and put it in the following command to complete the process:

dd if=<RAW_FILE_NAME>.raw of=/dev/zvol/rpool/data/vm-100-disk-1 bs=16M

Once that’s done you can delete your .vhd and .raw files.

From local / LVM storage

In case your Xen VMs are stored as LVM devices instead of VHD files, get the UUID of the virtual disk by running xe vdi-list and finding the name of the hard disk of the VM you want. It's helpful to rename the hard disks to something easy to spot; I chose the word migrate.

xe vdi-list|grep -B3 migrate
uuid ( RO) : a466ae1b-80c7-4ef2-91a3-5c1ba1f6fc2f
 name-label ( RW):  migrate

Once you have the UUID of the drive, you can use lvscan to find the full LVM device path of that disk:

lvscan|grep a466ae1b-80c7-4ef2-91a3-5c1ba1f6fc2f
 inactive '/dev/VG_XenStorage-1ada0a08-7e6d-a5b6-d0b4-515e251c0c75/VHD-a466ae1b-80c7-4ef2-91a3-5c1ba1f6fc2f' [10.03 GiB] inherit

Shut down the corresponding VM and reactivate its logical volume (Xen deactivates logical volumes when the VM is shut off):

lvchange -ay <full /dev/VG_XenStorage path discovered above>

Now that we have the full LVM path and the volume is active, we can use dd over SSH to transfer the image to our proxmox server:

sudo dd if=<full /dev/VG/Xenstorage path discovered above> | ssh <IP_OF_PROXMOX_SERVER> dd of=<LOCATION_ON_PROXMOX_THAT_HAS_ENOUGH_SPACE>/<NAME_OF_VDI_FILE>.vhd

Then follow the VHD -> raw -> dd-to-Proxmox-disk process described in the From shared storage section above.

Post-Migration tweaks

For the most part Debian-based systems moved over perfectly without any needed tweaks. Some VMs changed interface names due to the network device change: eth0 turned into ens8. I had to modify /etc/network/interfaces to change eth0 to ens8 to get virtio networking working.
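
The /etc/network/interfaces change is nothing more than renaming the interface, for example (use whatever name ip link reports in your VM):

# /etc/network/interfaces: was "auto eth0" / "iface eth0 inet dhcp" under Xen
auto ens8
iface ens8 inet dhcp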

CentOS

All my CentOS VMs failed to boot after migration due to a lack of virtio disk drivers in the initial RAM disk. The fix is to change the disk hardware to IDE mode (they boot fine this way) and then modify the initrd of each affected host:

sudo dracut --add-drivers "virtio_pci virtio_blk virtio_scsi virtio_net virtio_ring virtio" -f -v /boot/initramfs-`uname -r`.img `uname -r`
sudo sh -c "echo 'add_drivers+=\" virtio_pci virtio_blk virtio_scsi virtio_net virtio_ring virtio \"' >> /etc/dracut.conf"
sudo shutdown -h now

Once that's done you can detach the hard disk and re-attach it in SCSI (virtio) mode. Don't forget to modify the VM options and change the boot order from ide0 to scsi0.

Arch Linux

One of my Arch VMs had UUIDs configured, which complicated things. The root device UUID changes between KVM virtio and IDE mode. The easiest way to fix it is to boot this VM from an Arch install CD, mount the root partition to /mnt, and run arch-chroot /mnt. Once in the chroot, run pacman -Sy linux to reinstall the kernel and regenerate the initramfs with the appropriate modules.

mount /dev/sda1 /mnt
arch-chroot /mnt
pacman -Sy linux

Also make sure to modify /etc/fstab to reflect the appropriate device ID or UUID (Xen used /dev/xvda1, KVM /dev/sda1).

Windows

Create your Windows VM using non-virtio hardware (the default settings in PVE). Obtain the latest Windows virtio drivers here and extract them somewhere memorable. Switch everything but the disk over to virtio in the VM's hardware config and reboot the VM. Go into Device Manager and point each unknown device to the extracted driver location.

To get the virtio disk to work, add a new disk of any size to the VM with the SCSI (virtio) type. Boot the Windows VM and install drivers for that drive. Then shut down, remove the second drive, detach the primary drive and change it to virtio SCSI. The VM should then come up with full virtio drivers.

All hosts

KVM has a guest agent like Xenserver does, called the qemu guest agent. Turn it on in the VM options and install qemu-guest-agent in your guest. This gives KVM a bit more insight into your guests.
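
If you'd rather flip it on from the shell than the GUI, something like this should do it (VM ID and guest package manager will vary):

qm set <VMID> --agent 1          # enable the agent option on the Proxmox side
apt install qemu-guest-agent     # inside a Debian/Ubuntu guest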

Determine which VMs need guest agent installed:

qm agent $id ping

If nothing is returned, it means qemu-agent is working. You can test all your VMs at once with this one-liner (change the starting and finishing VM IDs as appropriate):

for id in {100..114}; do echo $id; qm agent $id ping; done

This little one-liner will output the VM ID it’s trying to ping and will return any errors it finds. No errors means everything is working.

Disable support nag

PVE has a support model and will nag you at each login. If you don't like this you can change it like so (the line number might be different depending on which version you're running):

vi +850 /usr/share/pve-manager/js/pvemanagerlib.js

Modify the line if (data.status !== 'Active') and change it to:

if (false)
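
If you'd rather not hunt down the line number by hand, a quick sed substitution accomplishes the same edit. This is just a sketch: back up the file first, the exact string varies between PVE versions, and package updates will put the nag back:

cp /usr/share/pve-manager/js/pvemanagerlib.js /usr/share/pve-manager/js/pvemanagerlib.js.bak
sed -i "s/data.status !== 'Active'/false/" /usr/share/pve-manager/js/pvemanagerlib.js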

Troubleshooting

Remove a failed node

See here: https://pve.proxmox.com/wiki/Proxmox_VE_4.x_Cluster#Remove_a_cluster_node

systemctl stop pvestatd.service
systemctl stop pvedaemon.service
systemctl stop pve-cluster.service
rm -r /etc/corosync/*
rm -r /var/lib/pve-cluster/*
reboot

Quorum never establishes / takes forever

I had a really strange issue where I was able to establish quorum with a second node, but after a reboot quorum never happened again. I re-installed that second node and re-joined it several times but I never got past the “waiting for quorum….” stage.

After much research I came across this article which explained what was happening. Corosync uses multicast to establish cluster quorum. Many switches (including mine) have a feature called IGMP snooping, which, without an IGMP querier, essentially means multicast never happens. Sure enough, after logging into my switches and disabling IGMP snooping, quorum was instantly established. The article above says this is not recommended, but in my small home lab it hasn’t produced any ill effects. Your mileage may vary. You can also configure your cluster to use unicast instead.

USB Passthrough not working properly

With Xenserver I was able to pass through the USB controller of my host to the guest (a JMicron USB to ATA/ATAPI bridge holding a 4-disk bay). I ran into issues with PVE, though. Using the GUI to pass the USB device did not work. Manually adding PCI passthrough directives (hostpci0: 00:14.1) didn't work either. I finally found a little nugget on the PCI Passthrough page about how you can simply pass the entire device rather than a single function like I had in Xenserver. So instead of doing hostpci0: 00:14.1, I simply did hostpci0: 00:14. That helped a little bit, but I was still unable to fully use these drives simultaneously.

My solution was eventually to abandon PCI passthrough altogether in favor of just passing individual disks to the guest as outlined here.

Find the IDs of the desired disks by issuing ls -l /dev/disk/by-id. You only need the IDs of the disks themselves, not their partitions. Then modify the KVM config of your desired guest (mine was located at /etc/pve/qemu-server/101.conf) and add a new line for each disk, adjusting the SCSI device numbers and disk IDs to match:

scsi5: /dev/disk/by-id/scsi-SATA_ST5000VN000-1H4_Z111111
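
Alternatively, qm should be able to add the same line for you instead of editing the config file by hand (same example VM and disk ID as above):

qm set 101 -scsi5 /dev/disk/by-id/scsi-SATA_ST5000VN000-1H4_Z111111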

With that direct disk access everything is working splendidly in my FreeNAS VM.

Persistent SSH tunnel for Windows

Over the years I've needed to access family members' machines for remote support. The problem with parents and grandparents is that walking them through the prompts for services like join.me is quite problematic. To that end I've devised an open source way for me to automatically remote into their machine regardless of firewalls or machine location. This is possible thanks to cygwin, autossh, and NSSM. As long as the machine has internet access, I can get to it.

To pull this off you'll need to install a few cygwin packages, copy over a private key file, create a batch script, and use NSSM to create a service that runs the batch script on startup.

Cygwin

Obtain cygwin from here. You’ll need to use the graphical installer for the initial setup. Install the following packages:

  • openssh (provides the ssh client)
  • autossh
  • wget (not necessary, but handy to have)

If cygwin is already installed, run the installer again anyway to update it. I wasted an hour once trying to figure out why this wasn't working, and the culprit turned out to be a buggy old version of cygwin itself.

Private key

For this to work you'll need an SSH server configured for key authentication (no password). On your SSH server:

  • Create new user for the Windows machine
  • Execute ssh-keygen as that user
  • Copy the contents of the .pub file into ~/.ssh/authorized_keys
  • Copy the private key (the one with no extension) to the Windows computer
  • Make sure permissions are locked down: 700 on the .ssh folder and 600 on everything inside it
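
On the server those steps boil down to something like this (the username is made up; adjust to taste):

sudo useradd -m remotesupport
sudo -iu remotesupport               # switch to the new user
mkdir -p ~/.ssh && chmod 700 ~/.ssh
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/id_rsa ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
# copy ~/.ssh/id_rsa (the file with no extension) to the Windows machine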

GatewayPorts

One option that I really enjoy on my SSH server is the GatewayPorts option. This turns your SSH server into a gateway for any port forwards. Simply edit /etc/ssh/sshd_config and add

GatewayPorts yes

Save the file and restart the SSH service. Now when you create SSH tunnels, your SSH server opens the forwarded ports so you can connect to them from other machines.

Create batch file

On the Windows machine a single command gets us up and running. Create a one-line .cmd file in a location of your choosing with the following:

c:\cygwin\bin\autossh.exe -M <random_port_number> -i <keyfile location> -l <user> -R <remote_port>:localhost:<local_port> -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null <remote address>

Update it to reflect the path of your cygwin installation if you installed somewhere other than the default location.

I add the reverse port forward option ( -R ) so that I can simply connect to my ssh server on the specified port and the connection will tunnel through to the Windows computer. In my case, I do -R5700:localhost:5900 which instructs my ssh server to listen on port 5700, then forward that connection to the Windows machine on port 5900 for VNC.
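
Filled in with the VNC example above it ends up as a single line along these lines (every value here is a placeholder; adjust paths, ports, user and server to your setup):

c:\cygwin\bin\autossh.exe -M 20000 -i c:\cygwin\home\family\.ssh\id_rsa -l tunneluser -R 5700:localhost:5900 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null ssh.example.com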

Create service

The Non-Sucking Service Manager (NSSM) is a nifty little program that lets us turn anything into a Windows service. Once it's a service it can be started automatically on startup, even if nobody has logged in yet.

Obtain NSSM from here and extract it to a location you can remember. Then, open an administrator command prompt, cd to the directory containing nssm.exe, and enter the following:

nssm.exe install autossh

A GUI will open up. Specify the location of your batch file in the Path: section, then click Install service.

Once this is done, start the service: run services.msc, find your service, right-click it and select Start. Make sure the startup type is set to Automatic.

That’s it! If your keys are in the right place and the permissions are correct, the computer will automatically (and silently) log into your SSH server and create a tunnel for you. Autossh will continually try to re-connect in the event of connection loss. Awesome.

Reverse SSH

You can also configure cygwin to be an SSH server for your Windows host. This will allow you to SSH into the machine if you add -R <random_port>:localhost:22 to your batch file. Here are a few notes for getting the SSH server working:

  • Open up a cygwin terminal and execute the command:
    ssh-host-config
  • Once the SSH server is configured, tweak the SSH configuration to allow logging in with blank passwords (many of my family do not use a password to log into the machine). Simply un-comment the line "PermitEmptyPasswords no" and change no to yes, then restart the ssh service (thanks to this blog for the insight).

Measure SSH transfer speeds

SSH is a beautiful thing. In addition to remotely administering machines you can use it to transfer files. To do this one simply pipes the cat command on both ends. For example, to copy hello.txt on the source host to hi.txt on the destination host, the command would be:

ssh remote_host cat hello.txt | cat > hi.txt

The command reads the contents of hello.txt on the remote host and pipes it back over the SSH connection. The local cat command takes what was piped to it as input, and the > redirects that output into hi.txt.

A great way to measure transfer speeds using ssh between two hosts is to take /dev/zero on the source host and output it to /dev/null on the destination host. This bypasses any disk speed bottlenecks and only measures network throughput. Combine this with the pv command to get a nice graphical view of how fast the transfer is going.

ssh remote_host cat /dev/zero | pv | cat > /dev/null

The default options between my machines result in a transfer speed of about 65 megabytes per second.

It turns out that the encryption cipher used makes a big difference in transfer speeds. Use the -c flag to specify which cipher to use and see how much of a difference it makes. -o Compression=no can also help with transfer speeds.

The fastest cipher I've found is arcfour. It's touted as less secure, but for my local network I can accept the risk (thanks to slashdot for the discussion).

ssh -c arcfour -o Compression=no remote_host cat /dev/zero | pv | cat > /dev/null
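
If you want to compare several ciphers quickly, you can loop over them and let each run for a fixed amount of time; pv -a prints just the average rate. Note that the available cipher names depend on your OpenSSH version (arcfour has been removed from newer releases):

for cipher in arcfour aes128-ctr aes256-ctr chacha20-poly1305@openssh.com; do
    echo "== $cipher =="
    timeout 15 ssh -c "$cipher" -o Compression=no remote_host cat /dev/zero | pv -a > /dev/null
done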

Using arcfour more than doubles the speed for me! Amazing.