Category Archives: CLI

Change ceph network

My notes on changing which network your Proxmox Ceph cluster lives in. In my case I wanted to switch from a 10 gig network to a 40 gig network in a different subnet. Source: https://forum.proxmox.com/threads/ceph-changing-public-network.119116

  1. Change network configuration in “ceph.conf”
    • Be sure to edit both cluster network and public network
  2. Destroy and recreate monitors (one by one)
  3. Destroy and recreate managers (one by one, leaving the active one for last)
  4. Destroy and recreate metadata servers (one by one, leaving the active one for last)
  5. Restart OSDs (one at a time, or a few at a time if the cluster has many OSDs) so you avoid restarting the hosts
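Step 5 can be scripted. Here is a minimal sketch, assuming the OSDs run as the usual ceph-osd@<id> systemd units and that you want to wait for HEALTH_OK between restarts. The function name and the POLL_SECONDS knob are my own inventions, not something from the forum thread.

```shell
#!/bin/sh
# Restart the given OSD ids one at a time, waiting for the cluster to
# report HEALTH_OK before moving on. Run on the host that owns the OSDs.
# POLL_SECONDS is an illustrative setting (assumption), not a Ceph option.
POLL_SECONDS=${POLL_SECONDS:-10}

restart_osds_one_by_one() {
    for osd_id in "$@"; do
        echo "Restarting osd.$osd_id"
        systemctl restart "ceph-osd@${osd_id}.service" || return 1
        # Give the cluster a moment to notice the restart before polling.
        sleep "$POLL_SECONDS"
        # "ceph health" prints a line starting with HEALTH_OK, HEALTH_WARN
        # or HEALTH_ERR; keep polling until it is HEALTH_OK again.
        until ceph health | grep -q '^HEALTH_OK'; do
            sleep "$POLL_SECONDS"
        done
    done
}

# Example (the ids are placeholders): restart_osds_one_by_one 0 1 2
```

If the cluster already sits in HEALTH_WARN for an unrelated reason, this loop never finishes, so check ceph health by hand first.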

Get CEPH running on new Proxmox node

pveceph install --repository no-subscription

Move OSDs to new host

Source: https://forum.proxmox.com/threads/move-osd-to-another-node.33965/page-2

Follow a procedure similar to the one above, taking each OSD down one by one on the old host. Remove the drives and place them in the new host. Then run the following:

pvscan
ceph-volume lvm activate --all

Troubleshooting

Unable to remove monitor with unknown status

https://forum.proxmox.com/threads/ceph-cant-remove-monitor-with-unknown-status.63613

rm -r /var/lib/ceph/mon/ceph-pve2/

Remove failed host

I had to edit /etc/pve/ceph.conf manually to remove the failed host. Removing it through the Proxmox GUI wouldn’t work.

Install Apache Guacamole 1.5.5 with docker-compose

I decided I needed to update my Apache Guacamole instance to their latest version – 1.5.5. Unfortunately the git repo I provided in my last article about it – https://techblog.jeppson.org/2021/03/guacamole-docker-quick-and-easy/ – doesn’t appear to work properly, even with a fresh install. So, I set about to rebuild from scratch. I found this article which helped me to do it. I updated the version from 1.4.0 to 1.5.5 and it worked beautifully.

Make guacamole directory

mkdir guacamole
cd guacamole

Pull down images

docker pull guacamole/guacamole:1.5.5
docker pull guacamole/guacd:1.5.5
docker pull mariadb:10.9.5

Grab database initialization file

docker run --rm guacamole/guacamole:1.5.5 /opt/guacamole/bin/initdb.sh --mysql > initdb.sql

Make initial docker-compose.yml file with just the database for now:

services:
  guacdb:
    container_name: guacamoledb
    image: mariadb:10.9.5
    restart: unless-stopped
    environment:
      MYSQL_ROOT_PASSWORD: 'MariaDBRootPass'
      MYSQL_DATABASE: 'guacamole_db'
      MYSQL_USER: 'guacamole_user'
      MYSQL_PASSWORD: 'MariaDBUserPass'
    volumes:
      - './db-data:/var/lib/mysql'
volumes:
  db-data:

Start the database container (docker compose up -d), then copy the SQL script into it and execute it

docker cp initdb.sql guacamoledb:/initdb.sql
sudo docker exec -it guacamoledb bash
cat /initdb.sql | mysql -u root -p guacamole_db
<insert MYSQL_ROOT_PASSWORD as defined earlier>
exit
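If you would rather not open an interactive shell inside the container, the same import can be piped in from the host. This is only a sketch: the container and database names come from the compose file, while the function name and argument order are made up for illustration.

```shell
#!/bin/sh
# Load the schema file into the running database container without an
# interactive shell. Names match the compose file; the function itself
# is illustrative.
load_guacamole_schema() {
    sql_file=$1
    root_password=$2
    # -i keeps stdin open so the SQL file can be fed to the mysql client.
    # Note: passing the password with -p makes it visible in the process
    # list while the command runs.
    docker exec -i guacamoledb \
        mysql -u root -p"$root_password" guacamole_db < "$sql_file"
}

# Example: load_guacamole_schema initdb.sql 'MariaDBRootPass'
```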

Add the guacd & guacamole sections to your docker-compose.yml file

This is the end result:

services:
  guacdb:
    container_name: guacamoledb
    image: mariadb:10.9.5
    restart: unless-stopped
    environment:
      MYSQL_ROOT_PASSWORD: 'MariaDBRootPass'
      MYSQL_DATABASE: 'guacamole_db'
      MYSQL_USER: 'guacamole_user'
      MYSQL_PASSWORD: 'MariaDBUserPass'
    volumes:
      - './db-data:/var/lib/mysql'
  guacd:
    container_name: guacd
    image: guacamole/guacd:1.5.5
    restart: unless-stopped
  guacamole:
    container_name: guacamole
    image: guacamole/guacamole:1.5.5
    restart: unless-stopped
    ports:
      - 8080:8080
    environment:
      GUACD_HOSTNAME: "guacd"
      MYSQL_HOSTNAME: "guacdb"
      MYSQL_DATABASE: "guacamole_db"
      MYSQL_USER: "guacamole_user"
      MYSQL_PASSWORD: "MariaDBUserPass"
      TOTP_ENABLED: "true"
    depends_on:
      - guacdb
      - guacd
volumes:
  db-data:

Start docker compose stack

Finally run docker compose up -d to get everything up and running again.

Remove /guacamole in the URL

The article says Guacamole must have /guacamole at the end of the URL, but that is not correct. There is an environment variable you can pass to the container to serve the web application from the root context instead of the /guacamole subdirectory. If this is what you want, simply add

WEBAPP_CONTEXT: "ROOT"

to the guacamole section in your docker compose file and re-run sudo docker compose up -d

Here is my final docker compose file for Guacamole 1.5.5:

services:
  guacdb:
    container_name: guacamoledb
    image: mariadb:10.9.5
    restart: unless-stopped
    environment:
      MYSQL_ROOT_PASSWORD: 'MariaDBRootPass'
      MYSQL_DATABASE: 'guacamole_db'
      MYSQL_USER: 'guacamole_user'
      MYSQL_PASSWORD: 'MariaDBUserPass'
    volumes:
      - './db-data:/var/lib/mysql'

  guacd:
    container_name: guacd
    image: guacamole/guacd:1.5.5
    restart: unless-stopped

  guacamole:
    container_name: guacamole
    image: guacamole/guacamole:1.5.5
    restart: unless-stopped
    ports:
      - 8080:8080
    environment:
      GUACD_HOSTNAME: "guacd"
      MYSQL_HOSTNAME: "guacdb"
      MYSQL_DATABASE: "guacamole_db"
      MYSQL_USER: "guacamole_user"
      MYSQL_PASSWORD: "MariaDBUserPass"
      TOTP_ENABLED: "true"
      WEBAPP_CONTEXT: "ROOT"
    depends_on:
      - guacdb
      - guacd

volumes:
  db-data:
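After the stack comes up it can take a little while before the web app answers. A small sketch for waiting on it, assuming the port mapping above and WEBAPP_CONTEXT set to ROOT (so the login page sits at /). The function name, the attempt count and RETRY_SECONDS are illustrative.

```shell
#!/bin/sh
# Poll the Guacamole web UI until it answers or we run out of attempts.
# $1 = URL (defaults to the port mapping in the compose file)
# $2 = number of attempts (illustrative default)
wait_for_guacamole() {
    url=${1:-http://localhost:8080/}
    attempts=${2:-30}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -f makes curl return non-zero on HTTP errors, -s hides progress
        if curl -fs -o /dev/null "$url"; then
            echo "Guacamole answered at $url"
            return 0
        fi
        i=$((i + 1))
        sleep "${RETRY_SECONDS:-2}"
    done
    echo "No answer from $url after $attempts attempts" >&2
    return 1
}

# Example: wait_for_guacamole http://localhost:8080/ 30
```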

Unbind vfio driver from device in Proxmox

I found myself with a Proxmox server that wouldn’t do anything with its network card. It took me a while to realize that at one point I had bound it to a VM. Even after removing it from the VM, the host wouldn’t do anything with it.

Discover which driver a device is using:

lspci -knn

In my case I found the culprit: the network card was still claimed by the vfio-pci driver

08:00.0 Network controller [0280]: Mellanox Technologies MT27500 Family [ConnectX-3] [15b3:1003]
Subsystem: Mellanox Technologies MT27500 Family [ConnectX-3] [15b3:0050]
Kernel driver in use: vfio-pci
Kernel modules: mlx4_core

I finally found in this post how to tell the kernel to unbind the device from vfio-pci and bind it to the network driver mlx4_core. Given the PCI bus location and the vendor and device IDs from the lspci output, I was able to reclaim my network adapter for my host successfully:

echo -n "0000:08:00.0" > /sys/bus/pci/drivers/vfio-pci/unbind
echo -n "15b3 1003" > /sys/bus/pci/drivers/vfio-pci/remove_id
echo -n "0000:08:00.0" > /sys/bus/pci/drivers/mlx4_core/bind
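The values for those three commands can also be read straight from sysfs instead of copied off the lspci output by hand. A rough sketch follows. The helper names and the SYSFS_ROOT override are my own (hypothetical); the vendor, device and driver entries under /sys/bus/pci/devices/<address>/ are the standard sysfs layout.

```shell
#!/bin/sh
# Helpers for looking up what the unbind/remove_id/bind commands need.
# SYSFS_ROOT is an illustrative override (assumption); it defaults to /sys.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Print the driver currently bound to a PCI address, or nothing if unbound.
pci_current_driver() {
    link="$SYSFS_ROOT/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    fi
}

# Print "vendor device" in the format remove_id expects, e.g. "15b3 1003".
pci_vendor_device() {
    dev_dir="$SYSFS_ROOT/bus/pci/devices/$1"
    vendor=$(cat "$dev_dir/vendor")   # file content looks like 0x15b3
    device=$(cat "$dev_dir/device")   # file content looks like 0x1003
    echo "${vendor#0x} ${device#0x}"
}

# Example:
#   pci_current_driver 0000:08:00.0
#   pci_vendor_device 0000:08:00.0
```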

Success!

Quick way to make HDD light blink

I needed a quick and dirty way to make a specific hard drive LED blink so that I could identify which drive to replace. I stumbled across this post that worked well for me.

The simple trick is to run smartctl in a while loop. Something about smartctl makes the drive light blink differently than using dd, which is what I was using previously. smartctl is what finally allowed me to identify the drive. Here is the command:

while true; do smartctl -a /dev/<device>; done

The command will run forever until you press Ctrl + C. It makes the LED blink rather obviously, which made things much easier.
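If you would rather have the loop stop on its own, here is a small variation that runs for a fixed number of seconds. It is a sketch only; the function name and the 60-second default are arbitrary choices of mine.

```shell
#!/bin/sh
# Run smartctl against one drive in a loop for a limited time so its
# activity LED stands out. $1 = device, $2 = seconds (illustrative default).
blink_drive() {
    device=$1
    duration=${2:-60}
    end=$(( $(date +%s) + duration ))
    while [ "$(date +%s)" -lt "$end" ]; do
        # The output is not interesting here, only the disk activity is.
        smartctl -a "$device" > /dev/null 2>&1
    done
}

# Example: blink_drive /dev/<device> 120
```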

Zimbra commercial SSL renewal procedure

My quick notes on what I have to do every year to renew my Zimbra mail certificate with a new Namecheap SSL certificate:

  1. Request CSR
    • /opt/zimbra/bin/zmcertmgr createcsr comm -new -subject "/C=COUNTRY/ST=STATE/L=LOCATION/O=ORG/OU=OU/CN=CN.EXAMPLE.ORG" -subjectAltNames CN.EXAMPLE.ORG
    • cat /opt/zimbra/ssl/zimbra/commercial/commercial.csr
  2. Upload CSR, verify domain, receive cert bundle
  3. Copy CRT & CA Bundle files to /tmp/cert
  4. Change permissions of files to allow zimbra user to use them:
    sudo chown zimbra /tmp/cert
    sudo chown zimbra /tmp/cert/*
  5. Verify it works against private key
    zmcertmgr verifycrt comm /opt/zimbra/ssl/zimbra/commercial/commercial.key /tmp/cert/ISSUED_CRT.crt /tmp/cert/CA_BUNDLE.ca-bundle
  6. Deploy the new certificate
    zmcertmgr deploycrt comm /tmp/cert/ISSUED_CRT.crt /tmp/cert/CA_BUNDLE.ca-bundle
  7. Restart zimbra
    zmcontrol restart
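Steps 5 through 7 can be chained so that nothing gets deployed when verification fails. This is a sketch under one loud assumption: that zmcertmgr returns a non-zero exit status when verifycrt or deploycrt fails. The function name is made up; the paths and subcommands are the ones from the steps above.

```shell
#!/bin/sh
# Verify, deploy and restart in one go, stopping at the first failure.
# ASSUMPTION: zmcertmgr exits non-zero when verification or deployment fails.
# $1 = issued certificate, $2 = CA bundle
deploy_commercial_cert() {
    crt=$1
    bundle=$2
    key=/opt/zimbra/ssl/zimbra/commercial/commercial.key
    if ! zmcertmgr verifycrt comm "$key" "$crt" "$bundle"; then
        echo "Verification failed, not deploying" >&2
        return 1
    fi
    zmcertmgr deploycrt comm "$crt" "$bundle" || return 1
    zmcontrol restart
}

# Example:
#   deploy_commercial_cert /tmp/cert/ISSUED_CRT.crt /tmp/cert/CA_BUNDLE.ca-bundle
```

Run it as the same user you normally run zmcertmgr as.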

Trigger button to run a script in Home Assistant

I configured a button (RunLessWire Click) to log diaper changes for my new baby. The diaper changes are logged in a Google Docs spreadsheet. I set up a simple public-facing Google Form that I could run unauthenticated curl requests against. I then configured Home Assistant to run that curl command when the button is pressed. Instant diaper logging at the press of a button.

Lessons learned:

  • Zigbee Home Automation (ZHA) does not yet support the Zigbee Green Power protocol, which the RunLessWire Click uses. I had to pair the switches to my Hue hub instead.
    * It looks like they’re getting close to supporting it, though: https://github.com/zigpy/zigpy/pull/1282

Here was my process:

  • Create Google Form
  • Obtain form ID from URL bar
  • Get pre-filled link to get names of fields by clicking the three dots on top right and clicking “Get pre-filled link”. Make note of the names for each entry e.g. entry.1363419348
    Thanks to help from: https://stackoverflow.com/questions/65142364/i-cant-find-name-attribute-while-inspecting-input-elements-of-google-form-ho
  • Curl command is:
    curl https://docs.google.com/forms/<FORM_URL>/formResponse -d ifq -d <ENTRY_NAME>=<ENTRY_VALUE> -d <ADDITIONAL_ENTRY_NAME>=<ADDITIONAL_ENTRY_VALUE> -d submit=Submit
    Thanks to help from: https://eureka.ykyuen.info/2014/07/30/submit-google-forms-by-curl-command/
  • Shell commands go into configuration.yaml
    shell_command:
      log_pee: <CURL_COMMAND>
      log_poo: <CURL_COMMAND>
    Thanks to help from: https://community.home-assistant.io/t/dont-understand-how-to-use-shell-commands/576580/9
  • Restart Home Assistant to pick up your configuration changes.
  • Configure the automation to call Service: shell_command
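The curl command can also live in a small wrapper script that the shell_command entries call. A sketch, assuming the same URL shape as the curl command above. The function name is hypothetical, and I swapped plain -d for --data-urlencode on the entry values so spaces and punctuation survive.

```shell
#!/bin/sh
# Submit one response to a Google Form.
# $1 is the part of the form URL between /forms/ and /formResponse
# (the <FORM_URL> placeholder); the remaining arguments are
# entry.NNN=value pairs taken from the pre-filled link.
submit_form() {
    form_path=$1
    shift
    # Rebuild the argument list as: --data-urlencode pair1 --data-urlencode pair2 ...
    count=$#
    while [ "$count" -gt 0 ]; do
        pair=$1
        shift
        set -- "$@" --data-urlencode "$pair"
        count=$((count - 1))
    done
    # -f makes curl exit non-zero on an HTTP error, so a failed submission
    # shows up as a failed command.
    curl -fsS -o /dev/null "https://docs.google.com/forms/$form_path/formResponse" \
        -d ifq "$@" -d submit=Submit
}

# Example: submit_form <FORM_URL> <ENTRY_NAME>=<ENTRY_VALUE>
```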

Success!

Saltstack gitfs ‘Failed to authenticate SSH session: Callback returned error’ fix

I lost several days of productivity with this one. I wanted to connect my CentOS 7 salt master’s salt & pillar data to a gitfs backend. I configured /etc/salt/master per the docs but kept getting this error message:

Error occurred fetching gitfs remote 'git@github.com:<owner>/<repo>': Failed to authenticate SSH session: Callback returned error

I eventually discovered this bit of info, which pointed me in the right direction: it was likely an issue with the SSH key I was using. I followed the steps to generate a new key, but this time I received the error message “You’re using an RSA key with SHA-1, which is no longer allowed. Please use a newer client or a different key type.”

The issue stemmed from the fact that GitHub tightened their security for SSH keys. More digging revealed that the pygit2 Python module that comes with CentOS 7 is old and does not support the newer signature algorithm. I eventually found a fix: use pip to install a compatible version of pygit2. The latest version that works on CentOS 7 is 1.6.1. Simply installing it wasn’t enough, though. You must also purge the system-installed pygit2 yum package.

Steps to fix

  1. Remove system supplied pygit2 version
    sudo yum remove python3-pygit2
  2. Install version 1.6.1 of pygit2 via pip. Sudo must be used to ensure global paths are updated.
    sudo python3 -m pip install pygit2==1.6.1 -U
  3. Restart the salt master
    sudo systemctl restart salt-master
  4. Review /var/log/salt/master for errors.

Troubleshooting

Monitor /var/log/salt/master for errors. I occasionally ran into errors such as this one:

2024-03-15 13:01:45,957 [salt.utils.gitfs :878 ][WARNING ][31763] gitfs_global_lock is enabled and update lockfile /var/cache/salt/master/gitfs/5b5f257b5dc909390cd0dfab5b6722334c9bc541912da272389f39cf5b80602e/.git/update.lk is present for gitfs remote 'git@github.com:<owner>/<repo>'. Process 31793 obtained the lock

The solution was to remove the update.lk lock file named in the warning and restart the salt master.
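For hunting down the lock files, something like the following works as a rough sketch. The default directory is the one from the log line above, and the function names are my own. Make sure the process mentioned in the warning is really gone before deleting anything.

```shell
#!/bin/sh
# List gitfs update lock files under the master's cache directory.
# Pass a different directory as $1 if your cachedir differs.
find_gitfs_update_locks() {
    find "${1:-/var/cache/salt/master/gitfs}" -type f -name update.lk
}

# Remove them, printing each path as it goes.
clear_gitfs_update_locks() {
    find_gitfs_update_locks "$1" | while IFS= read -r lock; do
        echo "removing $lock"
        rm -f -- "$lock"
    done
}

# Afterwards: sudo systemctl restart salt-master
# As far as I know, Salt also offers a runner for this:
#   salt-run cache.clear_git_lock gitfs type=update
```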

Configure Zimbra live replication

I’ve recently configured live active replication from my Zimbra e-mail server to a backup server. This is really slick – in the event of primary server failure, I can bring up my secondary in a matter of minutes with no data loss. I used the Zimbra live sync scripts on Gitlab to accomplish this.

These are my notes on things I needed to do in addition to the readme to get things to work properly on my Zimbra 8.8.15 Open Source Edition installs on CentOS 7 boxes.

Install atd (at package):
sudo yum install at

Make sure the backup server has the same firewall rules as the primary: https://wiki.zimbra.com/wiki/Ports

On the backup server, configure DNS so that the mail server’s hostname resolves to the backup server’s IP address. hostname: mail.server.dns -> mirror mail server.

Disable DNS forwarding for the primary mail server’s domain if configured (to ensure mail goes to the backup server in the event of a switchover).

Clone over prod mail server, spin up and change network settings:

  • keep hostname (important)
  • change IP, DNS, hosts to use new IP address/network

/etc/sysconfig/network-scripts/ifcfg-eth0
/etc/hosts
/etc/resolv.conf

Ensure proper VLAN settings in backup VM (may be different than primary)

Systemd service: add the following to the [Service] section:
Environment=PATH=/opt/zimbra/bin:/opt/zimbra/common/lib/jvm/java/bin:/opt/zimbra/common/bin:/opt/zimbra/common/sbin:/usr/sbin:/sbin:/bin:/usr/sbin:/usr/bin
WorkingDirectory=/opt/zimbra

Remove start argument from ExecStart: ExecStart=/opt/zimbra/live_sync/live_syncd

This is the complete systemd unit for live sync:

[Unit]
Description=Zimbra live sync - to be run on the mirror server
After=network.target

[Service]
ExecStart=/opt/zimbra/live_sync/live_syncd
ExecStop=/opt/zimbra/live_sync/live_syncd kill
User=zimbra
Environment=PATH=/opt/zimbra/bin:/opt/zimbra/common/lib/jvm/java/bin:/opt/zimbra/common/bin:/opt/zimbra/common/sbin:/usr/sbin:/sbin:/bin:/usr/sbin:/usr/bin
WorkingDirectory=/opt/zimbra

[Install]
WantedBy=multi-user.target

Time limit

It looks like there’s a time limit on how long Zimbra keeps redo logs. That means you will lose mail if you try to bring your primary server back up after it’s been offline for too long (more than about two weeks). If you’ve been failed over to your secondary mail server for more than two weeks, you’ll want to do the reverse procedure: clone the backup to the primary, edit IP addresses, then run the Zimbra live sync. Log into the restored server to make sure mail older than two weeks is all there.

Replace unavail disk in ZFS

I had an issue where I removed a drive in my ZFS array and replaced it with a new drive which the OS gave the same device name (/dev/sdd). I had a hard time getting zfs to replace the drive until I discovered the -g flag for zpool status (thanks to this stackexchange post.)

That did the trick! Running zpool status -g showed the GUID of each device, which I could then pass to zpool replace in place of the ambiguous device name:

sudo zpool replace Poolname 12922644002107879117 /dev/sdd
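To pick the GUID out without squinting at the table, the status output can be filtered. A small sketch, assuming the usual NAME / STATE column layout of zpool status; the function name is made up.

```shell
#!/bin/sh
# Read "zpool status -g" output on stdin and print the GUID of every
# device whose STATE column is UNAVAIL. With -g the NAME column holds
# numeric GUIDs, so only all-digit names are considered.
unavail_guids() {
    awk '$2 == "UNAVAIL" && $1 ~ /^[0-9]+$/ { print $1 }'
}

# Example: sudo zpool status -g Poolname | unavail_guids
```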

Success!