
Multi click workaround for single click buttons in Home Assistant

I recently bought a Leviton Decora Smart Controller Switch with the hope that I could put it in my bathroom for quick and easy music selection. I was dismayed to find out that the device doesn’t support multi-click – it registers single clicks only. Thanks to Home Assistant, I was able to bypass this problem. It took a bit of doing, but thanks to this forum post I was able to get it working.

All I wanted this switch to do was play music based on which button was tapped and how many times. The switch was designed to also control a light/fan circuit, but I didn’t need that functionality. I wired it into the wall with line, neutral, and ground, but didn’t put anything on the load wire. This gave me the freedom to use all four buttons without messing anything else up. I wanted local control, so I set this device up via HomeKit.

To accomplish multi-click functionality, I wrote an automation that does the following:

  • Trigger on click of the button
  • Repeat the wait-for-trigger five times – one for each possible press plus an extra (not sure why, but it needed to be 5 instead of 4)
  • Wait up to half a second for another trigger, caused by a subsequent press of the button. If there is no subsequent press, then do the first action. If there IS a subsequent press, do the second action, and so on through the 4th action.
  • Stop the automation if a timeout has been reached (station selected.)
  • Run in single mode – don’t run subsequent automations if one is already running.
  • The action is simply changing an option on an input_select helper. I have a separate automation to initiate the speaker to play the desired station based on the value of the input_select helper.

Here is the full automation:


alias: Master Bath scene controller Button 1 multi click
description: ""
triggers:
  - device_id: 8dad5dce6044df4d28b566a412379bdf
    domain: homekit_controller
    type: button1
    subtype: single_press
    trigger: device
    enabled: true
conditions: []
actions:
  - repeat:
      count: 5
      sequence:
        - wait_for_trigger:
            - entity_id:
                - event.master_bath_scene_controller_stateless_programmable_switch_1
              trigger: state
          timeout:
            seconds: 0.5
        - if:
            - condition: template
              value_template: "{{ not wait.trigger }}"
          then:
            - choose:
                - conditions:
                    - condition: template
                      value_template: "{{ repeat.index == 2 }}"
                  sequence:
                    - action: input_select.select_option
                      metadata: {}
                      data:
                        option: Station_1
                      target:
                        entity_id: input_select.master_bath_stations
                - conditions:
                    - condition: template
                      value_template: "{{ repeat.index == 3 }}"
                  sequence:
                    - action: input_select.select_option
                      metadata: {}
                      data:
                        option: Station_2
                      target:
                        entity_id: input_select.master_bath_stations
                - conditions:
                    - condition: template
                      value_template: "{{ repeat.index == 4 }}"
                  sequence:
                    - action: input_select.select_option
                      metadata: {}
                      data:
                        option: Station_3
                      target:
                        entity_id: input_select.master_bath_stations
                - conditions:
                    - condition: template
                      value_template: "{{ repeat.index == 5 }}"
                  sequence:
                    - action: input_select.select_option
                      metadata: {}
                      data:
                        option: Station_4
                      target:
                        entity_id: input_select.master_bath_stations
            - stop: ""
mode: single

It works quite well! I’m so pleased. I don’t fully understand why I had to start the index at 2. If I didn’t, the first station would not play on one click – it just got skipped. My best guess is that the event entity’s state also changes for the initial button press, so the first loop iteration immediately consumes that event and every count ends up shifted by one. ChatGPT doesn’t quite understand it either, it seems, as its explanation says that a single click should do nothing and five clicks should do something. In reality, a single click does play Station_1, and five clicks does nothing. All the desired stations play when the button is pressed the requisite number of times.
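
The companion automation mentioned above is just a state trigger on the input_select helper that tells the speaker what to play. Here is a rough sketch of the idea (the media_player entity and station URLs are placeholders, not my actual config):

alias: Master Bath play selected station
triggers:
  - trigger: state
    entity_id: input_select.master_bath_stations
actions:
  - action: media_player.play_media
    target:
      entity_id: media_player.master_bath_speaker # placeholder entity
    data:
      media_content_type: music
      media_content_id: >-
        {{ {'Station_1': 'http://example.com/stream1',
        'Station_2': 'http://example.com/stream2',
        'Station_3': 'http://example.com/stream3',
        'Station_4': 'http://example.com/stream4'}[trigger.to_state.state] }}
mode: single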


ChatGPT generated explanation

This Home Assistant automation is designed to respond to button presses from a HomeKit-enabled scene controller in the master bathroom. Here is a detailed breakdown of how it works:

Overview

  • Alias: Master Bath scene controller Button 1 multi click
  • Trigger: A single press of button 1 on the scene controller.
  • Actions: Repeats a sequence up to 5 times, checking for additional button presses. Depending on the number of presses, it selects a different option from an input select entity.

Triggers

The automation is triggered when button 1 (type: button1) on the scene controller (device_id: 8dad5dce6044df4d28b566a412379bdf) is pressed once (subtype: single_press).

Conditions

There are no conditions specified, so the actions will always run when the trigger occurs.

Actions

The actions consist of a repeat loop that runs up to 5 times. Here is a step-by-step explanation of the actions:

  1. Repeat Loop: The loop will run up to 5 times (count: 5).
  2. Wait for Trigger: Inside the loop, it waits for another state change event from the scene controller’s button 1 (event.master_bath_scene_controller_stateless_programmable_switch_1) for up to 0.5 seconds (timeout: 0.5).
  3. Check for Additional Presses:
    • If no additional press is detected within the timeout period ({{ not wait.trigger }}), it checks the current index of the repeat loop (repeat.index).
    • Depending on the value of repeat.index, it selects a different option from the input_select.master_bath_stations entity.
  4. Select Options Based on Press Count:
    • If repeat.index is 2: Selects Station_1.
    • If repeat.index is 3: Selects Station_2.
    • If repeat.index is 4: Selects Station_3.
    • If repeat.index is 5: Selects Station_4.
  5. Stop Action: If an option is selected (based on the number of presses), the loop stops executing further (stop: "").

Mode

The automation runs in single mode, which means it will not start a new instance of the automation if there is already one running.

Example Use Case

  • Single Press: If the button is pressed once, it waits for up to 0.5 seconds for additional presses. If no additional presses are detected, nothing happens.
  • Double Press: If the button is pressed twice within 0.5 seconds, Station_1 is selected.
  • Triple Press: If the button is pressed three times within 0.5 seconds, Station_2 is selected.
  • Quadruple Press: If the button is pressed four times within 0.5 seconds, Station_3 is selected.
  • Five Presses: If the button is pressed five times within 0.5 seconds, Station_4 is selected.

This automation allows for different actions to be taken based on the number of times the button is pressed within a short period.

Docker for old Java-based IPMI systems

As time goes on it’s getting harder to access the remote console on one of my older systems. It requires an old version of Java that just doesn’t work right on modern operating systems. Fortunately, Docker has come to my rescue.

ipmi-kvm-docker is a simple docker setup that spins up an old version of Firefox with the old version of Java that’s required. It wraps it all nicely in a graphical environment inside the docker container and uses noVNC to serve this environment up as a web page. You can configure which port it uses, and then you simply connect to your docker host on that port. You can also specify a volume to mount if you have ISOs or other files you’d like your IPMI environment to see. Brilliant.

I simply ran this command on my docker system to get it up and running:

docker run -p 8080:8080 -v /your/local/folder:/root/images solarkennedy/ipmi-kvm-docker

Then all I had to do was connect to my docker host on port 8080 in a browser. It worked great. My 10+ year old server is still going strong, and is now easy to reinstall thanks to this tool.
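
If port 8080 is already in use on your docker host, you can remap it and run the container detached (9000 below is an arbitrary choice):

docker run -d -p 9000:8080 -v /your/local/folder:/root/images solarkennedy/ipmi-kvm-docker

Then connect on port 9000 instead.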

Quick way to make HDD light blink

I needed a quick and dirty way to make a specific hard drive LED blink so that I could identify which drive to replace. I stumbled across this post that worked well for me.

The simple trick is to run smartctl in a while loop. Something about smartctl makes the drive light blink differently than using dd, which is what I was using previously. smartctl is what finally allowed me to identify the drive. Here is the command:

while true; do smartctl -a /dev/<device>; done

The command will run forever until you press Ctrl + C. It makes the LED blink rather obviously, which made things much easier.
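
If the scrolling output bothers you, you can discard it – it’s the drive access that makes the LED blink, not the output (substitute your device, e.g. /dev/sda):

while true; do smartctl -a /dev/sda > /dev/null; done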

Trigger button to run a script in Home Assistant

I configured a button (RunLessWire Click) to log diaper changes for my new baby. The diaper changes are logged in a Google Sheets spreadsheet. I set up a simple public-facing Google Form that I could run unauthenticated curl requests against. I then configured Home Assistant to run that curl command when the button is pressed. Instant diaper logging at the press of a button.

Lessons learned:

  • Zigbee Home Assistant (ZHA) does not yet support the Zigbee Green Power protocol, which the RunLessWire Click uses. I had to pair the switches to my Hue hub instead.
    * It looks like they’re getting close to supporting it, though: https://github.com/zigpy/zigpy/pull/1282

Here was my process:

  • Create Google Form
  • Obtain form ID from URL bar
  • Get pre-filled link to get names of fields by clicking the three dots on top right and clicking “Get pre-filled link”. Make note of the names for each entry e.g. entry.1363419348
    Thanks to help from: https://stackoverflow.com/questions/65142364/i-cant-find-name-attribute-while-inspecting-input-elements-of-google-form-ho
  • Curl command is:
    curl https://docs.google.com/forms/<FORM_URL>/formResponse -d ifq -d <ENTRY_NAME>=<ENTITY_VALUE> -d <ADDITIONAL_ENTRY_NAME>=<ADDITIONAL_ENTRY_VALUE> -d submit=Submit
    Thanks to help from: https://eureka.ykyuen.info/2014/07/30/submit-google-forms-by-curl-command/
  • Shell commands go into configuration.yaml:
    shell_command:
      log_pee: <CURL_COMMAND>
      log_poo: <CURL_COMMAND>
    Thanks to help from: https://community.home-assistant.io/t/dont-understand-how-to-use-shell-commands/576580/9
  • Restart Home Assistant to pick up your configuration changes.
  • Configure the automation to call Service: shell_command
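
For illustration, a filled-in shell_command section might look like this (the entry value "Pee"/"Poo" pairing is a made-up placeholder – use the entry names you noted from your own pre-filled link):

shell_command:
  log_pee: curl https://docs.google.com/forms/<FORM_URL>/formResponse -d ifq -d entry.1363419348=Pee -d submit=Submit
  log_poo: curl https://docs.google.com/forms/<FORM_URL>/formResponse -d ifq -d entry.1363419348=Poo -d submit=Submit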

Success!

Digitize old photos and videos

Here is a list of hardware and software that I use to digitize old home movies, tapes, and family pictures:

Hardware

35mm film scanner: Pacific Image PowerFilm Plus 35mm Film Scanner

Document / picture scanner: Brother ADS-2700W

Flatbed scanner: Epson DS-50000 Large-Format Document Scanner

Audiocassette player with a 3.5mm output jack

Laptop / computer with 3.5mm input jack (headphone/microphone)

VHS player

USB VHS to Digital Converter

Soft tip silicone air blower

Software

Video capture: OBS Studio

Photo editing: GIMP

Audio capture: Audacity

Digitizing these precious memories makes them available for future generations. They are much more useful to everyone online than they ever were sitting in a box.

Proxmox Ceph storage configuration

These are my notes for migrating my VM storage from NFS mount to Ceph hosted on Proxmox. I ran into a lot of bumps, but after getting proper server-grade SSDs, things have been humming smoothly long enough that it’s time to publish.

A note on SSDs

I had a significant amount of trouble getting ceph to work with consumer-grade SSDs. This is because ceph forces a disk cache flush (sync write) for each transaction – much like NFS. On my ZFS array I could disable this behavior, but not with ceph. The result was very slow performance. It wasn’t until I got some Intel DC S3700 drives that ceph became reliable and fast. More details here.

Initial install

I used the Proxmox GUI to install ceph on each node by going to <host> / Ceph. Then I used the GUI to create a monitor, manager, and OSD on each host. Lastly, I used the GUI to create a ceph storage target in Datacenter config.

Small cluster (3 nodes)

My Proxmox cluster is small (3 nodes.) I discovered I didn’t have enough space for 3 replicas (the default ceph configuration), so I had to drop my pool size/min down to 2/1 despite warnings not to do so, since a 3-node cluster is a special case:

https://forum.proxmox.com/threads/ceph-pool-size-is-2-1-really-a-bad-idea.68939/#post-440755

More discussion: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/UB44GH4Z2NJUV52ZTHKO4TGYEX3DZ4CB/

I have not had any problems with this configuration and it provides the space I need.
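
For reference, this is what dropping an existing pool to 2/1 looks like from the CLI (substitute your pool name):

ceph osd pool set <POOL_NAME> size 2
ceph osd pool set <POOL_NAME> min_size 1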

Ceph pool size

In my early testing, I discovered that if I removed a disk from the pool, the size of the pool increased! After doing some reading in the Red Hat documentation, I learned the basics of why this happened.

Size = number of copies of the data in the pool

Minsize = minimum number of copies that must remain available; below this, pool I/O is suspended

I didn’t have enough space for 3 copies of the data. When I removed a disk, the pool dropped down to the minsize setting (2 copies) – which I did have enough room for. The pool rebalanced to reflect this, and that resulted in more space.

Configure Alerting

It turns out that alerting for problems with ceph OSDs and monitors does not come out of the box. You must configure it. Thanks to this thread and the ceph documentation for how to do so. I did this on each proxmox node.

apt install ceph-mgr-dashboard
ceph config set mgr mgr/alerts/smtp_host <MAIL_HOST>
ceph config set mgr mgr/alerts/smtp_ssl false
ceph config set mgr mgr/alerts/smtp_port 25
ceph config set mgr mgr/alerts/smtp_destination <DEST_EMAIL>
ceph config set mgr mgr/alerts/smtp_sender <SENDER_EMAIL>
ceph config set mgr mgr/alerts/smtp_from_name 'Proxmox Ceph Cluster'

Test this by telling ceph to send its alerts:

ceph alerts send

Move VM disks to Ceph storage

I ended up writing a simple for loop to move all my existing Proxmox VM disks onto my new ceph cluster. None of my VMs had more than 3 scsi devices. If your VMs have more than that you’ll have to tweak this rudimentary command:

for vm in $(qm list | awk '{print $1}' | grep -v VMID); do
  qm move-disk $vm scsi0 <CEPH_POOL_NAME>
  qm move-disk $vm scsi1 <CEPH_POOL_NAME>
  qm move-disk $vm scsi2 <CEPH_POOL_NAME>
done

Rename storage

I tried to edit /etc/pve/storage.cfg to change the name I gave my ceph cluster in Proxmox. That didn’t work (question mark next to the storage after renaming it), so I just removed and re-added the storage instead.

Maintenance

Begin maintenance:

Ceph constantly tries to keep itself in balance. If you take a node down and it stays down for too long, ceph will begin to rebalance the data among the remaining nodes. If you’re doing short term maintenance, you can control this behavior to avoid unnecessary rebalance traffic.

ceph osd set nobackfill
ceph osd set norebalance

Reboot / perform OSD maintenance.

After maintenance is completed:

ceph osd unset nobackfill
ceph osd unset norebalance

Performance benchmark

I did a lot of performance checking when I first started to try and track down why the pool was so slow. In the end it was my consumer-grade SSDs. I’ll keep this section here for future reference.

Redhat article on ceph performance benchmarking

Ceph wiki on benchmarking

rados bench -p SSD 10 write --no-cleanup
rados bench -p SSD 10 seq
rados bench -p SSD 10 rand
rbd create image01 --size 1024 --pool SSD
rbd map image01 --pool SSD --name client.admin
mkfs.ext4 /dev/rbd/SSD/image01  
mkdir /mnt/ceph-block-device
mount /dev/rbd/SSD/image01 /mnt/ceph-block-device/
rbd bench --io-type write image01 --pool=SSD
pveperf /mnt/ceph-block-device/
rados -p SSD cleanup

Undo:

 umount /mnt/ceph-block-device  
 rbd unmap image01 --pool SSD
 rbd rm image01 --pool SSD

MTU 9000 warning

I read that it was recommended to set the network MTU to 9000 (jumbo frames.) When I did this I experienced weird behavior and connection timeouts – ceph ground to a halt, complaining about slow OSDs and mons. It was too much hassle for me to troubleshoot, so I went back to the standard 1500 MTU.

Datacenter settings

I discovered you can have a node automatically migrate its VMs off when you issue the reboot command, via the migrate shutdown policy. https://pve.proxmox.com/wiki/High_Availability

Proxmox GUI / Datacenter / Options / HA Settings
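
The same policy can also be set by editing /etc/pve/datacenter.cfg directly. A minimal sketch, based on my reading of the High Availability docs:

ha: shutdown_policy=migrate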

Specify SSD or HDD for pools

I have not done this yet but here’s a link I found that explains how to do it: https://stackoverflow.com/questions/58060333/ceph-how-to-place-a-pool-on-specific-osd

Helpful commands

Determine IPs of OSDs:

ceph osd dump

Remove monitor from failed node:

ceph mon remove <host>
Also needs to be removed from /etc/ceph/ceph.conf

Configure Backup

I had been using ZFS snapshots and ZFS send to backup my VM disks before the move to ceph. While ceph has snapshot capability, it is slow and takes up extra space in the pool. My solution was to spin up a Proxmox Backup Server and regularly back up to that instead.

Proxmox Backup Server can be installed on an existing PVE server if you desire:

https://pbs.proxmox.com/docs/installation.html

Configure the apt repository as follows:

# PBS pbs-no-subscription repository provided by proxmox.com,
# NOT recommended for production use
deb http://download.proxmox.com/debian/pbs bullseye pbs-no-subscription

# security updates
deb http://security.debian.org/debian-security bullseye-security main contrib

# apt-get update
# apt-get install proxmox-backup

I had to add a regular user and give it admin permissions on the PBS side, then add the host on the Proxmox side using those credentials.

Configure automated backup in PVE via Datacenter tab / Backup.

Remember to configure automated verify jobs (scrubs).

Make sure to add an e-mail address for proxmox backup user for alerts.

Edit which account & e-mail is used, and how often notified, at the Datastore level.

Sync jobs

I wanted to synchronize my Proxmox Backup repository to a non-PBS server (simply host the files.) I accomplished this by doing the following:

  • Add 127.0.0.1 as a Remote host (Configuration / Remotes.) Copy the PBS server fingerprint from Certificates / Fingerprint.
  • Create the remote datastore in /etc/fstab manually (I used SSHFS to back up to a Synology over SSH.)
  • Add the datastore in PBS, pointing to the manual fstab mount, then add the sync job there.

Import PBS datastore (in case of total crash)

I wanted to know how to import the data into a fresh instance of PBS. This is the procedure:

Edit /etc/proxmox-backup/datastore.cfg and add the config for the datastore manually. Copy from an existing datastore config for the syntax.
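
A minimal sketch of what such an entry looks like (the name, path, and comment are placeholders for your actual datastore):

datastore: mystore
	path /mnt/datastore/mystore
	comment restored after crash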

Space still being taken up after deleting backups

PBS uses access time (atime) to determine if something has been touched, and it waits 24 hours after the last touch. Garbage collection manually updates atime, but it’s still recommended to keep atime enabled on the dataset PBS is using. Sources:

https://forum.proxmox.com/threads/zpool-atime-turned-off-effect-on-garbage-collection.76590/

https://pbs.proxmox.com/docs/backup-client.html#garbage-collection

Troubleshooting

Really slow VM IOPS during degrade / rebuild

This also ended up being due to having consumer-grade SSDs in my ceph pools. I’m keeping my notes for what I did to troubleshoot in case they’re useful.

https://forum.proxmox.com/threads/ceph-high-i-o-wait-on-osd-add-remove.20271/

For a small cluster, lower backfill activity so recovery doesn’t cause a slowdown:

ceph config set osd osd_max_backfills 1
ceph config set osd osd_recovery_max_active 3

Verify setting was applied: https://www.suse.com/support/kb/doc/?id=000019693

ceph-conf --show-config|egrep "osd_max_backfills|osd_recovery_max_active"
ceph config dump | grep osd

Ramp up backfill performance:

ceph tell osd.* injectargs --osd_max_backfills=2 --osd_recovery_max_active=8 # 2x Increase
ceph tell osd.* injectargs --osd_max_backfills=3 --osd_recovery_max_active=12 # 3x Increase
ceph tell osd.* injectargs --osd_max_backfills=4 --osd_recovery_max_active=16 # 4x Increase
ceph tell osd.* injectargs --osd_max_backfills=1 --osd_recovery_max_active=3 # Back to Defaults

The above didn’t help; it turns out consumer SSDs are very bad for ceph:

https://yourcmc.ru/wiki/Ceph_performance#General_benchmarking_principles

https://blog.cypressxt.net/hello-ceph-and-samsung-850-evo/

I bought some Intel DC S3700 drives on eBay for $75 apiece. They fixed all my latency/speed issues.

Dead mon despite being removed from the CLI

I had a situation where a monitor showed up as dead in Proxmox, but I was unable to delete it. I followed this procedure:

rm /etc/systemd/system/ceph-mon.target.wants/ceph-mon@<nodename>.service

Dead pve node procedure

Remove the node from /etc/ceph/ceph.conf, remove /var/lib/ceph/mon/ceph-<node>, and remove /etc/systemd/system/ceph-mon.target.wants/ceph-mon@pve2.service.

https://forum.proxmox.com/threads/ceph-cant-remove-monitor-with-unknown-status.63613/

Adding the monitor back through the GUI brought me back to the same problem.

Bring node back manually

https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/

 ceph auth get mon. -o /tmp/key
 ceph mon getmap -o /tmp/map
 ceph-mon -i <node_name> --mkfs --monmap /tmp/map --keyring /tmp/key
 ceph-mon -i <node_name> --public-addr <node_ip>:6789
 ceph mon enable-msgr2
 vi /etc/pve/ceph.conf

In the end the most surefire way to fix this problem was to re-image the affected host.

Clear HEALTH_WARNING in GUI

In my testing I had tried pulling disks at random, then putting them back in. This recovered well, but I had this message:

HEALTH_WARN 1 daemons have recently crashed

To clear it I had to drop to the CLI and run this command:

ceph crash archive-all

Thanks to the Proxmox Forums for the fix.

Pool cleanup

I noticed I would get rbd error: rbd: listing images failed: (2) No such file or directory (500) when trying to look at which disks were on my Ceph pool. I fixed this by removing the offending images as per this post.

I then ran another rbd ls -l <POOL_NAME> command to see what was left and noticed several items without anything in the LOCK column. I discovered these were artifacts from failed disk migrations I tried early on – wasted space. I removed them one by one with the following command:

rbd rm <VM_FILE_NAME> -p <POOL_NAME>

Be careful to verify they’re not disks attached to VMs that are simply powered off – disks of non-running VMs also show no lock.

Disk errors

I had a disk fail, but then I pulled out the wrong disk. I kept getting these errors:

Warning: Error fsyncing/closing /dev/mapper/ceph--fc741b6c--499d--482e--9ea4--583652b541cc-osd--block--843cf28a--9be1--4286--a29c--b9c6848d33ba: Input/output error

I was unable to remove it from the GUI. After a while I realized the problem – I was on the wrong node. I needed to be on the node that has the disks when creating an OSD in the Proxmox GUI.

Steps to determine which disk is assigned to an OSD, from ceph docs:

ceph-volume lvm list
====== osd.2 =======

 [block]       /dev/ceph-680265f2-0b3c-4426-b2a8-acf2774d82e0/osd-block-2096f339-0572-4e1d-bf20-52335af9b374

     block device              /dev/ceph-680265f2-0b3c-4426-b2a8-acf2774d82e0/osd-block-2096f339-0572-4e1d-bf20-52335af9b374
     block uuid                tcnwFr-G33o-ybue-n0mP-cDpe-sp9y-d0gvYS
     cephx lockbox secret       
     cluster fsid              65f26da0-fca0-4419-ba15-20269a5a363f
     cluster name              ceph
     crush device class        ssd
     encrypted                 0
     osd fsid                  2096f339-0572-4e1d-bf20-52335af9b374
     osd id                    2
     osdspec affinity           
     type                      block
     vdo                       0
     devices                   /dev/sde

Update 6/20/2024

One year later and Ceph has been running great. So great, in fact, that I migrated my bulk storage to it as well. Here are my notes on that endeavor.

Optimal number of PGs

I discovered that there is an optimal number of PGs you want in a ceph cluster. It depends on how many OSDs you have. Link: https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/storage_strategies_guide/placement_groups_pgs#pg-count-for-small-clusters

The optimal number of PGs is the following, rounding up to the nearest power of two:

                (OSDs * 100)
   Total PGs =  ------------
                 pool size

In my case (only 3 OSDs – one per node – and a pool size of 2), that works out to (3 × 100) / 2 = 150, which rounds up to 256 PGs.
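
If the autoscaler isn’t managing the pool for you, applying that number is a single command (pool name is a placeholder):

ceph osd pool set <POOL_NAME> pg_num 256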

Slow write speeds for HDDs

Moving OSD DB to SSD – The slow way

I had pretty slow write speeds when adding my 3 HDDs to a new pool (50 MB/s max.) I read the best way to help with this is to offload the DB and WAL to an SSD for each OSD. It’s possible to have multiple OSDs share a single SSD for such operations, but since I don’t have enterprise-grade SSDs, I opted for a 1:1 HDD:SSD mapping. Unfortunately, I had already created the OSDs before I realized I needed to do this, so I had to destroy & re-create each OSD one by one to add the SSD.

https://www.reddit.com/r/ceph/comments/fgvcte/replace_osd_node_without_remapping_pgs

Set the norebalance, norecover, and nobackfill flags, destroy the OSD, and join the new OSD with the same ID as the old one.
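
In shell terms, the flag portion of that advice looks like this (the destroy/re-create step in the middle can be done from the Proxmox GUI):

# Pause data movement before destroying/re-creating the OSD
ceph osd set norebalance
ceph osd set norecover
ceph osd set nobackfill
# destroy the OSD, then re-create it with its DB on the SSD, re-using the old OSD ID
ceph osd unset norebalance
ceph osd unset norecover
ceph osd unset nobackfill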

This worked, but it took two days to rebuild. I set out to find a faster option.

Moving OSD DB to SSD – The fast way

Migrate DB to SSD without destroying OSD

https://www.reddit.com/r/ceph/comments/1awwoch/yet_another_ceph_poor_performance_post_part_deux

https://github.com/45Drives/scripts/blob/main/add-db-to-osd.sh

The script requires jq and bc.

I kept getting the error message

WARNING: Device selected (/dev/sdd) has a LVM2_member signature, but no volume group
Wipe disk and run again

despite completely wiping the drive. I dove into the source of the script and found it creates a PV & VG for the drive, and that must be failing, so I did it manually:

pvcreate /dev/sdd
vgcreate ceph-$(uuidgen) /dev/sdd
./add-db-to-osd.sh -b 465G -d /dev/sdd -o 3

This worked beautifully.

Move OSD DB to new device

I discovered that when it comes to DB devices, the same advice about SSDs still holds: don’t waste your time with consumer SSDs. I ordered some more Intel DC S3700 drives and needed to swap out the consumer SSDs I had been using as DB devices. The 45Drives script doesn’t work here because the DB had already been migrated to a separate SSD. This is the process to move from one dedicated DB device to another:

Thanks to this thread https://www.reddit.com/r/ceph/comments/1bk6e9s/moving_db_and_wal_from_ssd_to_hdd/

and this documentation: https://docs.ceph.com/en/latest/ceph-volume/lvm/list/

https://docs.ceph.com/en/quincy/ceph-volume/lvm/migrate

Plug new drive in alongside existing drive

Obtain OSD fsid with this command: ceph-volume lvm list

pvcreate /dev/<new_device>

vgcreate ceph-$(uuidgen) /dev/<new_device>

lvcreate -l100%FREE -n ceph-osd-db-<OSD FSID> ceph-<UUIDGEN_FROM_ABOVE>

systemctl stop ceph-osd@<OSD_ID>

ceph-volume lvm migrate --osd-id <OSD_ID> --osd-fsid <OSD_FSID> --from db wal --target ceph-<UUIDGEN_FROM_ABOVE>/ceph-osd-db-<OSD FSID>

--> Migrate to new, Source: ['--devs-source', '/var/lib/ceph/osd/ceph-4/block.db'] Target: /dev/ceph-60969103-7d88-4340-a13f-a77f98e1da46/osd-db-800G
Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-4/block.db
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-6
--> Migration successful.

systemctl start ceph-osd@<OSD_ID>

System with no additional HDD slots
I used a USB 3 SSD adapter temporarily: migrate the DB, remove the old device, add the new device, then reboot the node.

Sizing DB device

https://docs.ceph.com/en/latest/rados/configuration/bluestore-config-ref

For RBD workloads, however, block.db usually needs no more than 1% to 2% of the block size. By that guideline, a 4 TB HDD OSD would need roughly a 40-80 GB DB device.

Move from dedicated DB device back to OSD

https://www.reddit.com/r/ceph/comments/1bwma91/script_to_move_separate_db_lv_back_to_block_device

Fix KVMD not starting after updating piKVM

piKVM is amazing. I had one controlling an old desktop of mine for over 18 months with no issues. I decided to update its software and ran into some problems.

The first problem: pacman was returning a 404 when trying to update. I guess in the last year and a half the repository URL had changed. I had to edit /etc/pacman.conf and update the URL:

[pikvm]
Server = https://files.pikvm.org/repos/arch/rpi4
SigLevel = Required DatabaseOptional

After fixing that, running pacman -Syu, and answering yes, I rebooted, but found that kvmd would not start. The first symptom: HTTP 500 from nginx. Digging in, I found that two services were failing to start: kvmd-tc358743.service and kvmd-otg.service.

v4l2-ctl[429]: Cannot open device /dev/kvmd-video, exiting
kvmd-otg[398]: RuntimeError: Can't find any UDC

With these two services bailing, the web UI wouldn’t start. I checked the kernel log and the tc358743 device was not detected at all. I was about to give up and just reflash the device when I noticed two files in /boot: cmdline.txt.pacsave and config.txt.pacsave. I know from my experience with Arch that this means some of my configuration got clobbered during the upgrade. Running a diff between each .pacsave file and its replacement revealed some very important lines had been omitted:

dtoverlay=tc358743
dtoverlay=disable-bt
dtoverlay=dwc2,dr_mode=peripheral

I restored the .pacsave files and rebooted, and it worked! Everything came back.

Next time I won’t wait so long between software updates.

piKVM pushover startup script

I’ve had an issue where I wasn’t sure if my dynamic DNS provider registered properly. I then realized that I have a piKVM attached to one of my servers that boots on powerup, even if the server does not. I could utilize this piKVM to help me out.

Thanks to inspiration from Chris Dzombak I was able to whip up a little script that runs on startup. This script waits 5 minutes to allow for my firewall and modem to boot up, then sends a pushover notification to let me know the piKVM is online and what its external IP address is.

To get it working on the piKVM I had to enter RW mode, write and save the script, add execute permissions to the script, then configure a systemd service to run the script at startup.

Here is the script, saved under /root/boot-pushover.sh

#!/usr/bin/env bash
set -eu

#Wait 5 minutes to allow router bootup
sleep 300

TOKEN="PUSHOVER_APPLICATION_TOKEN"
USER="PUSHOVER_USER_TOKEN"
EXTERNAL_IP="$(curl -s ifconfig.me)"
MESSAGE="$(hostname) is online. External IP: $EXTERNAL_IP"

#Send pushover command to alert it's up and send its external IP
curl -s \
  --form-string "token=$TOKEN" \
  --form-string "user=$USER" \
  --form-string "message=$MESSAGE" \
  https://api.pushover.net/1/messages.json

Set executable: chmod +x /root/boot-pushover.sh

Here is the systemd service, saved under /etc/systemd/system/boot-pushover-notification.service

[Service]
Type=oneshot
ExecStart=/root/boot-pushover.sh
RemainAfterExit=yes
User=root
Group=root
RestartSec=15
Restart=on-failure

[Unit]
Wants=network.target
After=network.target nss-lookup.target

[Install]
WantedBy=multi-user.target

Reload daemons & enable startup:

systemctl daemon-reload
systemctl enable boot-pushover-notification.service
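
You can also fire the service once by hand (while still in rw mode) to make sure the script itself works – just remember it sleeps for 5 minutes before sending anything:

systemctl start boot-pushover-notification.service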

Test by exiting rw mode and rebooting the piKVM:

ro
reboot

It works really well!

Sideload Zwift on NVIDIA Shield

I recently got a Tacx Neo 2 smart trainer for my bike and was eager to use it on my big screen TV with Zwift. Unfortunately, despite the Nvidia Shield being a more than capable Android device, Zwift does not show up in the Google Play store. I didn’t want to stream Zwift from my PC because a) the Windows app is annoying and doesn’t go full screen (title bar at the top) and b) my PC is located upstairs and the Bluetooth doesn’t appear to reach from the trainer to the PC.

My solution was to sideload the Zwift app onto my Nvidia Shield. It wasn’t as straightforward as most sideloading due to how Zwift is packaged: it has an APK file and an OBB file. The APK is small and is the application itself; the OBB file is all the map data (it’s large – over 600 MB.)

Fortunately, a newer Android format called XAPK exists, which is an archive of both in one package. This was the process I used to successfully get Zwift onto my Nvidia Shield:

  1. Download Zwift xapk file from apkpure: https://apkpure.com/zwift/com.zwift.zwiftgame
    I had to use a download manager (I used XDM) because downloading in the browser kept failing.
  2. Copy XAPK to USB, plug USB into the shield
  3. Install a file explorer app on the shield via the play store (I used FX file explorer)
  4. Use the file explorer app to open the xapk file as an archive (Archive Explorer)
  5. Click on the APK and install it by selecting to open with the Android installer (grant access when prompted)
  6. Copy the Android/obb/com.zwift.zwiftgame folder to home / Android / obb
  7. Access Zwift by going to settings / Apps / See all apps and scrolling to the bottom
    * Optional: Install a sideloader app from the play store and launch Zwift from that app instead

The shield remote doesn’t appear to control anything within the app. Plug in a mouse so you can swipe away the first run tutorial screens (hold left click and drag to the left.) Optional: plug in a keyboard while you’re at it so you can log in faster.

Success! My trainer showed up in the pairing screen and everything works! You can even have your own music playing in the background, with a caveat: if you ever switch away from Zwift, it resets back to the login screen, because the Shield doesn’t appear to have enough memory to keep Zwift running when another app comes to the foreground. If you want your Shield to play music, start the music first, then switch to Zwift. Once you’re in Zwift, you can’t switch away to any other app without losing your progress.

DIY IP KVM

In general I try to buy server-class hardware for my home lab, primarily so that I can have IPMI / remote access console for remote OS installation & troubleshooting. I recently got a new desktop and found myself with a Threadripper 1950x that would make an excellent addition to my server cluster. The one problem: it’s a desktop-class board, so it does not have any IPMI / remote access device.

I solved my problem with piKVM. It works wonderfully! piKVM uses a Raspberry Pi with some additional hardware and software to interface with a system, providing power & reset control as well as KVM functions, with the ability to upload OS images and do OS installations remotely. The whole project cost me about $150 since I didn’t have some of the essential items for it. It could definitely have been cheaper if I hadn’t bought large packs of items or if I’d already had some electronics components.

The process was straightforward, as outlined on their GitHub page. The only snag I ran into was creating the USB Y (split) cable. It did not work the first time, so I had to tear it all down and start again. One cable I used had more than 4 wires (3 red, 1 black, 1 green, 1 white, and 1 yellow.) When I re-assembled it to include the yellow wire with the red and black, it all worked.

My custom made Y cable (made from two cables I had lying around)
fully assembled and ready to test
Attached to a test motherboard

I scavenged metal mounting brackets from some old networking adapter cards. With those I was able to mount the Pi and the HDMI-in module to two standard PCI express card slots. I accidentally destroyed one of my SD cards while doing this, so be careful if you try it! The Pi is mounted at a slight angle so as to not damage the SD card. I had to mount it backwards (ethernet in the back) because I couldn’t get power to it otherwise (the power port sits right up against the motherboard.) My workaround was to custom make a short ethernet cable and use an RJ45 coupler on the outside of the chassis to provide an easy-to-access network port for the Pi.

I wired the power & reset switches, as well as the HDD and power LEDs, in parallel so they function with the chassis as well as with the KVM. To do this, simply get some male-to-male jumper wires. On one end, plug into the chassis wire; on the other, plug into the corresponding positive and negative slots right next to the ones going to the Pi.

Cable management nightmare. But it works XD
Finished product

Breadboard pinout: https://github.com/pikvm/pikvm/blob/master/img/v2.png

USB split cable diagram: https://github.com/pikvm/pikvm/blob/master/img/v2_splitter.png

Parts list:

Raspberry Pi 4B 2GB edition: https://www.amazon.com/gp/product/B07TD42S27/ref=ppx_yo_dt_b_asin_title_o04_s00?ie=UTF8&psc=1

Raspberry Pi 4 heatsink pack: https://www.amazon.com/gp/product/B07ZLZRDXZ/ref=ppx_yo_dt_b_asin_title_o02_s00?ie=UTF8&psc=1

Raspberry Pi HDMI in Module: https://www.amazon.com/gp/product/B0899L6ZXZ/ref=ppx_yo_dt_b_asin_title_o03_s00?ie=UTF8&psc=1

16GB Micro SD card: https://www.amazon.com/gp/product/B073K14CVB/ref=ppx_yo_dt_b_asin_title_o04_s01?ie=UTF8&psc=1

1 foot HDMI cable: https://www.amazon.com/gp/product/B00DI88XEG/ref=ppx_yo_dt_b_asin_title_o00_s00?ie=UTF8&psc=1

Breadboard 3 pack: https://www.amazon.com/gp/product/B077DN2PS1/ref=ppx_yo_dt_b_asin_title_o04_s01?ie=UTF8&psc=1

Breadboard Jumper Wires: https://www.amazon.com/gp/product/B07GD2BWPY/ref=ppx_yo_dt_b_asin_title_o04_s00?ie=UTF8&psc=1

Resistor Assortment Kit: https://www.amazon.com/gp/product/B0792M83JH/ref=ppx_yo_dt_b_asin_title_o04_s00?ie=UTF8&psc=1

390 OHM resistors: https://www.amazon.com/gp/product/B07QK9NFGT/ref=ppx_yo_dt_b_asin_title_o04_s01?ie=UTF8&psc=1

SSR relays: https://www.digikey.com/product-detail/en/G3VM-61A1/Z2100-ND/673290