All posts by nicholas

Proxmox Ceph storage configuration

These are my notes for migrating my VM storage from NFS mount to Ceph hosted on Proxmox. I ran into a lot of bumps, but after getting proper server-grade SSDs, things have been humming smoothly long enough that it’s time to publish.

A note on SSDs

I had a significant amount of trouble getting ceph to work with consumer-grade SSDs. This is because ceph does a cache writeback call for each transaction – much like NFS. On my ZFS array, I could disable this, but not so for ceph. The result is very slow performance. It wasn’t until I got some Intel DC S3700 drives that ceph became reliable and fast. More details here.

Initial install

I used the Proxmox GUI to install ceph on each node by going to <host> / Ceph. Then I used the GUI to create a monitor, manager, and OSD on each host. Lastly, I used the GUI to create a ceph storage target in Datacenter config.

Small cluster (3 nodes)

My Proxmox cluster is small (3 nodes.) I discovered I didn’t have enough space for 3 replicas (the default ceph configuration), so I had to drop my pool size/min down to 2/1 despite warnings not to do so, since a 3-node cluster is a special case:

More discussion:

I have not had any problems with this configuration and it provides the space I need.

Ceph pool size

In my early testing, I discovered that if I removed a disk from pool, the size of the pool increased! After doing some reading in redhat documentation, I learned the basics of why this happened.

Size = number of copies of the data in the pool

Minsize = minimum number of copies before pool operation is suspended

I didn’t have enough space for 3 copies of the data. When I removed a disk, the pool it dropped down to the minsize setting (2 copies) – which I did have enough room for. The pool rebalanced to reflect this and it resulted in more space.

Configure Alerting

It turns out that alerting for problems with ceph OSDs and monitors does not come out of the box. You must configure it. Thanks to this thread and the ceph documentation for how to do so. I did this on each proxmox node.

apt install ceph-mgr-dashboard
ceph config set mgr mgr/alerts/smtp_host <MAIL_HOST>'
ceph config set mgr mrg/alerts/smtp_ssl false
ceph config set mgr mgr/alerts/smtp_ssl false
ceph config set mgr mgr/alerts/smtp_port 25
ceph config set mgr mgr/alerts/smtp_destination <DEST_EMAIL>
ceph config set mgr mgr/alerts/smtp_sender <SENDER_EMAIL>
ceph config set mgr mgr/alerts/smtp_from_name 'Proxmox Ceph Cluster'

Test this by telling ceph to send its alerts:

ceph alerts send

Move VM disks to Ceph storage

I ended up writing a simple for loop to move all my existing Proxmox VM disks onto my new ceph cluster. None of my VMs had more than 3 scsi devices. If your VMs have more than that you’ll have to tweak this rudimentary command:

for vm in $(qm list | awk '{print $1}'|grep -v VMID); do qm move-disk $vm scsi0 <CEPH_POOL_NAME>; qm move-disk $vm scsi1 <CEPH_POOL_NAME>; qm move-disk $vm scsi2 <CEPH_POOL_NAME>; done

Rename storage

I tried to edit /etc/pve/storage.cfg to change the name I gave my ceph cluster in Proxmox. That didn’t work (question mark next to the storage after renaming it) so I just removed and re-added instead.


Begin maintenance:

Ceph constantly tries to keep itself in balance. If you take a node down and it stays down for too long, ceph will begin to rebalance the data among the remaining nodes. If you’re doing short term maintenance, you can control this behavior to avoid unnecessary rebalance traffic.

ceph osd set nobackfill
ceph osd set norebalance

Reboot / perform OSD maintenance.

After maintenance is completed:

ceph osd unset nobackfill
ceph osd unset norebalance

Performance benchmark

I did a lot of performance checking when I first started to try and track down why the pool was so slow. In the end it was my consumer-grade SSDs. I’ll keep this section here for future reference.

Redhat article on ceph performance benchmarking

Ceph wiki on benchmarking

rados bench -p SSD 10 write --no-cleanup
rados bench -p SSD 10 seq
rados bench -p SSD 10 seq
rados bench -p SSD 10 rand
rbd create image01 --size 1024 --pool SSD
rbd map image01 --pool SSD --name client.admin
mkfs.ext4 /dev/rbd/SSD/image01  
mkdir /mnt/ceph-block-device
mount /dev/rbd/SSD/image01 /mnt/ceph-block-device/
rbd bench --io-type write image01 --pool=SSD
pveperf /mnt/ceph-block-device/
rados -p SSD cleanup


 umount /mnt/ceph-block-device  
 rbd unmap image01 --pool SSD
 rbd rm image01 --pool SSD

MTU 9000 warning

I read that it was recommended to set network MTU to 9000 (jumbo frames. When I did this I experienced weird behavior, connection timeouts – ceph ground to a halt, complaining about slow OSDs, mons. It was too much hassle for me to troubleshoot, so I went back to the standard 1500 MTU.

Datacenter settings

I discovered you can have a host automatically migrate hosts off when you issue the reboot command via the migrate shutdown policy.

Proxmox GUI / Datacenter / Options / HA Settings

Specify SSD or HDD for pools

I have not done this yet but here’s a link I found that explains how to do it:

Helpful commands

Determine IPs of OSDs:

ceph osd dump - determine IPs of OSDs

Remove monitor from failed node:

ceph mon remove <host>
Also needs to be removed from /etc/ceph/ceph.conf

Configure Backup

I had been using ZFS snapshots and ZFS send to backup my VM disks before the move to ceph. While ceph has snapshot capability, it is slow and takes up extra space in the pool. My solution was to spin up a Proxmox Backup Server and regularly back up to that instead.

Proxmox backup server: can be installed to an existing PVE server if you desire:

Configure the apt repository as follows:

# PBS pbs-no-subscription repository provided by,
# NOT recommended for production use
deb bullseye pbs-no-subscription

# security updates
deb bullseye-security main contrib

# apt-get update
# apt-get install proxmox-backup

I had to add a regular user and give admin permissions on PBS side, then add the host on the proxmox side using those credentials.

Configure automated backup in PVE via Datacenter tab / Backup.

Remember to configure automated verify jobs (scrubs).

Make sure to add an e-mail address for proxmox backup user for alerts.

Edit which account & e-mail is used, and how often notified, at the Datastore level.

Sync jobs

I wanted to synchronize my Proxmox Backup repository to a non-PBS server (simply host the files.) I accomplished this by doing the following:

  • Add as a Remote host (Configuration / Remotes.) Copy the PBS server fingerprint from Certificates / Fingerprint.
  • Create remote datastore in /etc/fstab manually (I used SSHFS to backup to a synology over SSH.)
  • Add datastore in PBS, pointing to manual fstab mount. Then add sync job there

Import PBS datastore (in case of total crash)

I wanted to know how to import the data into a fresh instance of PBS. This is the procedcure:

edit /etc/proxmox-backup/datastore.cfg and add config about the datastore manually. Copy from existing datastore config for syntax.

Space still being taken up after deleting backups

PBS uses access time to determine if something has been touched. It waits 24 hours after the last touch. Garbage collection manually updates atime, but still recommended to keep atime on for the dataset PBS is using. Sources:


Really slow VM IOPS during degrade / rebuild

This also ended up being due to having consumer-grade SSDs in my ceph pools. I’m keeping my notes for what I did to troubleshoot in case they’re useful.

Small cluster. Lower backfill activity so recovery doesn’t cause slowdown:

ceph config set osd osd_max_backfills 1
ceph config set osd osd_recovery_max_active 3

Verify setting was applied:

ceph-conf --show-config|egrep "osd_max_backfills|osd_recovery_max_active"
ceph config dump | grep osd

Ramp up backfill performance:

ceph tell osd.* injectargs --osd_max_backfills=2 --osd-recovery_max_active=8 # 2x Increase
ceph tell osd.* injectargs --osd_max_backfills=3 --osd-recovery_max_active=12 # 3x Increase
ceph tell osd.* injectargs --osd_max_backfills=4 --osd_recovery_max_active=16 # 4x Increase
ceph tell osd.* injectargs --osd_max_backfills=1 --osd-recovery_max_active=3 # Back to Defaults

The above didn’t help, turns out consumer SSDs are very bad:

I bought some Intel DC S3700 on ebay for $75 a piece. It fixed all my latency/speed issues.

Dead mon despite being removed from cli

I had a situation where a monitor showed up as dead in proxmox, but I was unable to delete it. I followed this procedure:

rm /etc/systemd/system/<nodename>.service

Dead pve node procedure

remove from /etc/ceph/ceph.conf, remove /var/lib/ceph/mon/ceph-<node>, remove rm /etc/systemd/system/

Adding through GUI brought me back to the same problem.

Bring node back manually

 ceph auth get mon. -o /tmp/key
 ceph mon getmap -o /tmp/map
 ceph-mon -i <node_name> –mkfs –monmap /tmp/map –keyring /tmp/key  
 ceph-mon -i <node_name> –public-addr <node_ip>:6789  
 ceph mon enable-msgr2
 vi /etc/pve/ceph.conf

In the end the most surefire way to fix this problem was to re-image the affected host.


In my testing I had tried pulling disks at random, then putting them back in. This recovered well, but I had this message:

HEALTH_WARN 1 daemons have recently crashed

To clear it I had to drop to the CLI and run this command:

ceph crash archive-all

Thanks to the Proxmox Forums for the fix.

Pool cleanup

I noticed I would get rbd error: rbd: listing images failed: (2) No such file or directory (500) when trying to look at which disks were on my Ceph pool. I fixed this by removing the offending images as per this post.

I then ran another rbd ls -l <POOL_NAME> command to see what was left and noticed several items without anything in the LOCK column. I discovered these were artifacts from failed disk migrations I tried early on – wasted space. I removed them one by one with the following command:

rbd rm <VM_FILE_NAME> -p <POOL_NAME>

Be careful to verify they’re not disks that are in use with VMs with are powered off – they will also show no lock for non-running VMs.

Disk errors

I had a disk fail, but then I pulled out the wrong disk. I kept getting these errors:

Warning: Error fsyncing/closing /dev/mapper/ceph--fc741b6c--499d--482e--9ea4--583652b541cc-osd--block--843cf28a--9be1--4286--a29c--b9c6848d33ba: Input/output error

I was unable to remove it from the GUI. After a while I realized the problem – I was on the wrong node. I needed to be on the node that has the disks when creating an OSD in the Proxmox GUI.

Steps to determine which disk is assigned to an OSD, from ceph docs:

ceph-volume lvm list
====== osd.2 =======

 [block]       /dev/ceph-680265f2-0b3c-4426-b2a8-acf2774d82e0/osd-block-2096f339-0572-4e1d-bf20-52335af9b374

     block device              /dev/ceph-680265f2-0b3c-4426-b2a8-acf2774d82e0/osd-block-2096f339-0572-4e1d-bf20-52335af9b374
     block uuid                tcnwFr-G33o-ybue-n0mP-cDpe-sp9y-d0gvYS
     cephx lockbox secret       
     cluster fsid              65f26da0-fca0-4419-ba15-20269a5a363f
     cluster name              ceph
     crush device class        ssd
     encrypted                 0
     osd fsid                  2096f339-0572-4e1d-bf20-52335af9b374
     osd id                    2
     osdspec affinity           
     type                      block
     vdo                       0
     devices                   /dev/sde

Convert TIF to JPG with ImageMagick

My new project is digitizing film negatives. Following advice found on the DataHoarder subreddit, I’m scanning these files in the highest possible quality in uncompressed TIF files. These TIF files are too big for regular consumption, thus the need to convert to JPG.

ImageMagick is amazing, and does the job nicely. Make sure you have the imagemagick package installed, and it’s as simple as using the convert command.

This is my simple script for converting all TIF files to JPG, and outputting them to the same directory:

for file in *.tif; do echo converting "$file" to "${file%.*}.jpg"; convert "$file" "${file%.*}.jpg"; done

It uses bash substitution to remove the TIF extension in the resulting JPG file. It works beautifully!

Update 4/14/2023:

I have re-worked this a bit to handle multiple directories. It involves setting the Internal Field Separator to be ‘ \n’ instead of space (default) and using the find command. The multi-directory command is below:

IFS=$'\n'; for file in $(find . -name *.tif); do echo converting "$file" to "${file%.*}.jpg"; convert "$file" "${file%.*}.jpg"; done;unset IFS

Restart wireguard interface in OpenWRT

One annoying issue with wireguard in OpenWRT is the fact that it won’t re-check DNS on connection failure. In the event that my public IP changes (dynamic IP) the OpenWRT wireguard client doesn’t ever get the memo, even when DNS is updated.

I discovered here that you can tell OpenWRT via the command line to stop and start the wireguard interface. This forces a new DNS check and then the tunnel builds successfully. The command:

ubus call network.interface.wg0 down &&  ubus call network.interface.wg0 up

Success! Throw this into a cron job and you have an automated failsafe to ensure a reconnect after IP change.

Wireguard one-way traffic on USG Pro 4 after dual WAN setup

I have a site-to-site VPN between my Ubiquiti USG Pro-4 and an OpenWRT device over wireguard . It’s worked great until I got a secondary WAN connection as a failover connection since my primary cable connection has been flaky lately.

When you introduce dual-WAN on Ubiquiti devices you have to manually configure everything since the GUI assumes only one WAN connection. I configured my manual DNAT (port forwards) for each interface successfully but struggled to figure out why suddenly my Wireguard VPN between my two sites only went one way (remote side could ping all hosts on local side, but not visa-versa.)

After some troubleshooting I realized the firewall itself could ping the remote subnet just fine, it just wasn’t allowing local hosts to do so. I couldn’t find anything in firewall logs. Eventually I came across this very helpful page from that helped me to solve my problem.

The solution was to add a Firewall Modify rule specifically for the eth0 interface (where all my LAN traffic is routed through) to allow the source address of the subnets I want to traverse the VPN, then apply that modifier to the LAN_IN firewall rule for that interface. I had to do it for any VLANs I wanted to be able to use the Wireguard tunnel as well (vifs of eth0, VLAN 50 in my case)

Here is the relevant config.gateway.json sections, namely “firewall” and “interfaces”:

    "firewall": {
        "modify": {
            "Wireguard": {
                "rule": {
                    "10": {
                        "action": "modify",
                        "description": "Allow Wireguard traffic",
                        "modify": {
                            "table": "10"
                        "source": {
                            "address": ""
        "interfaces": {
            "ethernet": {
                "eth0": {
                    "firewall": {
                        "in": {
                            "ipv6-name": "LANv6_IN",
                            "modify": "Wireguard",
                            "name": "LAN_IN"
                    "vif": {
                        "50": {
                            "firewall": {
                                "in": {
                                    "ipv6-name": "LANv6_IN",
                                    "modify": "Wireguard",
                                    "name": "LAN_IN"

This did the trick! Wireguard is working both directions again, this time with my dual WAN connections.

Prioritize wifi with Network Manager in Arch

My cable internet has been horrid lately. I wanted to be able to hotspot to my phone while maintaining LAN connections to my servers while the cable company takes its sweet time to fix things. Even though I connected to wifi on my phone, my desktop still prioritized the broken connection and wouldn’t use my phone to get to the internet. I verified this by looking at the routing table and running traceroute

sudo ip route
default via dev br0 proto dhcp src metric 425 
default via dev wlp69s0 proto dhcp src metric 600 

traceroute --max-hops=1
 1  _gateway (  0.409 ms  0.449 ms  0.483 ms

The LAN connection’s default gateway had a lower metric than the mobile hotspot connection (lower takes precedence.) To fix this I ran this networkmanager command (thanks to this post for the inspiration)

sudo nmcli connection modify "Nicholas’s iPhone" ipv4.route-metric 50

I noticed DNS traffic was also prioritizing my LAN, which I didn’t want. I fixed it with nmcli as well (thanks to this post)

sudo nmcli connection modify "Nicholas’s iPhone" ipv4.dns-priority 1

I then noticed I couldn’t get to certain LAN subnets. I then realized I needed to add some static routes so they don’t try to go over my hotspot connection (which I learned about here)

sudo nmcli connection modify bridge-br0 +ipv4.routes ""

Note you may need to refresh your connection once you’ve made changes. You can either disconnect and reconnect to force a refresh, or run this command (as outlined here.)

sudo nmcli con up bridge-br0 #or whatever your LAN interface name is

Once I refreshed my settings, I was able to get internet via my phone while maintaining all my local network settings.

Convert music to iPhone ringtones

I’ve recently crossed into the dark side and gotten my first iPhone. I wanted to set up ringtones for my contacts but discovered that Apple is pretty picky about ringtone file format. After some searching I found the ffmpeg command to run to get the ringtones into a file iPhones are happy with.

The criteria are:

  • aac codec
  • m4r filename
  • 30 seconds or less in duration

I ran into a snag with various things I was trying to convert apparently having more than one stream. I would get the error message

Could not find tag for codec h264 in stream #0, codec not currently supported in container

I had to use the map command to specify exactly which stream I wanted (just the audio one.) I discovered which stream I wanted by running the ffmpeg -i command on the file to see its available streams. I also discovered that some songs reported incorrect duration. This was fixed with the -write_xing 0 option. Thanks to this gist for the inspiration.

Here is the full command to turn music into an Apple-compatible ringtone. Modify -ss and -to to suit your needs (starting time, to ending time)

ffmpeg -i <input file> -codec:a aac -ss 00:00:59.5 -to 00:01:21.5 -f ipod -map 0:0 -write_xing 0 ringtone.m4r

If taking a ringtone from a video file, I had to specify I wanted stream 1 instead of stream 0:
ffmpeg -i Batman-\ The\ Animated\ Series\ -\ S02E01\ -\ Shadow\ of\ the\ Bat\ \(1\)\ SDTV.avi
-codec:a aac -ss 00:00:20.5 -to 00:00:43.5 -f ipod -write_xing 0 -map 0:1 batman_ringtone.m4r

Once you have the right m4r file, you simply need to plug your iPhone into your computer and fire up iTunes. You can then drag the file into the "Tones" section on the left under "Devices".

Updating Zimbra to latest version

Recently a remote code execution bug came to light with Zimbra. It prompted me to update to the latest patch. I had some e-mail deliverability issues afterward. Here are my patch notes:

  • Download the latest version from
    • Follow instructions as listed here:
  • Untar downloaded file, cd into directory and run ./ as root
  • Re-install latest patches (I had frustrating 500 errors until I discovered this was the fix)
    • sudo yum reinstall zimbra-patch
  • Re-do any customization you’ve done to zimbra core
    • In my case, it was adding these lines to the smtp-amavis section:
    -o smtp_tls_security_level=none
    -o smtp_tls_wrappermode=no
  • Restart Zimbra services
    • sudo -u zimbra zmcontrol restart

Configure Zimbra to use AnyMXRelay

It turns out if you want to configure Zimbra to use an external SMTP relay service it can be a bit of a headache if that service doesn’t use port 25 or 587 to receive encrypted relay mail. Such is the case with AnyMXRelay. I decided to use AnyMXRelay to relay my mail since my Linode box keeps getting put on weird shadow blocklists despite mxtoolbox saying everything was fine.

It took some digging but I finally found this article on Zimbra’s wiki outlining what needs to happen. There are a few manual settings that need to be put in place on the command line in order to get this to work – namely, smtp_tls_wrappermode and smtp_tls_security_level.

In addition to the steps taken in this how-to for sending mail through a relay, you must also make these changes:

postconf -e smtp_tls_wrappermode=yes   # No Zimbra setting for smtp_tls_wrappermode yet
zmprov ms `zmhostname` zimbraMtaSmtpTlsSecurityLevel encrypt
zmprov ms `zmhostname` zimbraMtaSmtpTlsCAfile /opt/zimbra/ssl/zimbra/commercial/commercial_ca.crt
zmprov ms `zmhostname` zimbraMtaSmtpSaslSecurityOptions noanonymous
zmprov ms `zmhostname` zimbraMtaSmtpSaslAuthEnable yes
zmprov ms `zmhostname` zimbraMtaSmtpSaslPasswordMaps lmdb:/opt/zimbra/conf/relay_password

Zimbra 8.5+ periodically applies settings automatically, so once you’ve made these changes, watch /var/log/zimbra.log for these lines

zmconfigd[25662]: Fetching All configs
zmconfigd[25662]: All configs fetched in 0.07 seconds
zmconfigd[25662]: All restarts completed in 1.80 sec

Once you see them, you can send some test mail. Tail /var/log/zimbra.log to see if it worked or to see any error messages.

If you get these error messages:

HANGUP after 0.08 from [IP]:56518 in tests before SMTP handshake
status=deferred (Cannot start TLS: handshake failure)

It means you must also add two configuration lines to the amavis configuration file in /opt/zimbra/common/conf/

-o smtp_tls_security_level=none
-o smtp_tls_wrappermode=no

So the complete section looks like this:

smtp-amavis unix -      -       n       -       %%zimbraAmavisMaxServers%%   smtp
    -o smtp_tls_security_level=none
    -o smtp_tls_wrappermode=no
    -o smtp_data_done_timeout=1200 
    -o smtp_send_xforward_command=yes
    -o disable_dns_lookups=yes
    -o max_use=20

Once you made the changes, save the file and restart all zimbra services with zmcontrol restart

The above disables TLS security for the antivirus piece. This could cause security issues if you Zimbra configuration is distributed to multiple hosts. In my case, this is an all-in-one server, so it does not matter.

Once I made the above changes, mail flowed through my external SMTP server successfully!

OpenWRT Wireguard client

My notes on how to configure an OpenWRT device to be a wireguard client (site to site VPN)

More or less follow the instructions from

I commented out the IPv6 stuff as well as the pre-shared key. I also had already defined firewall rules so I skipped that section.

One note: make sure your WG_ADDR has the proper subnet mask (I made the mistake of making it a /32 when it needed to be a /24)

# Configuration parameters
WG_ADDR="wireguard_subnet/wireguard_subnet_mask (/24 for example)"

# Generate keys
#umask go=
#wg genkey | tee wgserver.key | wg pubkey >
#wg genkey | tee wgclient.key | wg pubkey >
#wg genpsk > wgclient.psk
# Client private key
WG_KEY="$(cat wgclient.key)"
# Pre-shared key
#WG_PSK="$(cat wgclient.psk)"
# Server public key

# Configure network
uci -q delete network.${WG_IF}
uci set network.${WG_IF}="interface"
uci set network.${WG_IF}.proto="wireguard"
uci set network.${WG_IF}.private_key="${WG_KEY}"
uci add_list network.${WG_IF}.addresses="${WG_ADDR}"
#uci add_list network.${WG_IF}.addresses="${WG_ADDR6}"
# Add VPN peers
uci -q delete network.wgserver
uci set network.wgserver="wireguard_${WG_IF}"
uci set network.wgserver.public_key="${WG_PUB}"
#uci set network.wgserver.preshared_key="${WG_PSK}"
uci set network.wgserver.endpoint_host="${WG_SERV}"
uci set network.wgserver.endpoint_port="${WG_PORT}"
uci set network.wgserver.route_allowed_ips="1"
uci set network.wgserver.persistent_keepalive="25"
uci add_list network.wgserver.allowed_ips=""
#uci add_list network.wgserver.allowed_ips="::/0"
uci commit network
/etc/init.d/network restart