I came across an interesting situation where if I rebooted my Ubiquiti UniFi Switch 24 for a firmware upgrade, all my VM hosts would reboot themselves. It turned out to be due to my having enabled HA in ProxMox. The hosts would temporarily lose connectivity to each other and begin to fence themselves off from the cluster. This caused HA to kill the VMs on those hosts. Then once connectivity was restored everything would eventually come back up.
The proper way to fix this would be to have multiple paths for each host to talk to each other, so if one switch goes down the cluster is still able to communicate. In my case, where I only have one switch, the “poor man’s fix” was to simply disable HA altogether during the switch reboot, as outlined here. Then, once the switch is back up, re-enable HA.
On each node, stop the pve-ha-lrm service. Once it’s stopped on all hosts, stop the pve-ha-crm service. Then reboot your switch.
After the switch is back up, start pve-ha-lrm on each node first, then pve-ha-crm (if it doesn’t auto start itself) to re-enable HA.