Storage XenMotion is awesome. It allows me to spin up a second Xenserver host and live migrate VMs to it whenever I need to do maintenance on my primary xenserver host. I don’t need an intermediary storage device such as a NAS – the two hosts can exchange live, running VMs directly. No downtime!
An unfortunate side effect of using Storage XenMotion is that sometimes it doesn’t clean itself up very well. It takes several snapshots in the migration process and they sometimes get “forgotten about.” This results in inexplicable low disk space errors such as this one:
The specified storage repository has insufficient space
..despite there being plenty of space.
This article explains how to use the coalesce option to reclaim space by issuing the following command:
xe host-call-plugin host-uuid=<host-UUID> plugin=coalesce-leaf fn=leaf-coalesce args:vm_uuid=<VM-UUID>
Unfortunately that didn’t seem to do anything for me. Digging into the storage underpinnings I can see that there are a lot of logical volumes hanging out there not being used:
xe vdi-list sr-uuid=<UUID of SR without space>
This revealed a lot of disks floating around in the SR that aren’t being used (I know this by looking at that same SR inside xencenter.) Curiously there is a VDI with identical names but with different UUIDs, despite my not having any snapshots of that VM.
I was about to start using the vgscan command to look for active volume groups when I got called away. Hours later, when I got back to my task, I found that all the space had been freed up. Xenserver had done its own garbage collection, albeit slowly. So, if you’ve tried to use xenmotion and found you have no space.. give xenserver some time. You might just find out that it will clean itself up.
I ran into this problem once more. I read from here that simply initiating a scan of the storage repository is all you need to do to reclaim lost space. Unfortunately when I ran that the scan nothing changed. A check of /var/log/SMlog revealed the following error (thanks to ap’s blog for the guidance)
SM:  ***** sr_scan: EXCEPTION XenAPI.Failure, ['INTERNAL_ERROR', 'Db_exn.Uniqueness_constraint_violation("VDI", "uuid", "3e616c49-adee-44cc-ae94-914df0489803")'] ... Raising exception [40, The SR scan failed [opterr=['INTERNAL_ERROR', 'Db_exn.Uniqueness_constraint_violation("VDI", "uuid", "3e616c49-adee-44cc-ae94-914df0489803")']]]
For some reason one of the ISOs in one of my SRs was throwing an error – specifically a Xenserver operating system fixup iso, which was causing the coalescing process to abort. I didn’t care if I lost that VDI so I nuked it:
xe vdi-destroy uuid="3e616c49-adee-44cc-ae94-914df0489803"
That got me a little father, but I still wasn’t seeing any free space. Further inspection of the log revealed this gem:
SMGC:  No space to leaf-coalesce f8f8b129[VHD](20.000G/10.043G/20.047G|ao) (free space: 1904214016)
I read that if there isn’t enough space, a coalesce can’t happen on a running VM. I decided to shut down one of my VMs that was hogging space and run the scan again. This time there was progress in the logs. It took a while, but eventually my space was restored!
Moral of the story: if your server isn’t automatically coalescing to free up space, check /var/log/SMlog to see what’s causing it to choke.