It all started with an innocent enough e-mail:
Data Disk is filling up - please check. Current usage: 100%
I couldn’t find any clear information about what to do about this on Sophos’ forums. My data disk was full. What to do?
I can tell you what not to do: delete random files. I thought the solution would be to log into the UTM’s console and run du -hsx /* to see what was using the space. Digging further, I found a large folder inside /var/storage: /var/storage/cores/httpd.16438. I removed it, because why not?
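In hindsight, a read-only look at what was using the space would have been the safer first step. Here is a minimal sketch of that idea, assuming a POSIX shell with du, sort, and head available. The function name largest_entries is made up for illustration and is not a Sophos tool.

```shell
#!/bin/sh
# Hypothetical helper: list the N largest entries directly under a
# directory, biggest first. It only reads; nothing is deleted.
largest_entries() {
    dir="${1:-/}"      # directory to inspect (default: /)
    count="${2:-10}"   # how many entries to show (default: 10)
    # -x stays on one filesystem, -s summarizes each entry,
    # -k reports sizes in KiB so that sort -n can order them.
    du -xsk -- "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}
```

Calling largest_entries /var/storage 5 would print the five biggest entries under /var/storage with their sizes in KiB. Repeating this one level at a time narrows down where the space went without touching anything.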
It turns out that did some weird things to my UTM. After removing that folder I kept getting spammed with these e-mails, once every hour:
Pop3 proxy not running - restarted
It took me a while to realize, but this also meant that no e-mail relayed through the UTM was being delivered. The entire POP3/SMTP subsystem of the Sophos UTM was hosed. I could not find anything on the Sophos forums. After scratching my head, I decided to take a deeper look at the logs. From the command line I issued
ls -ltr /var/log
and began reading the most recent logs.
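Reading every log by hand is slow. A rough way to decide which ones to open first is to count suspicious lines per file. The sketch below assumes a POSIX shell and grep with -E support. The function name logs_with_errors and the default search pattern are my own guesses for illustration, not anything Sophos ships.

```shell
#!/bin/sh
# Hypothetical helper: for each *.log file in a directory, count the
# lines matching an error-like pattern and print "count<TAB>file",
# with the noisiest files first.
logs_with_errors() {
    logdir="${1:-/var/log}"
    # Assumed pattern; adjust it to whatever the logs actually say.
    pattern="${2:-error|refused|could not connect}"
    for f in "$logdir"/*.log; do
        [ -f "$f" ] || continue
        # grep -c prints 0 and exits non-zero when nothing matches,
        # so "|| true" keeps the loop going in that case.
        n=$(grep -Eic -- "$pattern" "$f" || true)
        if [ "${n:-0}" -gt 0 ]; then
            printf '%s\t%s\n' "$n" "$f"
        fi
    done | sort -rn
}
```

Running logs_with_errors with no arguments scans /var/log. A second argument replaces the pattern, for example logs_with_errors /var/log 'pop3proxy'.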
pop3.log let me know what the problem was:
pop3proxy: Can't connect to database, retrying in 10 seconds: could not connect to server: Connection refused
I could not find any useful fixes for this error. I kept digging.
selfmon.log wasn’t much help other than to confirm that pop3 was having some serious issues. It was an endless abyss of repeated error messages:
selfmonng: W NOTIFYEVENT Name=pop3proxy_running Level=INFO Id=117 suppressed ... selfmonng: W actionCmd(+): '/var/mdw/scripts/pop3 restart'
system.log put me on the right track:
ulogd: pg1: connect: could not connect to server: No such file or directory
Finally, we’re getting somewhere! After some searching I learned that pg1 refers to the PostgreSQL database that Sophos UTM uses. I found a way to rebuild it in this forum post.
One simple command did the trick:
This rebuilt the PostgreSQL database that I had apparently corrupted when I removed files with reckless abandon. My e-mail works again!