Convert xenserver 6.5 to software RAID 1

I have written previously about how to convert Citrix Xenserver 6.2 to a software RAID 1. When I upgraded to Xenserver 6.5 I found I had to re-install the xenserver instance because the upgrade didn’t recognize the software RAID. When trying to follow my own guide I found that I couldn’t create the array – it gave the following error message:

mdadm: unexpected failure opening /dev/md0

It turns out 6.5 handles RAID differently. You have to manually load the RAID kernel modules before you can create arrays. I was able to get this running successfully thanks to guidance from this site, specifically comments on it by Olli.

The majority of this can simply be copy/pasted into the command window, once drive paths have been updated for your specific setup.
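Before pasting anything in, double-check which disk is which: in this guide /dev/sdc is the existing XenServer disk and /dev/sdd is the new disk that will become the other half of the mirror. A generic listing command (not part of the original procedure) makes a quick sanity check:

fdisk -l # list all disks and partitions; verify /dev/sdc and /dev/sdd are the devices you expect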

# Prepare /dev/sdd
sgdisk --zap-all /dev/sdd
sgdisk --mbrtogpt --clear /dev/sdd
sgdisk -R/dev/sdd /dev/sdc # Replicate partition table from /dev/sdc to /dev/sdd
sleep 5 # Sleep 5 seconds here if you script this…
sgdisk --typecode=1:fd00 /dev/sdd
sgdisk --typecode=2:fd00 /dev/sdd
sgdisk --typecode=3:fd00 /dev/sdd
sleep 5 # Sleep 5 seconds here if you script this…
modprobe md_mod # load the RAID module, because it isn't loaded by default (XS 6.5 only)
yes|mdadm --create /dev/md0 --level=1 --raid-devices=2 --metadata=0.90 /dev/sdd1 missing # Create md0 (root)
yes|mdadm --create /dev/md1 --level=1 --raid-devices=2 --metadata=0.90 /dev/sdd2 missing # Create md1 (swap)
yes|mdadm --create /dev/md2 --level=1 --raid-devices=2 --metadata=0.90 /dev/sdd3 missing # Create md2 (storage)
sleep 5 # Sleep 5 seconds here if you script this…
mkfs.ext3 /dev/md0 # Create root FS
mount /dev/md0 /mnt # Mount root FS
cp -xR --preserve=all / /mnt # Replicate root files
mdadm --detail --scan > /mnt/etc/mdadm.conf #generate RAID configuration
sed -i 's/LABEL=[a-zA-Z\-]*/\/dev\/md0/' /mnt/etc/fstab # Update fstab for new RAID device
mount --bind /dev /mnt/dev
mount -t sysfs none /mnt/sys
mount -t proc none /mnt/proc
chroot /mnt /sbin/extlinux --install /boot
dd if=/mnt/usr/share/syslinux/gptmbr.bin of=/dev/sdd
chroot /mnt
mkinitrd -v -f --theme=/usr/share/splash --without-multipath /boot/initrd-`uname -r`.img `uname -r`
exit
sed -i 's/LABEL=[a-zA-Z\-]*/\/dev\/md0/' /mnt/boot/extlinux.conf # Update extlinux for new RAID device
cd /mnt && extlinux --raid -i boot/
sgdisk /dev/sdd --attributes=1:set:2

#Unmount filesystems and reboot
cd
umount /mnt/dev
umount /mnt/sys
umount /mnt/proc
umount /mnt
sync
reboot

Tell the BIOS to boot from disk B (/dev/sdd).
After rebooting to disk B…

sgdisk -R/dev/sdc /dev/sdd # Replicate partition table from /dev/sdd to /dev/sdc
sgdisk /dev/sdc --attributes=1:set:2
sleep 5 # Sleep 5 seconds here if you script this…
mdadm -a /dev/md0 /dev/sdc1
mdadm -a /dev/md1 /dev/sdc2
mdadm -a /dev/md2 /dev/sdc3 # If this command gives an error, you need to forget/destroy an active SR first
#This next command is the only command you have to manually update before pasting in. Find the UUID of your xenserver host and paste it between the <> below
xe sr-create content-type=user device-config:device=/dev/md2 host-uuid=<UUID of xenserver host> name-label="RAID 1" shared=false type=lvm
# Watch rebuild progress and wait until no arrays are rebuilding before proceeding with any reboot
watch "mdadm --detail /dev/md* | grep rebuild"
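/proc/mdstat offers another view of the same rebuild progress; this is standard mdadm tooling rather than something specific to this guide:

watch cat /proc/mdstat # resync percentage appears next to each rebuilding array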

Done!

Fix Apache Permission Denied errors

The other day I ran the rsync command to migrate files from an old webserver to a new one. What I didn’t notice right away was that the rsync changed the permissions of the folder I was copying into.

The problem presented itself with a very lovely 403 forbidden error message when trying to access any website that server hosted. Checking the logs (/var/log/apache2/error.log on my Debian system) revealed this curious message:

[error] [client 192.168.22.22] (13)Permission denied: access to / denied

This made it look like Apache was denying access for some reason. I verified the Apache config and confirmed it shouldn't be denying anything. After some head scratching I came across this site, which explained that Apache throws that error when the underlying filesystem denies it access.

I was confused because /var/www, where the websites live, had the appropriate permissions. After some digging I found that the culprit in my case was not /var/www, but rather its parent directory, /var. For some reason the rsync had stripped the execute permissions from /var (execute is required to traverse a directory.) A simple

chmod o+rx /var/

resolved my problem. Next time you get a 403, it could be an underlying filesystem issue and not Apache at all.
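For future debugging, namei (from util-linux) prints the owner and mode of every component of a path, which makes a missing execute bit easy to spot. The file path here is just a placeholder; substitute one of your own site's files:

namei -om /var/www/index.html # -o shows owners, -m shows the mode of each path component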

Remove and re-install a Debian package

I wanted to completely blow away owncloud on one of my Debian servers today. I did apt-get remove owncloud and removed the owncloud directory. That should be all I need to do, right? Alas, not so.

It turns out that a simple apt-get remove leaves configuration files behind, and a simple apt-get install won't restore them to a fresh state. In order to completely blow away the package, you must use the purge command (thanks to this site for the inspiration.)
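As an aside, dpkg can list which packages are in this half-removed state (removed, but with configuration files still present); they carry the 'rc' status flag:

dpkg -l | grep '^rc' # list packages that were removed but not purged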

First, purge the package as well as related packages

sudo apt-get purge owncloud-*

Then, re-install the package with the –reinstall flag

sudo apt-get install --reinstall owncloud

Done! The package has come back anew with all necessary files.

Delete windows.old folder

Some time ago I upgraded my Windows Server 2012 machine to Windows Server 2012 R2. The upgrade was seamless and the server has hummed along just fine until recently, when it began running out of space.

Windirstat, a great little disk space usage reporting program, reported that the largest hog of space was the windows.old folder. Upon upgrade of the OS, the old Windows folder was renamed to Windows.old to make room for the new OS files and has sat there, untouched, ever since.

I tried to remove this folder with hilarious results. The folder is owned by TrustedInstaller. Easy enough, I'll just replace the owner with my own user account, right? Wrong. Even after becoming the owner of the folder and everything inside it, I was prompted that I needed permission from… myself… to delete the folder. I then tried changing the owner to "Everyone" and received a rather comical message that I needed permission from Everyone to remove the folder. That would take some time!

You need permission from everyone.

That’s when I decided to throw in the towel and google. The solution to this problem involves the command line (thanks to here for the information.) Open an administrator command prompt and issue the following commands:

takeown /F c:\Windows.old\* /R /A /D Y
icacls c:\Windows.old\*.* /grant administrators:F /T
rmdir /S /Q c:\Windows.old

That did the trick! No more full disk.

Reclaim lost space in Xenserver 6.5

Storage XenMotion is awesome. It allows me to spin up a second Xenserver host and live migrate VMs to it whenever I need to do maintenance on my primary xenserver host. I don’t need an intermediary storage device such as a NAS – the two hosts can exchange live, running VMs directly. No downtime!

An unfortunate side effect of using Storage XenMotion is that sometimes it doesn’t clean itself up very well. It takes several snapshots in the migration process and they sometimes get “forgotten about.” This results in inexplicable low disk space errors such as this one:

The specified storage repository has insufficient space

..despite there being plenty of space.

This article explains how to use the coalesce option to reclaim space by issuing the following command:

xe host-call-plugin host-uuid=<host-UUID> plugin=coalesce-leaf fn=leaf-coalesce args:vm_uuid=<VM-UUID>

Unfortunately that didn't seem to do anything for me. Digging into the storage underpinnings, I could see there were a lot of logical volumes hanging out there not being used:

xe vdi-list sr-uuid=<UUID of SR without space>

This revealed a lot of disks floating around in the SR that weren't being used (I knew this by looking at that same SR inside XenCenter.) Curiously, there were VDIs with identical names but different UUIDs, despite my not having any snapshots of that VM.

I was about to start using the vgscan command to look for active volume groups when I got called away. Hours later, when I got back to my task, I found that all the space had been freed up. XenServer had done its own garbage collection, albeit slowly. So, if you've tried to use XenMotion and found you have no space… give XenServer some time. You might just find out that it will clean itself up.
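If you want to confirm that garbage collection is actually running while you wait, tail the storage manager log:

tail -f /var/log/SMlog # coalesce and garbage collection activity is logged here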


Update 05/20/2015

I ran into this problem once more. I read from here that simply initiating a scan of the storage repository is all you need to do to reclaim lost space (the scan command itself is sketched after the log excerpt below). Unfortunately, when I ran the scan nothing changed. A check of /var/log/SMlog revealed the following error (thanks to ap's blog for the guidance):

SM: [30364] ***** sr_scan: EXCEPTION XenAPI.Failure, ['INTERNAL_ERROR', 'Db_exn.Uniqueness_constraint_violation("VDI", "uuid", "3e616c49-adee-44cc-ae94-914df0489803")']
...
Raising exception [40, The SR scan failed  [opterr=['INTERNAL_ERROR', 'Db_exn.Uniqueness_constraint_violation("VDI", "uuid", "3e616c49-adee-44cc-ae94-914df0489803")']]]
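For reference, the rescan mentioned above is a single xe command; substitute the UUID of the SR in question (xe sr-list will show it):

xe sr-list # find the UUID of the SR that's out of space
xe sr-scan uuid=<SR-UUID> # trigger the scan, which kicks off garbage collection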

For some reason one of the ISOs in one of my SRs was throwing an error – specifically a Xenserver operating system fixup iso, which was causing the coalescing process to abort. I didn’t care if I lost that VDI so I nuked it:

xe vdi-destroy uuid="3e616c49-adee-44cc-ae94-914df0489803"

That got me a little farther, but I still wasn't seeing any free space. Further inspection of the log revealed this gem:

SMGC: [7088] No space to leaf-coalesce f8f8b129[VHD](20.000G/10.043G/20.047G|ao) (free space: 1904214016)

I read that if there isn’t enough space, a coalesce can’t happen on a running VM. I decided to shut down one of my VMs that was hogging space and run the scan again. This time there was progress in the logs. It took a while, but eventually my space was restored!

Moral of the story: if your server isn’t automatically coalescing to free up space, check /var/log/SMlog to see what’s causing it to choke.

Automatically delete old data in Splunk

I’ve had Splunk humming along for about two years now. I’ve already increased the storage space for my Splunk VM once. Today I received a notice that I’ve once again run out of space and indexing had been suspended. I wanted a more permanent solution to this problem, so I consulted the almighty Google.

The solution to my problem is to set a retirement policy. This allows Splunk to automatically delete old data when you hit a certain index size. You can also go by time, but I opted for size. It's a pretty simple configuration change. Simply edit (or create if it doesn't exist) $SPLUNK_HOME/etc/system/local/indexes.conf and add two lines:

sudo vim /opt/splunk/etc/system/local/indexes.conf
[main]
maxTotalDataSizeMB = 50000
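If you'd rather retire data by age instead of size, the same file accepts a time-based limit via frozenTimePeriodInSecs (a standard indexes.conf setting; the 180-day value below is just an example, pick your own window):

[main]
# remove events older than ~180 days (15552000 seconds)
frozenTimePeriodInSecs = 15552000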

Then, restart your Splunk instance (the command below is Debian/Ubuntu specific)

sudo service splunk restart

The configuration above tells Splunk to keep at most 50GB of data in the main index. When that limit is reached, it begins deleting the oldest data first. No more out of space errors!

Install ventrilo on Ubuntu 14.04 64bit

Ventrilo is a voice communication server which is popular in the gaming community. It allows teams of people to get together and have voice chats. I recently tried to install vent on a 64bit instance of Ubuntu 14.04. When I tried to execute the server binary, I was greeted with this lovely error message:

bash: ./ventrilo_srv: No such file or directory

It's a pretty cryptic error message that had me chasing my tail for a while until I came across this post, which shed further light on the issue. This error stems from trying to run a 32-bit binary on a 64-bit system without the proper libraries installed.
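You can confirm this sort of architecture mismatch with the file command, which should identify the server binary as 32-bit:

file ./ventrilo_srv # should report something like: ELF 32-bit LSB executable, Intel 80386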

A simple

sudo apt-get install lib32z1

resolved this issue. After those 32-bit libraries were installed, vent ran without issue.

Using screen to run interactive programs at startup

Oftentimes I encounter programs that weren't necessarily designed to run automatically, but that I still want to launch at startup. Sometimes such a program has interactive output that I'll want to see later, so it needs to keep running somewhere I can get back to it.

The solution to this particular problem is to use screen in combination with su and bash. In my situation, I want to run the HDSurfer plugin on bootup as a different user. The solution I came up with is as follows (thanks to superuser.com and stackoverflow.com for the guidance I needed to set this up.)

Install screen

Screen is like having a separate X window session to keep a program running, except it is for console programs. You can attach to and detach from this screen whenever you'd like and not worry about the program terminating.

sudo apt-get install screen

Create a script to run your program with all required arguments

In my case I needed to execute the command “python /usr/bin/HDSurferWave/hdsurferwave.py start” as a different user in a screen session (so it wouldn’t terminate when the terminal session did.) To do this,

  • invoke screen with the -dm option (to begin the session in detached mode)
  • follow it with bash -c to invoke bash
  • include your desired command after that

My one-line script looks like this:

screen -dm bash -c "python /usr/bin/HDSurferWave/hdsurferwave.py start"

Run your script

I use the su command with the -c argument to change the user that will be running the script, as the startup script launches things as root by default (on pre-systemd systems, anyway.) The -s option specifies the shell to launch, and the last argument is the user to run as. My launch command is:

su -c "/usr/bin/HDSurferWave/start.sh" -s /bin/sh nicholas

Configure the script to run on startup

Edit /etc/rc.local and add your script command from above, then mark that file as executable by running chmod +x /etc/rc.local. Note: this will not work on systems using systemd.
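Putting it together, a minimal /etc/rc.local might look like this. This is a sketch using the example script and username from above; adjust both for your system:

#!/bin/sh -e
# rc.local runs once as root at the end of boot (pre-systemd systems)
# Launch the screen wrapper script as user nicholas
su -c "/usr/bin/HDSurferWave/start.sh" -s /bin/sh nicholas
exit 0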

Disable access logging in Tomcat 7

Guacamole is a great HTML5 VPN gateway. It allows me to access internal applications without having to install any software. I wrote about it briefly in this article.  It wasn’t until I noticed that my Splunk indexer reported warnings that I had exceeded my 500MB quota (the free license maximum amount) that I realized that guacamole has a verbosity problem.

In examining the logs it appears that Guacamole passes about 6 HTTP requests per second while you’re using it. This problem is magnified if you have guacamole sitting behind an apache server, as each request is logged twice – once in Apache access logs, and again in Tomcat access logs.

Since I already have that same information in the Apache access logs and I don't allow direct access to Tomcat, I set out to disable Tomcat's access logging completely. Things have changed between Tomcat versions, so it got a little confusing.

To disable logging in Tomcat 7, you have to edit /etc/tomcat7/server.xml (that’s where it lives in Ubuntu Server 14.04 anyway) and comment out a section (thanks to Stack Overflow for helping me figure this out.)

vim /etc/tomcat7/server.xml

Find this line:

    <Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"  
           prefix="localhost_access_log." suffix=".txt"
           pattern="%h %l %u %t &quot;%r&quot; %s %b" resolveHosts="false"/>

Comment out the line like this:

    <!-- <Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"  
           prefix="localhost_access_log." suffix=".txt"
           pattern="%h %l %u %t &quot;%r&quot; %s %b" resolveHosts="false"/> -->

Save the file and restart Tomcat.

:wq
service tomcat7 restart

No more duplicate logging.

Re-enable Windows Defender in Windows 8.1

I recently underwent an exercise in frustration – taking a stock Lenovo G50 laptop and trying to get it into a workable state. Like many lower end laptops it came with a lot of bloatware, not the least of which was Mcafee Antivirus-premium-whatever garbage.

When I removed McAfee and rebooted, I got a notice that Windows Defender was not enabled. When I clicked to enable it… nothing happened. I tried several times to no avail. I tried manually starting the service and it failed. I tried more drastic measures like running sfc /scannow and even editing the registry per here, but the issue remained.

Finally I came across this article which showed me how to fix it: simply launch Windows Defender by hitting Start and searching for Windows Defender, then click the big red "Turn On" button. That's all it takes! Why you can't do that through the Action Center or by manually starting the service is beyond me. Amazing.

Note: it’s possible that the combination of editing the registry, rebooting, and then launching the Windows Defender program fixed the issue. Whatever – it works now 🙂