Tag Archives: RAID

WD*EZRZ NAS array spindown fix

I recently acquired some 5TB Western Digital Blue drives (WD50EZRZ.) These particular drives were shucked from external USB enclosures. When I tried to add them into my ZFS raid array, though, I ran into constant problems. I would continually get errors like this from the kernel:

[155069.298001] sd 0:0:10:0: attempting task abort! scmd(ffff8f0678887100)
[155069.298005] sd 0:0:10:0: [sdk] tag#5 CDB: Write(16) 8a 00 00 00 00 01 a8 1e 77 10 00 00 00 58 00 00
[155069.298008] scsi target0:0:10: handle(0x0014), sas_address(0x5001438023a93296), phy(22)
[155069.298010] scsi target0:0:10: enclosure logical id(0x5001438023a932a5), slot(53) 
[155069.298012] sd 0:0:10:0: task abort: SUCCESS scmd(ffff8f0678887100)
[155069.298016] sd 0:0:10:0: [sdk] tag#5 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[155069.298018] sd 0:0:10:0: [sdk] tag#5 CDB: Write(16) 8a 00 00 00 00 01 a8 1e 77 10 00 00 00 58 00 00
[155069.298020] blk_update_request: I/O error, dev sdk, sector 7115536144
[155069.298023] zio pool=storage vdev=/dev/disk/by-id/ata-WDC_WD50EZRZ-32RWYB1_WD-WX31XXXXXVA-part1 error=5 type=2 offset=3643153457152 size=45056 flags=180880

After a couple of said errors, the drive would be marked as bad and taken out of the array. A battery of tests on a different system revealed the drives to be fine. It did not matter where I inserted these drives on my NAS, they did the same thing, even on ports I knew had working drives. It wasn’t a cabling or other hardware issue.

The drives would resilver back into the array just fine, and then pop out again at random intervals – sometimes minutes later, other times hours later. After a lot of research I came across this post that got me thinking – this sounds like a drive spindown issue! The random nature of it could simply be the drives not being used and then powering themselves down.

I tried using hdparm to set the spindown timer but was greeted with this lovely error:

sudo hdparm -B /dev/sdk
/dev/sdk:
 APM_level	= not supported

I eventually found this post complaining about their Western Digital drives spinning down aggressively.

idle3 to the rescue

The above post mentions apmtimer which did not help me, however more searches reveled this godsend: idle3-tools

idle3-tools is an open source utility to handle spindown on Western Digital drives themselves (not the OS level.)

Download & compile idle3:

wget https://sourceforge.net/projects/idle3-tools/files/latest/download
cd idle3-tools-0.9.1/
make
sudo make install

Use idle3 to query current spindown status (update drive letters to suit your needs)

for drive in {a..p}; do echo /dev/sd$drive; sudo idle3ctl -g /dev/sd$drive; done

For anything that doesn’t say Idle3 timer is disabled run the following:

sudo idle3ctl -s 0 /dev/sd(DRIVE_LETTER)

No more drive spindown!

Convert xenserver 6.5 to software RAID 1

I have written previously about how to convert Citrix Xenserver 6.2 to a software RAID 1. When I upgraded to Xenserver 6.5 I found I had to re-install the xenserver instance because the upgrade didn’t recognize the software RAID. When trying to follow my own guide I found that I couldn’t create the array – it gave the following error message:

mdadm: unexpected failure opening /dev/md0

It turns out 6.5 handles RAID differently. You have to manually load the RAID kernel modules before you can create arrays. I was able to get this running successfully thanks to guidance from this site, specifically comments on it by Olli.

The majority of this can simply be copy/pasted into the command window, once drive paths have been updated for your specific setup.

# Prepare /dev/sdd
sgdisk --zap-all /dev/sdd
sgdisk --mbrtogpt --clear /dev/sdd
sgdisk -R/dev/sdd /dev/sdc # Replicate partion table from /dev/sdc to /dev/sdd with unique identifier
sleep 5 # Sleep 5 seconds here if you script this…
sgdisk --typecode=1:fd00 /dev/sdd
sgdisk --typecode=2:fd00 /dev/sdd
sgdisk --typecode=3:fd00 /dev/sdd
sleep 5 # Sleep 5 seconds here if you script this…
modprobe md_mod # load raid, because it isn't load by default (XS6.5 only)
yes|mdadm --create /dev/md0 --level=1 --raid-devices=2 --metadata=0.90 /dev/sdd1 missing # Create md0 (root)
yes|mdadm --create /dev/md1 --level=1 --raid-devices=2 --metadata=0.90 /dev/sdd2 missing # Create md0 (swap)
yes|mdadm --create /dev/md2 --level=1 --raid-devices=2 --metadata=0.90 /dev/sdd3 missing # Create md0 (storage)
sleep 5 # Sleep 5 seconds here if you script this…
mkfs.ext3 /dev/md0 # Create root FS
mount /dev/md0 /mnt # Mount root FS
cp -xR --preserve=all / /mnt # Replicate root files
mdadm --detail --scan > /mnt/etc/mdadm.conf #generate RAID configuration
sed -i 's/LABEL=[a-zA-Z\-]*/\/dev\/md0/' /mnt/etc/fstab # Update fstab for new RAID device
mount --bind /dev /mnt/dev
mount -t sysfs none /mnt/sys
mount -t proc none /mnt/proc
chroot /mnt /sbin/extlinux --install /boot
dd if=/mnt/usr/share/syslinux/gptmbr.bin of=/dev/sdd
chroot /mnt
mkinitrd -v -f --theme=/usr/share/splash --without-multipath /boot/initrd-`uname -r`.img `uname -r`
exit
sed -i 's/LABEL=[a-zA-Z\-]*/\/dev\/md0/' /mnt/boot/extlinux.conf # Update extlinux for new RAID device
cd /mnt && extlinux --raid -i boot/
sgdisk /dev/sdd --attributes=1:set:2

#Unmount filesystems and reboot
cd
umount /mnt/dev
umount /mnt/sys
umount /mnt/proc
umount /mnt
sync
reboot

Tell BIOS to use disk B
After reboot to disk B…

sgdisk -R/dev/sdc /dev/sdd # Replicate partition table from /dev/sdd to /dev/sdc with unique identifier
sgdisk /dev/sdc --attributes=1:set:2
sleep 5 # Sleep 5 seconds here if you script this…
mdadm -a /dev/md0 /dev/sdc1
mdadm -a /dev/md1 /dev/sdc2
mdadm -a /dev/md2 /dev/sdc3 # If this command gives error, you need to forget/destroy an active SR first
#This next command is the only command you have to manually update before pasting in. Find the UUID of your xenserver host and paste it between the <> below
xe sr-create content-type=user device-config:device=/dev/md2 host-uuid=<UUID of xenserver host> name-label="RAID 1" shared=false type=lvm
# Watch rebuild progress and wait until no arrays are rebuilding before proceeding with any reboot
watch “mdadm --detail /dev/md* | grep rebuild”

Done!

Convert xenserver installation to software RAID-1

Update 2/28/2015:  I have a newer article explaining how to do this in Xenserver 6.5.


 

After having a hard drive nearly die on me and threaten to obliterate the VMs living on it I realized it would be a good idea to have my xenserver installation live on a RAID array.

Following this guide I was able to successfully migrate my running xenserver installation to a software based RAID 1, with a few tweaks. In my case I wanted to migrate from a single old drive to two newer ones.

Below are the steps I took to accomplish this.

Partition the new drives

This assumes that your current drive resides on /dev/sda, and your two new drives are /dev/sdb and /dev/sdc.

sgdisk -p /dev/sda
sgdisk --zap-all /dev/sdb
sgdisk --zap-all /dev/sdc
sgdisk --mbrtogpt --clear /dev/sdb
sgdisk --mbrtogpt --clear /dev/sdc
sgdisk --new=1:34:8388641 /dev/sdb
sgdisk --new=1:34:8388641 /dev/sdc
sgdisk --typecode=1:fd00 /dev/sdb
sgdisk --typecode=1:fd00 /dev/sdc
sgdisk --attributes=1:set:2 /dev/sdb
sgdisk --attributes=1:set:2 /dev/sdc
sgdisk --new=2:8388642:16777249 /dev/sdb
sgdisk --new=2:8388642:16777249 /dev/sdc
sgdisk --typecode=2:fd00 /dev/sdb
sgdisk --typecode=2:fd00 /dev/sdc

The third partition (VM storage) had to be tweaked a bit since these are larger drives than the current xenserver installation. I simply used gdisk instead of sgdisk for this task.

gdisk /dev/sdb
n #create new partition
<enter> #accept defaults for partition number, first, and last sectors
<enter>
<enter>
t #select partition type
3 #select partition number 3
fd00  #set for raid
w   #write changes to disk

Repeat above steps for the other disk (/dev/sdc in my case)

Create the RAID arrays for each partition

mdadm --create /dev/md0 --level=1 --raid-devices=2  /dev/sdb1 /dev/sdc1
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb2 /dev/sdc2
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdb3 /dev/sdc3

Watch array build (optional)

cat /proc/mdstat

Alternatively you can use the watch command to get a real time update of the raid build:

watch -n 1 cat /proc/mdstat

Format & mount the array

mkfs.ext3 /dev/md0
mount /dev/md0 /mnt

Copy the root filesystem to the new array

cp -vxpr / /mnt

Install bootloader on the new disks

mount --bind /dev /mnt/dev
mount -t sysfs none /mnt/sys
mount -t proc none /mnt/proc
chroot /mnt /sbin/extlinux --install /boot
dd if=/mnt/usr/share/syslinux/gptmbr.bin of=/dev/sdb
dd if=/mnt/usr/share/syslinux/gptmbr.bin of=/dev/sdc

Generate new initrd image

chroot /mnt
mkinitrd -v -f --theme=/usr/share/splash --without-multipath /boot/initrd-`uname -r`.img `uname -r`
exit

Modify boot file

Edit /mnt/boot/extlinux.conf and replace every mention of the old root filesystem (root=LABEL=xxx) with root=/dev/md0.

vi /mnt/boot/extlinux.conf
:%s/LABEL=<root label>/\/dev\/md0/
:wq

Reboot

Keep the old drive in, but make sure to boot from either one of the member drives of your new array.

Create storage repository

Create new local storage repository with the new RAID array similar to here.

xe sr-create content-type=user device-config:device=/dev/md2 host-uuid=<UUID of xenserver host> name-label="RAID-1" shared=false type=lvm

Migrate VMs / disks

Migrate any disk images or VMs living on the old drive to the new array.

If these VMs / disks are not powered on or being used, it is as simple as pulling up xencenter, right clicking on the VM and clicking move then  select new storage repository.

If the VMs are online you can live migrate them to a different xenserver, then live migrate them back to the proper storage repository.

Remove old storage repository

Following instructions found here.
Note: In my case the transfer returned a strange error but was still successful. I had to restart the XAPI toolstack in order for it to let me remove the old storage repository.

xe sr-list name-label="<name of SR to remove>"
xe pbd-list sr-uuid=<UUID of SR above>
xe pbd-unplug uuid=<UUID of pbd above>
xe sr-forget uuid=<UUID of SR>

Final reboot

Shutdown, disconnect the old drive, and boot back up from the new array. Success.

Configure e-mail alerts (optional)

Now that you have a working RAID array you might want to receive e-mail alerts if there are problems with the array.

First, build an mdadm.conf

mdadm --detail --scan > /etc/mdadm.conf

Modify mdadm.conf to add your desired e-mail address for notifications

sed -i '1i MAILADDR <e-mail address>' /etc/mdadm.conf

Thanks to this site for the sed -i 1i trick.

Lastly, enable the mdadm monitoring service. I found via this site that this is fairly easy to do.  Simply enter these two commands:

service mdmonitor start
chkconfig mdmonitor on

Xenserver uses ssmtp to send e-mail. You can follow this guide on how to set it up for SSL if you happen to have an ISP that blocks port 25 (as I do.) Otherwise modify /etc/ssmtp/ssmtp.conf to suit your needs.

You can generate a test event from mdadm to make sure e-mail is configured properly:

mdadm --monitor --test /dev/md0 --oneshot

To get e-mail alerts to work right I had to ensure that FromLineOverride was NOT set to yes (default). I also had to add this line to /etc/ssmtp/revaliases:

root:<e-mail address being sent from>


Update 02/03/2015:  A commenter made me realize I forgot a step – copying the Control Domain OS to the new Raid array. I’ve added that step above, after the “Format & Mount the array” section.

Update 02/17/2015: If you are using Xenserver 6.5 you might come across the following error message when trying to create RAID arrays:

mdadm: unexpected failure opening /dev/md0

If this happens, load the md kernel driver like so:

modprobe md

It should then let you create your arrays.

FreeNAS on Xenserver with PVHVM support

In my current home setup I have a single server performing many functions thanks to Citrix Xenserver 6.2 and PCI Passthrough. This single box is my firewall, webservers, and NAS. My primary motivation for this is power savings – I didn’t want to have more than one box up 24/7 but still wanted all those separate services, some of which are software appliances that aren’t very customizable.

My current NAS setup is a simple Debian Wheezy virtual machine with the on-board SATA controller from the motherboard passed through to it. The VM runs a six drive software RAID 6 using mdadm and LVM volume management on top of it. Lately, though, I have become concerned with data integrity and my use of commodity drives. It prompted me to investigate ZFS as a replacement for my current setup. ZFS has many features, but the one I’m most interested in is its ability to detect and correct any and all corrupted files / blocks. This will put my mind at ease when it comes to the thousands of files that I have which are accessed infrequently.

I decided to try out FreeNAS, a NAS appliance which utilizes ZFS. After searching on forums it quickly became clear that the people at FreeNAS are not too keen on virtualizing their software. There is very little help to be had there in getting it to work in virtual environments. In the case of Xenserver, FreeNAS does work out of the box but it is considerably slower than bare metal due to its lack of support of Xen HVM drivers.

Fortunately, a friendly FreeNAS user posted a link to his blog outlining how he compiled FreeNAS to work with Xen. Since Xenserver uses Xen (it’s in the name, after all) I was able to use his re-compiled ISO (I was too lazy to compile my own) to test in Xenserver.

There are some bugs to get around to get this to work, though. Wired dad’s xenified FreeNAS doesn’t appear to like to boot in Xenserver, at least out of the box. It begins to boot but then hangs indefinitely on the following error:

run_interrupt_drive_hooks: still waiting after 60 seconds for xenbusb_nop_confighook_cb

This is the result of a bug in the version of qemu Xenserver uses. The bug causes BSD kernels to really not like the DVD virtual device in the VM and refuse to boot. The solution is to remove the virtual DVD drive. How, then, do you install FreeNAS without a DVD drive?

It turns out that all the FreeNAS installer does is extract an image file to your target drive. That file is an .xz file inside the ISO. To get wired dad’s FreeNAS Xen image to work in Xenserver, one must extract that .xz file from the ISO, expand it to an .img file, and then apply that .img file to the Xenserver virtual machine’s hard disk. The following commands can be run on the Xenserver host machine to accomplish this.

  1. Create a virtual machine with a 2GB hard drive.
  2. Mount the FreeNAS-xen ISO in loopback mode to get at the necessary file
    mkdir temp
    mount -o loop FreeNAS-9.2.1.5-RELEASE-xen-x64.iso temp/
  3. Extract the IMG file from the freeNAS ISO
    xzcat ~/temp/FreeNAS-x64.img.xz | dd of=FreeNAS_x64.img bs=64k
    

    Note that the IMG file is 2GB in size, which is larger than can sit in the root drive of a default install Xenserver. Make sure you extract this file somewhere that has enough space.

  4. Import that IMG file into the virtual disk you created with your VM in step 1.
    cd ..
    xe vdi-import uuid=<UUID of the 2GB disk created in step 1> filename=FreeNAS_x64.img
    

    This results in an error:

    The server failed to handle your request, due to an internal error.  The given message may give details useful for debugging the problem.
    message: Caught exception: VDI_IO_ERROR: [ Device I/O errors ]
    

    This error can be safely ignored – it did indeed copy the necessary files.
    Note: To obtain the UUID of the 2GB disk you created in step 1, run the “xe vdi-list” command and look for the name of the disk.

  5. Remove the DVD drive from the virtual machine. From Xencenter:
    Shutdown the VM
    Mount xs-toos.iso
    Run this command in a command prompt:

    xe vm-cd-remove uuid=<UUID of VM> cd-name=xs-tools.iso
  6. Profit!

There is one aspect I haven’t gotten to work yet, and that is Xenserver Tools integration. The important bit – paravirtualized networking – has been achieved so once I get more time I will investigate xenserver tools further.

Flashing updates to HP Proliant DL380 G5

A little while ago I bought an old HP Proliant DL380 G5 from ksl classifieds. I have used it off and on as a backup server but noticed that the drive performance was pretty abysmal. In an effort to fix this I decided to try and upgrade the ROM on the RAID controller it came with – an HP Smartarray P400 SAS/SATA controller.

It turned out to be more difficult than I expected. I first tried booting into Ubuntu server per this guide but I ran into problems with getting it to work. I tried a 32bit version of Ubuntu but I couldn’t even get that version to boot – maybe because of the 6GB of  RAM this unit has.

In experimenting with this I learned a little bit about Hp iLo syntax. This server comes with HP iLo 2, which has a web as well as an ssh interface. I encountered a need to hard reset the server (it was locked up) but the web admin “power off” button did nothing. I had to ssh into the ILO IP address and issue the following command:

power reset

I eventually abandoned my Ubuntu attempts and went with Arch Linux. Its live CD worked like a charm the first time – no fuss. I simply loaded the live CD, copied the update package to my current directory, marked it executable, and ran it.

scp nas:/storage/CP017698.scexe .
chmod +x CP017698.scexe
./CP017698.scexe

Capture

It took about five minutes.

I then set to flash the BIOS, which hadn’t been updated since 2007. It was easier than the RAID array because HP created .ISO images for this task. I obtained the BIOS from here. It was a Windows executable for some reason. The EXE extracted the various image files and had a handy how-to guide.

I took the iLO network CD installation route. I brought up the virtual media manager, loaded the ISO provided, and booted the machine. It brought up a simple flashing screen which updated the BIOS in about 5 minutes.

Capture

My proliant is feeling very hip and up to date now.