I've finally managed to get a) the hardware and b) the time to fit out a server to replace the ageing pacific. This is a good thing because I think the old thing is down to one working fan and may have jammed from the heat the other week. So what does one do with three '73GB', 10,000 RPM U320 SCSI drives? Why, build a RAID, of course.

I've been thinking about this for years, of course, since FreeBSD 4 was the stable branch. Around that time, Vinum was the only real software RAID option on the radar, so I have pretty much assumed since then that I would be using Vinum on that distant, glorious day when I had more than one disk to use in my server. Vinum is by all accounts a perfectly good piece of software, but it is somewhat complex (which is to say I would not want to set up a new Vinum array after a night out). This is born of its flexibility - you can set up pretty much any RAID level or combination thereof, all on one disk if you really want(!).

So that glorious time, the coming of my personal data reliability nirvana if you will, finally arrived yesterday and lo - my thoughts turned to configuring Vinum while waiting for the installation of the base system, the source tree, the ports and so on to finish. While poking around, hoping to find some useful documentation dated sometime in this millennium (I'm serious, all the Vinum docs are from 1999. Must have been a good year for it), I discovered RSE's excellent article, FreeBSD System Disk Mirroring, which pointed me at gmirror. If all you want to do is set up a RAID 1 mirror, gmirror is pure ease-of-setup gold.

RSE's article is actually overkill if you are not booting off the array (hey, why stop living dangerously now when I'm loving it?), as is the section in the handbook. Building a mirrored array boils down to running one command:

gmirror label -v array0 da1 da2

This gives you a device called /dev/mirror/array0 once you have loaded the geom mirror kernel module. Do this at runtime using gmirror load, or at every subsequent boot by running echo 'geom_mirror_load="YES"' >> /boot/loader.conf. And that's it. You can then partition, label and format that array like any normal disk. Geom is seriously rocking my boat for this very reason. All praise PHK and the people that paid him!
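
One small catch with the echo approach is that running it twice leaves two copies of the line in loader.conf. Here is a minimal sketch of an append that only happens once, assuming a POSIX sh. The helper name ensure_line is my own invention for illustration, not a FreeBSD tool:

```shell
# ensure_line FILE LINE: append LINE to FILE unless an identical line is
# already there. (ensure_line is a hypothetical helper, not part of FreeBSD.)
ensure_line() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: fixed string, not a regex
    if ! grep -qxF -e "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

Call it as ensure_line followed by the path to your loader.conf and 'geom_mirror_load="YES"'. A second call with the same arguments changes nothing.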

Here is how I partitioned, labeled and formatted the array, as one large disk:

fdisk -vI /dev/mirror/array0
bsdlabel -w /dev/mirror/array0s1
newfs -U /dev/mirror/array0s1a

Do not forget to put an entry for the array in /etc/fstab, either.
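
For what it's worth, here is a rough sketch of what that entry could look like. The mount point /data in the usage below is purely an assumption (pick your own), and the rw options with dump and pass numbers of 2 are just the usual choices for a non-root UFS filesystem, not something specific to gmirror. The function name is made up too:

```shell
# fstab_entry DEVICE MOUNTPOINT: print a tab-separated fstab line for a UFS
# filesystem mounted read-write. (Hypothetical helper, for illustration only.)
# Fields: device, mount point, fs type, options, dump, pass.
fstab_entry() {
    printf '%s\t%s\tufs\trw\t2\t2\n' "$1" "$2"
}
```

Run fstab_entry /dev/mirror/array0s1a /data to eyeball the line, and append it to /etc/fstab with >> once it looks right.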

Finding Disk Failures

So the annoying thing about running a server is maintaining it. I know I would not check every day in case a disk in the array has failed, so I want some mechanism to do this for me. FreeBSD sends out a daily system report email, so I'd like the mirror status added to that. This is easy:

echo 'daily_status_gmirror_enable="YES"' >> /etc/periodic.conf

Easy. Rock!
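
For a quick check between daily reports, gmirror status prints the state of each mirror. Below is a rough sketch of a filter that names any mirror whose status is not COMPLETE. It assumes the usual layout where each mirror's row starts with mirror/ and carries the status in the second column; the function name is my own:

```shell
# degraded_mirrors: read `gmirror status` output on stdin and print the name
# of every mirror whose Status column is not COMPLETE.
# Exits 1 if any such mirror was found, 0 otherwise.
# Assumes mirror rows begin with "mirror/" and hold the status in column 2;
# header and continuation rows (extra components) are skipped.
degraded_mirrors() {
    awk '$1 ~ /^mirror\// && $2 != "COMPLETE" { print $1; bad = 1 }
         END { exit (bad ? 1 : 0) }'
}
```

Use it as gmirror status | degraded_mirrors. No output and a zero exit status means every mirror reported COMPLETE.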

Recovering

So, the whole point of using a RAID 1 array is being able to deal with drive failure. I played around with this a bit to work out exactly how to do it. Again, the procedure is pretty basic and is outlined in the gmirror(8) man page.

I think the important point, however, is that once the dead drive is disconnected you run gmirror forget array0 so that the array forgets about the failed disk. Also, do not reinsert the old disk unless it has been wiped first, especially if it was the highest priority disk in the array - the array might get rebuilt from the old disk's stale contents, and that's bad.

Once a replacement disk has been inserted (it will have to be at least as big as the old one), just run:

gmirror forget array0
gmirror insert array0 da2
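
Since I would rather read commands like these before running them on a degraded array, here is a small sketch that only prints the sequence for a given array and disk. The function name replace_plan is made up, and the trailing gmirror status line is my own addition so you can watch the rebuild:

```shell
# replace_plan ARRAY DISK: print (do not run) the commands for swapping a
# failed component. (replace_plan is a hypothetical name, for illustration.)
replace_plan() {
    array=$1
    disk=$2
    printf 'gmirror forget %s\n' "$array"             # drop the missing disk
    printf 'gmirror insert %s %s\n' "$array" "$disk"  # add the replacement
    printf 'gmirror status %s\n' "$array"             # check rebuild progress
}
```

Running replace_plan array0 da2 prints the three commands. Once they look right, pipe the output to sh as root.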

Hot Swapping Disks

Or, the other holy grail. I also wanted to test hot swapping disks to see how the SCA backplane and the system handled it, so I tried a camcontrol stop da2 (do not try this at home!). The drive made a really strange whirring noise for a few seconds and stopped. I stupidly did not find out whether it had actually spun down, mainly because that kind of response from a hard drive is an odd sort of thing. Not too deterred, however, I started cat'ing /dev/random to a file on the array. The stopped drive made that weird noise again, but afterwards it just started working and the file was happily written.

Soooo... I don't know if hot swapping works. I guess if I have a dead disk, I can just camcontrol it off and remove it, but I really don't want to try that stunt with a working disk for now.

Two useful camcontrol commands are: camcontrol devlist and camcontrol rescan all. That is all.