Date: Tue, 30 Nov 2004 05:35:15 -0500 (EST)
From: "Brian Szymanski" <ski@indymedia.org>
To: freebsd-stable@freebsd.org
Subject: Re: vinum raid 5
Message-ID: <2958.10.0.0.26.1101810915.squirrel@10.0.0.26>
>>>>>> "Brian" == Brian Szymanski <ski@indymedia.org> writes: > > Brian> Actually I experienced a number of bugs with vinum in 5.3 that > Brian> proved fatal to a root vinum install (in fact, everything on > Brian> the second ATA channel was marked down after every reboot if I > Brian> recall correctly). > > I had some problems with a vinum machine (still back at 5.1) > recently. I removed a pair of drives and the remaining setup wouldn't > come back up. Turns out that multiple incantations of 'vinum > setdaemon 0' and 'vinum saveconfig' finally fixed things. > > vinum is fragile. Hmm, the vinum setdaemon 0 ; vinum saveconfig ; trick didn't work for me. If anyone else finds themself in a similar situation and feels like living dangerously, I've implemented a solution with the below shell script, to be called from /etc/rc.local. Use at your own risk. First more on the symptom: on reboot, vinum looses all config information. You can load the vinum.cfg again, but this puts every subdisk in the Raid5 in the empty state (when in fact they should usually be in the up state). You can manually set these subdisks to be up, and find all your information as it was, a nice "feature". The solution: write a crappy little shell script to remember vinum's Raid5 config information, and set its state accordingly on boot before mounting/fscking, etc. If anyone uses this, I'd be interested to hear their results. It works for me with drive power pulls. Right now it depends on state being recorded at boot time, so if vinum thinks a subdisk is down (and thus out of date), then you reboot and the drive comes back to life for a moment, you're in deep doo-doo. That is to say: flaky drives may introduce errors onto the R5, but a drive that fails once and then stays down forever will be fine with this script (this is the usual case from what I've seen, but I haven't seen everything)... Also I'm not sure why bgfsck doesn't catch the filesystem automatically, so I used atd to schedule a job to run at +7 minutes with priority 10 to do the background fsck. The idea here being that if bgfsck does for some reason catch your Raid5, this fsck should lag behind and not cause any race condition problems. A future improvement would be to call the state saving portion of the script in rc.shutdown and periodically from cron. There are many others, but it "works for me" (tm). Of course the right solution is to fix {g}vinum so this hideousness isn't necessary... Cheers, Brian Szymanski ski@indymedia.org step 1: in /etc/rc.local, add: sh /etc/rc.vinum.raid5 step 2: create /etc/rc.vinum.raid5: #!/bin/sh ### configuration # ie, mount /dev/vinum/$VINUM_NAME $VINUM_MOUNTPT VINUM_MOUNTPT="/home" VINUM_NAME="big" # location of the configuration file with just the Raid5 info VINUM_CFG="/etc/vinum.big.cfg" # email address to send notifications to ADMIN_EMAIL="ski@wjb" # place to store state DEGRADE_FILE="/vinum.degraded" # if n=# subdisks, SUBDISKS=0..n-1, e.g. the below works for a 4 drive raid5 SUBDISKS="0 1 2 3" ### no config below this point #if already mounted, abort now mount | grep -q "^/dev/vinum/$VINUM_NAME on $VINUM_MOUNTPT" && exit 0 # mount R5 device # (why on earth doesn't vinum info persist across reboot?!?) 
/sbin/vinum create -f "$VINUM_CFG"

if [ -e "$DEGRADE_FILE" ] ; then
    # start subdisks based on the content of DEGRADE_FILE:
    # start all non-degraded subdisks, leave degraded ones down
    for i in $SUBDISKS ; do
        grep -q "$VINUM_NAME.p0.s${i}" "$DEGRADE_FILE" && \
            echo "$VINUM_NAME.p0.s${i} down" || \
            /sbin/vinum start $VINUM_NAME.p0.s${i}
    done
    # this maintains degraded state even if a new drive is inserted;
    # important, or else we would lose major information
else
    /sbin/vinum start $VINUM_NAME.p0
fi

# let vinum catch up...
sleep 1

# check the state, mail $ADMIN_EMAIL if things are horked...
state=`/sbin/vinum l | grep "^P.$VINUM_NAME\.p0" | awk '{ print $5 }'`
echo state: $state

if [ "z$state" = zup ] ; then
    # fsck and mount
    fsck -F -p "/dev/vinum/$VINUM_NAME"
    mount "/dev/vinum/$VINUM_NAME" "$VINUM_MOUNTPT"
    # bgfsck should do this automagically, but it doesn't?
    at now + 7 minutes <<END
nice -n 10 fsck -B -p "/dev/vinum/$VINUM_NAME"
END
elif [ "z$state" = zdegraded ] ; then
    echo "VINUM DEGRADED!!! check $ADMIN_EMAIL's email..."
    mail -s "VINUM DEGRADED!!!" "$ADMIN_EMAIL" <<END
vinum is in degraded state! buy a new drive and replace the faulty one!
END
    # let the mail go out; something later in startup horks things...
    sleep 15
    # record which subdisks are down so they stay down on the next boot
    /sbin/vinum l | grep "^S.$VINUM_NAME\.p0\.s" | \
        grep -v 'State:.up' >>"$DEGRADE_FILE"
    # fsck and mount
    fsck -F -p "/dev/vinum/$VINUM_NAME"
    mount "/dev/vinum/$VINUM_NAME" "$VINUM_MOUNTPT"
    # bgfsck should do this automagically, but it doesn't?
    at now + 7 minutes <<END
nice -n 10 fsck -B -p "/dev/vinum/$VINUM_NAME"
END
else
    echo "VINUM BROKEN!!! check $ADMIN_EMAIL's email..."
    mail -s "VINUM BROKEN!!!" "$ADMIN_EMAIL" <<END
vinum is in an unknown state: $state. look at what's going on!
END
    # let the mail go out; something later in startup horks things...
    sleep 15
fi

echo done.
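For reference, the file pointed to by VINUM_CFG holds just the RAID 5
definition that the "vinum create -f" above reloads on every boot. A
hypothetical /etc/vinum.big.cfg for a four-drive RAID 5 might look
something like this (drive names, device paths, and stripe size here
are placeholders, not my actual setup; adjust to your hardware):

# /etc/vinum.big.cfg: RAID 5 config only (hypothetical example;
# devices and stripe size below are placeholders)
drive d0 device /dev/ad4s1e
drive d1 device /dev/ad5s1e
drive d2 device /dev/ad6s1e
drive d3 device /dev/ad7s1e
volume big
  plex org raid5 512k
    sd length 0 drive d0
    sd length 0 drive d1
    sd length 0 drive d2
    sd length 0 drive d3

(length 0 means "use the rest of the drive".)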
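And here is a rough sketch of the state-saving half split out on its
own, as mentioned above. Untested; the script name is made up, and it
assumes the same VINUM_NAME/DEGRADE_FILE values as rc.vinum.raid5.
Call it from /etc/rc.shutdown, and/or from /etc/crontab with something
like "*/10 * * * * root sh /etc/rc.vinum.savestate":

#!/bin/sh
# /etc/rc.vinum.savestate (hypothetical name): record which subdisks
# are currently NOT up, so rc.vinum.raid5 keeps them down on next boot.
VINUM_NAME="big"
DEGRADE_FILE="/vinum.degraded"

# rewrite the file instead of appending, so a subdisk that has been
# repaired and brought back up doesn't stay blacklisted forever
/sbin/vinum l | grep "^S.$VINUM_NAME\.p0\.s" | \
    grep -v 'State:.up' > "${DEGRADE_FILE}.tmp"
mv "${DEGRADE_FILE}.tmp" "$DEGRADE_FILE"

Note that with everything up this leaves an empty DEGRADE_FILE in
place, so the boot script takes the per-subdisk path; that's harmless,
since no subdisk then matches anything in the file and they all get
started.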