Date:      Wed, 8 Sep 1999 08:06:44 -0700 (PDT)
From:      Daniel Harms <dharms98@yahoo.com>
To:        freebsd-questions@freebsd.org
Subject:   Vinum recovery procedure
Message-ID:  <19990908150644.5808.rocketmail@web116.yahoomail.com>


I posted this via Deja News to sol.lists.freebsd.questions, but I
don't think it made it to the list. Sorry if you're seeing this twice.



I am trying to set up a very reliable server in a very remote
location, and I thought vinum would be the perfect tool to provide
resilience at not too much cost. I'm having some problems testing it,
however:
                   
I have a server (a Compaq 1850R) with two hot-swappable drives. I am
trying to simulate a drive failure and figure out how I would recover
from it. I couldn't find much documentation on how to do this, so I'm
making some guesses in the process.
                   
I am running 3.2-RELEASE as it comes on the walnut creek cdrom.
                   
Here are all the steps I take:
                   
My vinum config looks like this:
                   
drive d1 device /dev/da0g
drive d2 device /dev/da1e
volume tezzt
  plex org concat
    sd length 0 drive d1
  plex org concat
    sd length 0 drive d2
                   
Both /dev/da0g and /dev/da1e are 200MB partitions of type "vinum".
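
(For reference, these are just ordinary disklabel entries with fstype
"vinum"; the sizes and offsets below are illustrative, not my exact
values. From "disklabel -e da0":

#        size   offset    fstype   [fsize bsize bps/cpg]
  g:   409600  3900000     vinum      # 409600 sectors = 200MB

and likewise an "e" entry on da1.)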
                   
1. vinum create -f /etc/vinum.conf
                   
This is where the first strange thing comes in - tezzt.p1.s0 state is
"initializing", but I see no disk activity. Having lost  patience
waiting, I do:
                   
2. vinum stop tezzt.p1 ; vinum init tezzt.p1
... lots of disk activity on the 2nd drive this time...
... tezzt.p1.s0 is up
... tezzt.p1 is up.
                   
Good!
                   
3. newfs -v /dev/vinum/tezzt ... OK, 199.9MB
                   
4. mount /dev/vinum/tezzt /mnt
                   
5. Now I run a script that creates some I/O activity on it, from here
on referred to as the "io script":
                   
while /usr/bin/true; do
  cp -R /usr/src/release/picobsd /mnt
  cat `find /mnt` > /dev/null
done
                   
... I see lots of I/O on both drives

6. Now I YANK THE SECOND DRIVE while this is going on!!!
... the system seems to pause for a few seconds
... kernel spits out a bunch of messages
    ..invalidating pack..
    ..fatal write error.. etc...
... the system comes back to life and I can see the drive going, io
script still running.

Excellent!!! (so far)
                   
"vinum lv -r" reports
  tezzt.p1 faulty
  tezzt.p1.s0 stale
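
(Aside: on the production box I plan to notice this state automatically
with a crude cron check along these lines; the script below is just my
own sketch, not anything from the vinum docs:

#!/bin/sh
# mail root if vinum reports any object in a bad state
bad=`vinum list | egrep -i 'faulty|stale|down|crashed'`
if [ -n "$bad" ]; then
        echo "$bad" | mail -s "vinum problem on `hostname`" root
fi
)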

At this point in a real world situation I would presumably notice the
failure. From here on I'm guessing I should do this:
                   
7. vinum stop tezzt.p1
                   
8. reboot, reinsert the drive
                   
8. "vinum lv -r" reports tezzt.p1 is "down"
                   
The goal is to mirror all my partitions, including root (I have an MFS
root with symlinks to "root", which I'll mirror; see the fstab sketch
after this paragraph). So if this were for real, that filesystem would
be mounted before I get a chance to fix it. To simulate this, I mount
it first, then see how I'm going to fix it.
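
(Roughly what I have in mind for the mirrored filesystems in /etc/fstab
once this works; the volume names are just examples:

/dev/vinum/usr     /usr     ufs     rw     2     2
/dev/vinum/var     /var     ufs     rw     2     2
/dev/vinum/home    /home    ufs     rw     2     2
)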
                   
10. fsck /dev/vinum/rtezzt  ...OK
   mount /dev/vinum/tezzt /mnt
   ls -l /mnt    ...files are there, good.
                   
11. I start my io script again. Lots of disk activity on the first
drive. (The second one is down.)
                   
12. vinum init tezzt.p1
... initializing
... initialized
                  
This is where things get *really* strange....
                   
... tezzt.p1.s0 is reviving (<- makes sense)
... tezzt.p1 is faulty (<- understandable)
                   
BUT I see *NO* I/O on the second drive.
                   
"vinum ls -v" shows tezzt.p1.s0 
     revive pointer: 64KB (0%)
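
(Side question: should I be reviving the stale subdisk with vinum's
"start" command instead of "init" here, i.e. something like

  vinum start tezzt.p1.s0

I couldn't tell from the man page which is the intended recovery path,
so that is just a guess on my part.)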
                   
Losing patience waiting again, resorting to crapshoots:
                   
13. kill the io script
14. vinum stop tezzt.p1 ; vinum init tezzt.p1
... initializing
... is up
                   
15. umount /mnt
    mount /mnt
    ls /mnt: Bad file descriptor  (??????????)
    
trying again...

16. umount /mnt
    mount /mnt
    ls /mnt/*
                   
syncing disks..... ***total system crash***
                   
                   
What did I do wrong? What am I supposed to do to recover from a drive
failure? Could the fact that /mnt lives on an MFS root be part of the
problem?
                   
Thanks!
                   
Dan
                   





