Date:      Mon, 22 Jul 2002 18:59:43 +0200
From:      beaker@iavision.com (Krzysztof Jędruczyk)
To:        freebsd-stable@freebsd.org
Subject:   Vinum - replacing a subdisk in raid 5 config
Message-ID:  <uk7nng0lc.fsf@iavision.com>

Hi,

In the company I work for we use 4*80 GB IDE disks configured as a
Vinum raid5 volume. It has worked flawlessly for almost a year, but
today one of the disks started making trouble. Having read some
recent posts about problems with recovering volumes, I'm pretty
scared right now, and http://www.vinumvm.org/vinum/replacing-drive.html
is pretty cryptic to me. So before I do anything, I'd like to confirm
the steps so I don't trash the data.

First, here is what happened (let me just mention that I upgraded
from 4.4-RELEASE to 4.6-RELEASE-p1 a little over a week ago, and I'm
worried that the drive problem could be caused by the new ATA code):

---- extract from /var/log/messages ----

Jul 22 14:18:41 grasshopper /kernel: ad8: no status, reselecting device
Jul 22 14:18:41 grasshopper /kernel: ad8: timeout sending command=ca s=ff e=00
Jul 22 14:18:41 grasshopper /kernel: ad8: error executing command - resetting
Jul 22 14:18:41 grasshopper /kernel: ata4: resetting devices ..
Jul 22 14:18:41 grasshopper /kernel: ad8: removed from configuration
Jul 22 14:18:41 grasshopper /kernel: vinum: raid.p0.s2 is stale by force
Jul 22 14:18:41 grasshopper /kernel: vinum: raid.p0 is degraded
Jul 22 14:18:41 grasshopper /kernel: fatal :raid.p0.s2 write error, block 110221705 for 65536 bytes
Jul 22 14:18:41 grasshopper /kernel: raid.p0.s2: user buffer block 330664576 for 65536 bytes
Jul 22 14:18:41 grasshopper /kernel: drive3: fatal drive I/O error, block 110221705 for 65536 bytes
Jul 22 14:18:41 grasshopper /kernel: vinum: drive drive3 is down
Jul 22 14:18:41 grasshopper /kernel: done
Jul 22 14:18:43 grasshopper pop3d[37420]: login: pc70[192.168.0.70] lipton plaintext
Jul 22 14:18:47 grasshopper /kernel: swap_pager: I/O error - pagein failed; blkno 1024,size 4096, error 6
Jul 22 14:18:47 grasshopper /kernel: vm_fault: pager read error, pid 88871 (smbd)

---- end of extract from /var/log/messages ----

So, there was some problem with ad8, Vinum updated the state of the
plex, and then I saw a kernel panic - one of the swap partitions was
also on ad8 - and the system rebooted.

Now `vinum l` shows:

----------------------------------------

grasshopper beaker# vinum l
4 drives:
D drive1                State: up       Device /dev/ad4s1e      Avail: 0/76060 MB (0%)
D drive2                State: up       Device /dev/ad6s1e      Avail: 0/76060 MB (0%)
D drive3                State: up       Device /dev/ad8s1e      Avail: 0/76060 MB (0%)
D drive4                State: up       Device /dev/ad10s1e     Avail: 0/76060 MB (0%)

1 volumes:
V raid                  State: up       Plexes:       1 Size:        222 GB

1 plexes:
P raid.p0            R5 State: degraded Subdisks:     4 Size:        222 GB

4 subdisks:
S raid.p0.s0            State: up       PO:        0  B Size:         74 GB
S raid.p0.s1            State: up       PO:      128 kB Size:         74 GB
S raid.p0.s2            State: stale    PO:      256 kB Size:         74 GB
S raid.p0.s3            State: up       PO:      384 kB Size:         74 GB

----------------------------------------

What I'd like to do is shut down the system, remove drive3, and bring
the system up with only 3 valid drives. Then I'd check whether
something is wrong with the removed drive (maybe it is a cabling or
new ATA driver problem?). Finally, I'd physically attach a "new"
drive and bring the Vinum volume back to fully functional.

So what I want to do is:

1. Do everything under `script`, so if I mess things up there will be
   an example of what NOT to do ;) (see the sketch after this list).

2. Issue `vinum stop raid.p0.s2`.

3. Shut down the system and physically remove the disk. Boot the
   system without the disk; in the meantime, stress-test the disk.

4. Create labels on the new disk, identical to what was on the failed
   drive. I'll do that on another system, to be safe (see the
   disklabel sketch after this list).

5. Shut down the system. Physically install the disk. Boot the system
   again.

6. Now I'm confused. I'm not sure whether I will have to create the
   drive (as in http://www.vinumvm.org/vinum/replacing-drive.html) or
   not - I'm still unsure after reading the "vinum problems continued"
   thread from about 23 Jun 2002. (See the config sketch after this
   list.)

7. Assuming I figure out how to get past step 6, I'll have
   raid.p0.s2 in state "obsolete", I think. Then I'll just issue
   `vinum start raid.p0.s2`. Now, if I understand the vinum web page
   correctly, that will fail and I'll have to use the infamous
   `setstate` (see below):

   `vinum setstate obsolete raid.p0.s2`

   Then start it again (?), which would work this time as expected:

   `vinum start raid.p0.s2`

PS. I have seen Greg saying on this list (about setstate command)

>> Setstate up ....
>
> You know that's dangerous, don't you?  To quote the man page:
>
>     This bypasses the usual consistency mechanism of vinum and should
>     be used only for recovery purposes.  It is possible to crash the
>     system by incorrect use of this command.
>
> Maybe I should add "don't use this command unless you know exactly
> what you're doing.".

Well, I think that adding such a sentence would conflict with
http://www.vinumvm.org/vinum/replacing-drive.html

I hardly understand the idea behind the following sequence:

  vinum -> start test.p1.s0
  Can't start test.p1.s0: Device busy (16)
  vinum -> setstate obsolete test.p1.s0

and yet these are the instructions from the replacing-drive.html
document.

Best Regards,
     Krzysztof Jędruczyk

