Date: Mon, 22 Jul 2002 18:59:43 +0200
From: beaker@iavision.com (Krzysztof Jędruczyk)
To: freebsd-stable@freebsd.org
Subject: Vinum - replacing a subdisk in raid 5 config
Message-ID: <uk7nng0lc.fsf@iavision.com>
Hi,

In the company I work for we use four 80 GB IDE disks configured as a
Vinum raid5 volume. It has worked flawlessly for almost a year, but today
one of the disks started making trouble. Having read some recent posts
about problems with recovering volumes, I'm pretty scared right now. The
page http://www.vinumvm.org/vinum/replacing-drive.html is pretty cryptic
to me, so before I do anything I'd like to confirm the steps, so that I
don't trash the data.

First, here is what happened (let me just mention that I upgraded from
4.4-RELEASE to 4.6-RELEASE-p1 over a week ago; I'm worried that the drive
problem could be caused by the new ATA code):

---- extract from /var/log/messages ----
Jul 22 14:18:41 grasshopper /kernel: ad8: no status, reselecting device
Jul 22 14:18:41 grasshopper /kernel: ad8: timeout sending command=ca s=ff e=00
Jul 22 14:18:41 grasshopper /kernel: ad8: error executing command - resetting
Jul 22 14:18:41 grasshopper /kernel: ata4: resetting devices ..
Jul 22 14:18:41 grasshopper /kernel: ad8: removed from configuration
Jul 22 14:18:41 grasshopper /kernel: vinum: raid.p0.s2 is stale by force
Jul 22 14:18:41 grasshopper /kernel: vinum: raid.p0 is degraded
Jul 22 14:18:41 grasshopper /kernel: fatal:raid.p0.s2 write error, block 110221705 for 65536 bytes
Jul 22 14:18:41 grasshopper /kernel: raid.p0.s2: user buffer block 330664576 for 65536 bytes
Jul 22 14:18:41 grasshopper /kernel: drive3: fatal drive I/O error, block 110221705 for 65536 bytes
Jul 22 14:18:41 grasshopper /kernel: vinum: drive drive3 is down
Jul 22 14:18:41 grasshopper /kernel: done
Jul 22 14:18:43 grasshopper pop3d[37420]: login: pc70[192.168.0.70] lipton plaintext
Jul 22 14:18:47 grasshopper /kernel: swap_pager: I/O error - pagein failed; blkno 1024,size 4096, error 6
Jul 22 14:18:47 grasshopper /kernel: vm_fault: pager read error, pid 88871 (smbd)
---- end of extract from /var/log/messages ----

So, there was some problem with ad8, vinum updated the state of the plex,
and then I saw a kernel panic - one of the swap partitions was also on
ad8 - and the system rebooted. Now `vinum l` shows:

----------------------------------------
grasshopper beaker# vinum l
4 drives:
D drive1                State: up       Device /dev/ad4s1e      Avail: 0/76060 MB (0%)
D drive2                State: up       Device /dev/ad6s1e      Avail: 0/76060 MB (0%)
D drive3                State: up       Device /dev/ad8s1e      Avail: 0/76060 MB (0%)
D drive4                State: up       Device /dev/ad10s1e     Avail: 0/76060 MB (0%)

1 volumes:
V raid                  State: up       Plexes:       1 Size:        222 GB

1 plexes:
P raid.p0            R5 State: degraded Subdisks:     4 Size:        222 GB

4 subdisks:
S raid.p0.s0            State: up       PO:        0  B Size:         74 GB
S raid.p0.s1            State: up       PO:      128 kB Size:         74 GB
S raid.p0.s2            State: stale    PO:      256 kB Size:         74 GB
S raid.p0.s3            State: up       PO:      384 kB Size:         74 GB
----------------------------------------

What I'd like to do is shut the system down, remove drive3, and bring the
system up with only 3 valid drives. Then I'd check whether something is
wrong with the removed drive (maybe it is a cabling or new-ATA-driver
problem?). Finally, I'd physically attach a "new" drive and bring the
Vinum volume back to fully functional. So what I want to do is:

1. Do everything under `script`, so that if I mess things up there will
   be an example of what NOT to do ;)

2. Issue `vinum stop raid.p0.s2`.

3. Shut down the system and physically remove the disk. Boot the system
   without the disk; in the meantime, stress-test the removed disk.

4. Create labels on the new disk, identical to what was on the failed
   drive. I'll do that on another system, to be safe.

5. Shut down the system. Physically mount the disk. Boot the system
   again.

6. Now I'm confused. I'm not sure whether I will have to create the
   drive (as in http://www.vinumvm.org/vinum/replacing-drive.html) or
   not (I'm not sure after reading the "vinum problems continued"
   thread, around 23 Jun 2002).

7. Assuming I figure out how to get past step 6, I'll have raid.p0.s2
   in state "obsolete", I think. Then I'll just issue
   `vinum start raid.p0.s2`. Now - if I understand the vinum web page -
   I'll have to use the infamous `setstate` (see below):
   `vinum setstate obsolete raid.p0.s2`, and then start it again (?),
   which would this time work as expected: `vinum start raid.p0.s2`.
   (A sketch of the whole command sequence, as I understand it, follows
   below.)
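To make sure I'm reading my own plan correctly, here is a sketch of the
commands I intend to type, put together from the man pages and the web
page above. The fdisk/disklabel part assumes the replacement disk shows
up as ad8 again and that all four disks are identical 80 GB units, and
the vinum part at the end is exactly the bit I'm unsure about:

----------------------------------------
# step 2: mark the subdisk as down before pulling the drive
vinum stop raid.p0.s2

# step 4: give the new disk the same slice and label as the old one
disklabel ad4s1 > /tmp/label.good    # on grasshopper, from a healthy drive
fdisk -BI ad8                        # on the other box: one slice, whole disk
disklabel -R ad8s1 /tmp/label.good   # write the saved label onto the new disk

# step 7: after booting with the new disk in place
vinum start raid.p0.s2               # I expect "Device busy" here
vinum setstate obsolete raid.p0.s2   # as replacing-drive.html does
vinum start raid.p0.s2               # should now rebuild from parity
----------------------------------------

Does that look right, or am I about to do something stupid?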
PS. I have seen Greg saying on this list (about the setstate command):

>> Setstate up ....
>
> You know that's dangerous, don't you? To quote the man page:
>
>     This bypasses the usual consistency mechanism of vinum and should
>     be used only for recovery purposes. It is possible to crash the
>     system by incorrect use of this command.
>
> Maybe I should add "don't use this command unless you know exactly
> what you're doing.".

Well, I think that adding such a sentence would be in conflict with
http://www.vinumvm.org/vinum/replacing-drive.html itself. I hardly
understand the idea behind the following sequence:

vinum -> start test.p1.s0
Can't start test.p1.s0: Device busy (16)
vinum -> setstate obsolete test.p1.s0

and these are the instructions from the replacing-drive.html document.
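Given that warning, one extra precaution I plan to take before running
setstate (my own idea, not something from the web page): dump Vinum's
current configuration to a file first, so that if I trash an object I at
least have a description to re-create it from:

----------------------------------------
# save a re-creatable copy of the current vinum configuration
vinum printconfig /root/vinum-config-20020722
----------------------------------------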
Best Regards,
Krzysztof Jędruczyk