From owner-freebsd-questions Fri Oct 26 20:51:19 2001
Delivered-To: freebsd-questions@freebsd.org
Received: from wantadilla.lemis.com (wantadilla.lemis.com [192.109.197.80])
	by hub.freebsd.org (Postfix) with ESMTP id 8AAA137B40A
	for ; Fri, 26 Oct 2001 20:49:27 -0700 (PDT)
Received: by wantadilla.lemis.com (Postfix, from userid 1004)
	id 8361D6ACB7; Fri, 26 Oct 2001 09:14:55 +0930 (CST)
Date: Fri, 26 Oct 2001 09:14:55 +0930
From: Greg Lehey
To: Ben Eisenbraun
Cc: freebsd-questions@FreeBSD.org
Subject: Re: recovery of corrupt vinum plexes?
Message-ID: <20011026091455.A4706@wantadilla.lemis.com>
References: <20011023044950.A43848@nitrogen.nexthop.net>
	<20011023183023.M27668@wantadilla.lemis.com>
	<20011023055005.A44324@nitrogen.nexthop.net>
	<20011025103000.A25441@wantadilla.lemis.com>
	<20011025023359.D64298@nitrogen.nexthop.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20011025023359.D64298@nitrogen.nexthop.net>;
	from bene@nitrogen.nexthop.net on Thu, Oct 25, 2001 at 02:33:59AM -0400
Organization: The FreeBSD Project
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-418-838-708
WWW-Home-Page: http://www.FreeBSD.org/
X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF 13 24 52 F8 6D A4 95 EF
Sender: owner-freebsd-questions@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

On Thursday, 25 October 2001 at 2:33:59 -0400, Ben Eisenbraun wrote:
> On Thu, Oct 25, 2001 at 10:30:00AM +0930, Greg Lehey wrote:
>> On Tuesday, 23 October 2001 at 5:50:05 -0400, Ben Eisenbraun wrote:
> trace>
>
>> Hmm.  That could have been just about anything, probably a corrupt
>> request structure.  Without a dump it's difficult to say very much,
>> but in view of the fact that the drives have gone away, it's possible
>> that it was trying to talk to them anyway.  I'd like to see a dump of
>> this.
>
> I think I can reproduce this if I frob with it enough, but this
> kernel wasn't build with debug symbols, and the system sources have
> been recently updated, so I'm not sure if this would be useful.  Do
> you have any suggestions?

I'd need the debug output.  I won't find it otherwise.

>> OK, let's hope that only the Vinum labels are corrupted.  You have a
>> fair chance that the data section hasn't been overwritten, since
>> there's a copy of the config information (128 kB) between the label
>> and the data.  In that case, you should be able to recreate the
>> objects with this config file:
>>
>> device max3 device /dev/ad4s1e
>> device max4 device /dev/ad6s1e
>
> "device" in the first column produced an error.  Looking at the config,
> I figured you meant "drive max# device /dev/etc", so I swapped "drive"
> for the first "device" in each line.  Here's the output of the 'create',
> 'start', 'list' and 'list -v' commands.
>
> vinum -> list
> 8 drives:
> D max3                  State: up       Device /dev/ad4e        Avail: 0/57239 MB (0%)
> D max4                  State: up       Device /dev/ad6e        Avail: 0/57239 MB (0%)
> D max1                  State: up       Device /dev/ad0s1e      Avail: 0/19529 MB (0%)
> D max2                  State: up       Device /dev/ad2s1e      Avail: 0/19529 MB (0%)
> D wd1                   State: up       Device /dev/ad8s1e      Avail: 0/57239 MB (0%)
> D wd2                   State: up       Device /dev/ad10s1e     Avail: 0/57239 MB (0%)
>
> 2 volumes:
> V stripe-mirror         State: up       Plexes:       2 Size:        111 GB
> V var-mirror            State: up       Plexes:       2 Size:         19 GB
>
> 4 plexes:
> P stripe-mirror.p0    S State: corrupt  Subdisks:     2 Size:        111 GB
> P stripe-mirror.p1    S State: corrupt  Subdisks:     2 Size:        111 GB
> P var-mirror.p0       C State: up       Subdisks:     1 Size:         19 GB
> P var-mirror.p1       C State: up       Subdisks:     1 Size:         19 GB
>
> 6 subdisks:
> S stripe-mirror.p0.s0   State: crashed  PO:        0  B Size:         55 GB
> S stripe-mirror.p0.s1   State: up       PO:      512 kB Size:         55 GB
> S stripe-mirror.p1.s0   State: crashed  PO:        0  B Size:         55 GB
> S stripe-mirror.p1.s1   State: up       PO:      512 kB Size:         55 GB
> S var-mirror.p0.s0      State: up       PO:        0  B Size:         19 GB
> S var-mirror.p1.s0      State: up       PO:        0  B Size:         19 GB

Good!

> For some reason, the drives came back at ad[46]e not ad[46]s1e.

This is not so good.  It suggests that the partition table has
changed.  Are you sure you haven't run fdisk or disklabel on these
drives?

> I was thinking about recent changes to the system config, since
> this setup had run reliably for several months off the same
> sources, and it was rebooted about 1 week before this crash
> happened.  Some new sysctl's took effect after that reboot:
>
> kern.ipc.somaxconn=2048
> net.inet.icmp.drop_redirect=1
> net.inet.icmp.log_redirect=1
> net.inet.tcp.sendspace=32768
> net.inet.tcp.recvspace=32768
> vfs.vmiodirenable=1

No, it has nothing to do with them.

> Also, 2-3 days before the problems started occurring, I had
> created some additional swap space on both ad4 and ad6.  Those
> are the only lowlevel changes made to the system since it was built
> from the running sources.

This could be an issue.

> FreeBSD 4.4-RC FreeBSD 4.4-RC #0: Tue Aug 21 20:53:12 EDT 2001
>     root@whiskey.klatsch.org:/usr/obj/usr/src/sys/WHISKEY i386
>
> root@ [10:21pm][~]>>disklabel /dev/ad4s1
>
> 8 partitions:
> #         size     offset   fstype   [fsize bsize bps/cpg]
>   b:   2097152  117226242     swap                          # (Cyl. 7296*- 7427*)
>   c: 120053682          0   unused        0     0           # (Cyl.    0 - 7472*)
>   e: 117226242          0    vinum                          # (Cyl.    0 - 7296*)
>
> root@ [10:23pm][~]>>disklabel /dev/ad6s1
>
> 8 partitions:
> #         size     offset   fstype   [fsize bsize bps/cpg]
>   b:   2097152  117226242     swap                          # (Cyl. 7296*- 7427*)
>   c: 120053682          0   unused        0     0           # (Cyl.    0 - 7472*)
>   e: 117226242          0    vinum                          # (Cyl.    0 - 7296*)

That looks OK, however.  You didn't run fdisk?

> I'm happy to try anything that would assist you in tracking down the
> problem, and I can arrange for console access if it would be
> helpful.

OK, *assuming* that the data is still there in the same place, you
should be able to do:

  vinum -> setstate up stripe-mirror.p0 stripe-mirror.p0.s1 stripe-mirror.p1 stripe-mirror.p1.s1

This will set them into the "up" state.
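
The "same place" assumption can be partly cross-checked from the
numbers in the disklabel output quoted above.  What follows is only a
rough sketch in plain sh arithmetic.  It assumes 512-byte sectors (the
disklabel header lines are not part of the quote), and the variable
names are made up for illustration.

```sh
#!/bin/sh
# Sizes and offsets in sectors, copied from the quoted disklabel output
# (identical for ad4s1 and ad6s1).  Assumption: 512-byte sectors.
e_size=117226242; e_off=0           # e: vinum partition
b_size=2097152;   b_off=117226242   # b: the newly added swap partition
c_size=120053682                    # c: the whole slice

e_end=$((e_off + e_size))
b_end=$((b_off + b_size))
e_mb=$((e_size / 2048))             # 2048 sectors of 512 bytes per MB
b_mb=$((b_size / 2048))

echo "e: sectors $e_off to $e_end, $e_mb MB"
echo "b: sectors $b_off to $b_end, $b_mb MB"
[ "$e_end" -le "$b_off" ] && echo "e ends before b starts: no overlap"
[ "$b_end" -le "$c_size" ] && echo "b fits inside the slice"
```

The e partition works out to 57239 MB, the same figure that vinum
reports for max3 and max4, and the swap partition starts exactly where
e ends.  That only shows the labels as they stand are consistent.  It
says nothing about whether the slice table was changed, which is what
the fdisk question is about.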
Next do an fsck -n on the plexes (/dev/vinum/plex/stripe-mirror.p0 and
/dev/vinum/plex/stripe-mirror.p1).  Whatever you do, make sure you
don't write anything back to the disk (that's what -n does).  See how
bad fsck thinks the damage is.  Don't send me the whole output, just
the beginning.

Greg
--
When replying to this message, please copy the original recipients.
If you don't, I may ignore the reply.
For more information, see http://www.lemis.com/questions.html
See complete headers for address and phone numbers

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-questions" in the body of the message
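
The read-only fsck step described in the message could be scripted
roughly as follows.  This is a sketch, not part of the original
advice: the function name fsck_cmds, the RUN variable and the 40-line
cut-off are all assumptions chosen for illustration.  By default the
script only prints the commands it would run.

```sh
#!/bin/sh
# Hypothetical helper: print one read-only fsck command per plex.
fsck_cmds() {
    for p in stripe-mirror.p0 stripe-mirror.p1; do
        # -n answers "no" to every question, so nothing is written back.
        echo "fsck -n /dev/vinum/plex/$p"
    done
}

if [ "${RUN:-0}" = 1 ]; then
    # Run each command and keep only the beginning of its output.
    # 40 lines is an arbitrary choice.
    fsck_cmds | while read -r cmd; do
        echo "== $cmd"
        $cmd </dev/null 2>&1 | head -n 40
    done
else
    # Dry run (the default): just show what would be executed.
    fsck_cmds
fi
```

Running it with RUN=1 in the environment, as root on the affected
machine, executes the two commands and truncates each report to its
first lines, which matches the request to send only the beginning of
the output.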