From owner-freebsd-stable Fri Jun 11 7: 4:24 1999
Delivered-To: freebsd-stable@freebsd.org
Received: from ideaglobal.com (ultra2.ideaglobal.com [194.36.20.11])
    by hub.freebsd.org (Postfix) with ESMTP id 7EF79154A8
    for ; Fri, 11 Jun 1999 07:04:17 -0700 (PDT)
    (envelope-from kiril@ideaglobal.com)
Received: (from kiril@localhost)
    by ideaglobal.com (8.9.2/8.9.2) id OAA17035;
    Fri, 11 Jun 1999 14:55:58 +0100 (BST)
From: Kiril Mitev
Message-Id: <199906111355.OAA17035@ideaglobal.com>
Subject: Re: vinum disk has gone AWOL, help!
To: grog@folly.lemis.com (Greg Lehey)
Date: Fri, 11 Jun 1999 14:55:58 +0100 (BST)
Cc: kiril@ideaglobal.com, Cy.Schubert@uumail.gov.bc.ca, freebsd-stable@FreeBSD.ORG
In-Reply-To: <19990610162145.22313@folly.lemis.com> from "Greg Lehey" at Jun 10, 99 04:21:45 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

OK :-) I'll try to give the "big picture", see below for details...

> > On Friday, 4 June 1999 at 21:58:07 +0100, Kiril Mitev wrote:
> >> In message <199906041920.UAA17798@ideaglobal.com>, Kiril Mitev writes:
> >>> (Sorry if this is more appropriate for -questions...)
> >>>
> >>>
> >>> after my last reboot, which was NOT a panic or anything like that
> >>> my vinum volume sort of disappeared...
> >>>
> >>> vinum itself is still happy, and a listing shows all my
> >>> bits & pieces, up to the volume level as OK.
> >>>
> >> If that fails, try fsck -b . To get
> >> the other block numbers for alternate superblocks, use newfs -N,
> >> which will go through all of the motions of creating a filesystem,
> >> e.g. print the superblock numbers, without actually creating a
> >> filesystem. If this fails, you're pretty much hosed.
> >
> > That looks like it might actually fix it, unless there is
> > more corruption somewhere
>
> This sounds funny.
> If you have any evidence that it was caused by
> Vinum, I'd be very interested to see it. But I suspect that the cause
> is elsewhere, and Vinum has little to do with it.
>
> Greg
> --

The evidence (as they say) is "purely circumstantial"...

Let me explain the h/w setup first.

The box in question has a dual P2 MB with the built-in Adaptec 7890
2xUW SCSI (7895? something like that). The "primary" scsi channel has a
UW side and an "old-fashioned" side; the "secondary" channel has only a
UW bus, for a total of 3 connectors on the MB.

The boot-critical stuff (/, /usr, /var) is located on an IDE disk, due to
previous painful experiences with SCSI weirdness. Obviously this is the
boot disk, and the Adaptec is configured NOT to set up BIOS disk devices.

scsi 1 UW has 3 nice 9gb disks hanging off it, with scsi id's of 0, 1, 2.
these disks are located inside the PC itself, with disk id 2 being the
closest to the controller, and disk id 0 furthest away, last on that bus
and terminated. these disks map to da0, da1 and da2 in F-BSD.

scsi 2 UW has 2 more 4gb disks on it, with scsi id's of 4 and 5. disk 5 is
last on the bus and terminated. these disks map to da3 and da4.

To summarize:

/, /usr, /var are on a single IDE disk
da0 is mounted on /home as a normal disk
da1, da2, da3, da4 are bunched together into a 25gb vinum volume

with me so far? OK...

the original problem appeared when I tried to plug in a scsi tape to
back up my stuff, since I was intending to upgrade from 3.1 to 3.2.

the tape drive, scsi id 6, was connected to the "slow" connector on the
primary scsi, and there was an active terminator after the tape on that
bus.

on the very first reboot the scsi disks went into a "device in timeout,
device not in timeout" loop which got nowhere, so i had to hit the good
ole reset button and start playing with the box.

(remember that the tape drive shares a bus with 1 non-vinum and 2 vinum disks?
good)

my first thought was that the termination got screwed, so i spent quite
a few hours testing the various termination options on the devices and
on the controller - same result every time (well, almost every time. in
some configurations, the controller lost the disks partially or
completely)

just for the fun of it, i put the tape drive on a different scsi id - no
change

further investigation showed that the eternal timeout loop occurs on the
2 vinum disks that shared the bus with the tape. the test pattern was
(more or less) like this...

boot -s, fsck /home - no problem, reboot
boot -s, start vinum, fsck vinum volume - timeout & hang, reboot
boot -s, start vinum, dd if=/dev/da1 of=/dev/null count=10 (a vinum disk) - timeout/hang, reboot
boot -s, DONT start vinum, dd from the non-vinum disk - no problem, reboot
boot -s, DONT start vinum, dd from vinum disk - no problem, reboot
boot -s, start vinum, dd from non-vinum disk - no problem, reboot

(start vinum means running the following commands:

    cd /etc/
    . ./rc.conf
    vinum read $vinum_drives
)

tested the above with a non-SMP kernel - identical results...

I finally gave up and did the backup by partially copying from the vinum
disk to the non-vinum one with the tape unplugged, rebooting into
single-user mode with the tape plugged in and backing up.
{repeat 6 times :-)))) }

it wasn't as bad as it sounds, since a single tape would not have held
the full disk anyway...

updated the OS without a hitch (kudos to the developers, btw), so i did
not need those backups after all :-))

^^^^^ that was 3.1 ^^^^^^^
vvvvv this is 3.2 vvvvvv

a couple of weeks later I decided to test a cd-burner on the box, so i
plugged one in - in exactly the same spot, both in terms of physical
connections and scsi id's, as i had the tape drive.
i was quite apprehensive that the same sort of thing might happen, so i
made sure to boot into single-user mode first, which went ok. i played
with the burner a bit, then decided to reboot into multi-user to see what
happens...

as soon as the vinum volume came up for fsck, I got the scsi errors and
that very nasty fsck error, panicked and took out the burner.

the next reboot did not have the scsi timeouts. neither did it have a
valid partition, which caused a lot of stress until someone pointed out
to me (very politely, though i did not deserve it in retrospect :-))))
that I can use an alternate superblock for fsck .... happy end of
story.

but to answer your question - no, I cannot guarantee that it was vinum's
fault, but I _think_ I have eliminated all other variables

i am also (understandably, i hope) rather hesitant to do any more
playing around with the hardware.... but i can run a few tests if you
can tell me what to do

Kiril

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message