Date: Fri, 24 Apr 1998 14:11:00 -0500
From: Patrick Hartling <mystify@friley63.res.iastate.edu>
To: "Justin T. Gibbs" <gibbs@plutotech.com>
Cc: scsi@FreeBSD.ORG
Subject: Re: CAM == CAM Ate my Machine (and severly corrupted file systems too)
Message-ID: <199804241911.OAA04442@friley63.res.iastate.edu>
In-Reply-To: Message from "Justin T. Gibbs" <gibbs@plutotech.com> of "Fri, 24 Apr 1998 11:55:39 MDT." <199804241759.LAA02289@pluto.plutotech.com>

"Justin T. Gibbs" <gibbs@plutotech.com> wrote:

} >The intention of this message is to warn people of the possibility of serious
} >disk corruption when using CAM + SMP + ccd.
}
} It is likely CAM + BT-958...

Yes, I'm now quite certain you are correct.  I should have been a little
more clear in my previous message about which partitions are on which disks
and which controllers.  Every partition that ended up being corrupted (i.e.,
/var, /usr and /home) is on the Viking disk that is part of the BT-958 bus.

} >This morning when I got back from class, I discovered that my machine had
} >apparently gotten hungry and had eaten itself.  It had been very stable for
} >10 days running an SMP kernel with the CAM patches (built April 13, 1998),
} >but then this happened.  Unfortunately, I don't know what caused this, but
} >it certainly caused me a lot of stress this morning.
}
} Was it wedged or did it panic or was it running normally and when you
} attempted some operation failed?

When I got back, it was waiting for me to provide a path to root's shell.
fsck could not find the super block for the /var partition.  I assume what
happened was that the machine panic'd and rebooted while I was gone.  I
don't know how long it had been in that state, but my roommate informed me
that he had heard the disks grinding just a few minutes before I got back.

} Which disk and controller contains /var?  Is it part of your CCD array?

It is not part of the CCD array.

} >However, the real horror story was the complete loss of my home directory.
} >BUT I have /home on the mirrored ccd, and the second partition in the ccd was
} >fully intact by some miracle.  :)
}
} It was probably on the Adaptec controller - the most well tested of the
} controller drivers for CAM.

Thankfully!  :)  If it weren't for that, I'd be one unhappy camper right now.

} >Once I found that the second partition
} >was fine, I tried to do:
} >
} >	dd if=/dev/rda2s1e of=/dev/rda1s1e bs=64k
} >
} >but it kept saying that rda1s1e was a read-only filesystem.
}
} My guess is that this error is coming from dsopen(), but I don't know why.
} I can't see how this could be a CAM problem.

I don't either.  I just noted it because it seemed weird.

} >Since getting everything more or less back to normal, I have crashed my
} >machine again today by accidentally doing:
} >
} >	disklabel -r sd4c
}
} This should not be able to crash your system.  Disklabel should simply open
} up the device by that name in /dev and, should it exist, it will take
} it directly to the da driver.  My guess is that there was still some latent
} corruption in '/' that caused a panic.

That's possible.  My root partition is on the WD disk that's also part of
the BT-958 bus.  I hadn't considered that possibility, but I was certainly
taken aback when it did cause a crash.

} When you are recovering your
} system or leaving it unattended, please leave the console switched to
} VTY0 so that console messages can be captured should an error occur.
} Unless you have a serial console, you will never be able to get to the
} useful information for fixing problems like this if you are in X.

I will do that in the future.  I did the above command remotely just to
verify that I had screwed up a disklabel a while ago (which I had) and not
as root.

} A few words about your BT-958.  Ensure that you are running good firmware on
} your card.  Leonard Zubkoff has a great page that talks about BT firmware
} issues with links to known good firmware:
}
}	http://www.dandelion.com/Linux/BusLogic.html

I will definitely look into this ASAP.  Thank you for the info.

} You are also the first person to report using the BT-958 with this driver.
} There are bound to be "some" problems with it as the driver was written
} from the ground up and was only tested by me on an older BT-948.
} Can you send me the dmesg output from your system?  Was there any noticeable
} change performance wise in the system after switching to CAM?

I haven't had a chance to run any kind of benchmarks with CAM vs the old
SCSI system.  I'll have to do that as soon as I get time because I think it
would be very interesting to see what kind of performance I'm getting now.
I'm happy to report that things do "feel" faster though--especially with
Jaz disks.

Here's the dmesg output:

d: 12, version: 0x00040011, at 0xfec08000
 io0 (APIC): apic id: 13, version: 0x00170011, at 0xfec00000
Probing for devices on PCI bus 0:
chip0: <Intel 82440FX (Natoma) PCI and memory controller> rev 0x02 on pci0.0.0
fxp0: <Intel EtherExpress Pro 10/100B Ethernet> rev 0x01 int a irq 18 on pci0.6.0
fxp0: Ethernet address 00:a0:c9:14:0d:5f
chip1: <Intel 82371SB PCI to ISA bridge> rev 0x01 on pci0.7.0
chip2: <Intel 82371SB USB host controller> rev 0x01 int d irq 11 on pci0.7.2
ahc0: <Adaptec aic7880 Ultra SCSI adapter> rev 0x00 int a irq 17 on pci0.9.0
ahc0: aic7880 Wide Channel A, SCSI Id=7, 16/255 SCBs
bt0: <Buslogic Multimaster SCSI host adapter> rev 0x08 int a irq 16 on pci0.11.0
bt0: BT-958 FW Rev. 5.05R Ultra Wide SCSI Host Adapter, SCSI ID 7, 192 CCBs
vga0: <Matrox MGA 2064W graphics accelerator> rev 0x01 int a irq 17 on pci0.15.0
Probing for devices on the ISA bus:
sc0 at 0x60-0x6f irq 1 on motherboard
sc0: VGA color <16 virtual consoles, flags=0x0>
pcm0 at 0x530 irq 5 drq 1 flags 0xa610 on isa
mss_attach <mss>0 at 0x530 irq 5 dma 1:0 flags 0xa610
sio0 at 0x3f8-0x3ff irq 4 on isa
sio0: type 16550A
sio1 at 0x2f8-0x2ff irq 3 on isa
sio1: type 16550A
lpt0 at 0x378-0x37f irq 7 on isa
lpt0: Interrupt-driven port
lp0: TCP/IP capable interface
psm0 at 0x60-0x64 irq 12 on motherboard
psm0: model MouseMan+, device ID 0
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1.44MB 3.5in
npx0 on motherboard
npx0: INT 16 interface
APIC_IO: routing 8254 via 8259 on pin 0
ccd0-3: Concatenated disk drivers
SMP: AP CPU #1 Launched!
bt0: bt_cmd: Timeout waiting for adapter ready, status = 0x0
bt0: btfetchtransinfo - Inquire Setup Info Failed
(probe19:bt0:0:4:0): MODE SENSE(06). CDB: 1a 0 a 0 14 0
(probe19:bt0:0:4:0): ILLEGAL REQUEST asc:24,0
(probe19:bt0:0:4:0): Invalid field in CDB
da2 at ahc0 bus 0 target 0 lun 0
da2: <QUANTUM VIKING 4.5 WSE 880R> Fixed Direct Access SCSI2 device
da2: Serial Number 174721630980
da2: 40.0MB/s transfers (20.0MHz, offset 8, 16bit), Tagged Queueing Enabled
da2: 4345MB (8899737 512 byte sectors: 255H 63S/T 553C)
da1 at bt0 bus 0 target 1 lun 0
da1: <QUANTUM VIKING 4.5 WSE 880R> Fixed Direct Access SCSI2 device
da1: Serial Number 174721632608
da1: 20.0MB/s transfers (20.0MHz, offset 15), Tagged Queueing Enabled
da1: 4345MB (8899737 512 byte sectors: 255H 63S/T 553C)
(da4:bt0:0:4:0): READ CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0
(da4:bt0:0:4:0): NOT READY asc:3a,0
(da4:bt0:0:4:0): Medium not present
da4 at bt0 bus 0 target 4 lun 0
da4: <iomega jaz 1GB J^77> Removable Direct Access SCSI2 device
da4: 10.0MB/s transfers (10.0MHz, offset 15)
da4: Attempt to query device size, failed
cd0 at bt0 bus 0 target 3 lun 0
cd0: <TEAC CD-ROM CD-516S 1.0G> Removable CD-ROM SCSI2 device
cd0: Serial Number \^_
cd0: 10.0MB/s transfers (10.0MHz, offset 8)
cd0: Attempt to query device size failed: NOT READY, Medium not present
da0 at bt0 bus 0 target 0 lun 0
da0: <WDIGTL ENTERPRISE 1.61> Fixed Direct Access SCSI2 device
da0: Serial Number WS7000054039
da0: 3.300MB/s transfers , Tagged Queueing Enabled
da0: 2077MB (4254819 512 byte sectors: 255H 63S/T 264C)
ccd0: mirror/parity forces uniform flag

 -Patrick

Patrick L. Hartling                     | Research Assistant, ICEMT
mystify@friley63.res.iastate.edu        | SE Lab - 1117 Black Engineering
http://www.public.iastate.edu/~oz       | http://www.icemt.iastate.edu

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message