From owner-freebsd-bugs Tue Jul 18 22:10:11 1995
Return-Path: bugs-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.11/8.6.6) id WAA04740 for bugs-outgoing; Tue, 18 Jul 1995 22:10:11 -0700
Received: from tellab5.lisle.tellabs.com (tellab5.lisle.tellabs.com [138.111.243.28]) by freefall.cdrom.com (8.6.11/8.6.6) with SMTP id WAA04733 for ; Tue, 18 Jul 1995 22:10:09 -0700
From: mikebo@tellabs.com
Received: from tellabk.tellabs.com by tellab5.lisle.tellabs.com with smtp (Smail3.1.29.1 #4) id m0sYRO2-000jC1C; Wed, 19 Jul 95 00:09 CDT
Received: by tellabk.tellabs.com (4.1/1.9) id AA13275; Wed, 19 Jul 95 00:09:32 CDT
Message-Id: <9507190509.AA13275@tellabk.tellabs.com>
Subject: Re: FBSD v2.0.5R: AHA2742AT + Exabyte scrogged my root disk!
To: gibbs@freefall.cdrom.com
Date: Wed, 19 Jul 1995 00:09:32 -0500 (CDT)
Cc: mikebo (Mike Borowiec), bugs@freebsd.org
In-Reply-To: <199507181924.MAA08369@freefall.cdrom.com> from "& freefall.cdrom.com" at Jul 18, 95 12:24:43 pm
X-Mailer: ELM [version 2.4 PL24]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 7381
Sender: bugs-owner@freebsd.org
Precedence: bulk

Justin -
You wrote:
> Michael Borowiec wrote:
> >After putting the finishing touches on my 2.0.5 configuration, I tried
> >to load the 2.0.5 distribution tree from an Exabyte 8200 tape to disk.
> >During the tar extract, I got the following messages repeatedly:
> >
> >	st1: oops not queued
> >	biodone: buffer already done
> >	tar: read error on /dev/rst1: Input/output error
> >	ahc0: target 0, lun 0 (sd0) timed out
> >
> >Whenever I access the Exabyte, the root drive seems to be completely
> >inaccessible. The tar extract finally failed, but I subsequently found
> >that my root disk was completely trashed, chock full of corruption! To
> >recover I had to run fsck several times and allow it to remove several
> >dozen critical programs and configuration files. ARGH!
>
> This was fixed recently in -current. If you have kernel source, the
> patch is simple. I've appended it to this message.
...
> It was actually a SCSI system bug that hit the 2742 particularly hard
> because of the way it allocates per command resources.
...
> Try the patch and get back to me. As the author of the driver and the
> proud new owner of a tape drive, you should see rapid progress on these
> problems.
>
I downloaded the latest st.c from "current" and built a new kernel.
I successfully loaded the ~450MB 2.0.5-RELEASE tree from my Exabyte
to disk. However, I noticed the following anomalous behavior:

o two "ahc0: target 0, lun 0 (sd0) timed out" messages at the
  beginning of the tar, and a few more at the end while the tape
  was rewinding. I was holding my breath, as last time this happened
  I got five or six such messages before the OS panicked and corrupted
  my root. I WAS able to access the root drive while the tar ran.

o I get "ahc0: target 0, lun 0 (sd0) timed out" messages about once
  every 10 seconds whenever I do any non-read/write tape operation,
  such as rewind, rewoffl, etc. During these operations, the SCSI
  bus activity light (powered by the 2742AT) is lit constantly, and
  accesses to disk drives are deferred until the operation completes
  and the light goes out.

o In the past, when doing SCSI transfers, the bus activity light
  flickered in concert with the drive activity lights. Now it appears
  to stay lit solid during an entire tape job. Disk to disk copies
  do exhibit the flicker...

The worst problem appears to be solved, but there would seem to be a
few more dust-bunnies under the couch. I would be pleased to assist
by testing your mods on my system. See my dmesg output below...

> ... Between the aic7xxx driver and the SCSI system as a
> whole, all of my free time is already allocated.
>
We appreciate it!

Regards,
- Mike
--
--------------------------------------------------------------------------
Michael Borowiec        Network Operations        Tellabs Operations, Inc.
mikebo@TELLABS.COM      1000 Remington Blvd.      MS109
708-378-6007            FAX: 708-378-6714         Bolingbrook, IL, USA 60440
--------------------------------------------------------------------------

-- dmesg output:
Jul 18 20:47:12 timesink /kernel: FreeBSD 2.0.5-RELEASE #2: Tue Jul 18 20:42:33 CDT 1995
Jul 18 20:47:12 timesink /kernel: kroot@timesink:/usr/src/sys/compile/TIMESINK
Jul 18 20:47:13 timesink /kernel: CPU: i486DX (486-class CPU)
Jul 18 20:47:13 timesink /kernel: real memory = 16384000 (4000 pages)
Jul 18 20:47:13 timesink /kernel: avail memory = 14749696 (3601 pages)
Jul 18 20:47:13 timesink /kernel: Probing for devices on the ISA bus:
Jul 18 20:47:13 timesink /kernel: sc0 at 0x60-0x6f irq 1 on motherboard
Jul 18 20:47:13 timesink /kernel: sc0: VGA color <16 virtual consoles, flags=0x0>
Jul 18 20:47:13 timesink /kernel: ed0 at 0x280-0x29f irq 9 maddr 0xd4000 msize 16384 on isa
Jul 18 20:47:13 timesink /kernel: ed0: address 00:00:c0:d1:09:2d, type WD8013EP (16 bit)
Jul 18 20:47:13 timesink /kernel: bpf: ed0 attached
Jul 18 20:47:13 timesink /kernel: sio0 at 0x3f8-0x3ff irq 4 on isa
Jul 18 20:47:13 timesink /kernel: sio0: type 16550A
Jul 18 20:47:13 timesink /kernel: lpt0 at 0x378-0x37f irq 7 on isa
Jul 18 20:47:13 timesink /kernel: lpt0: Interrupt-driven port
Jul 18 20:47:13 timesink /kernel: lp0: TCP/IP capable interface
Jul 18 20:47:13 timesink /kernel: mse0 at 0x23c irq 3 on isa
Jul 18 20:47:13 timesink /kernel: pca0 on motherboard
Jul 18 20:47:13 timesink /kernel: pca0: PC speaker audio driver
Jul 18 20:47:13 timesink /kernel: fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
Jul 18 20:47:14 timesink /kernel: fdc0: NEC 72065B
Jul 18 20:47:14 timesink /kernel: fd0: 1.44MB 3.5in
Jul 18 20:47:14 timesink /kernel: fd1: 1.2MB 5.25in
Jul 18 20:47:14 timesink /kernel: ahc0: reading board settings
Jul 18 20:47:14 timesink /kernel: ahc0: 274x Twin Channel, A SCSI Id=7, B SCSI Id=7, aic7770 >= Rev E, 4 SCBs
Jul 18 20:47:14 timesink /kernel: ahc0: Using Level Sensitive Interrupts
Jul 18 20:47:14 timesink /kernel: ahc0: Downloading Sequencer Program...Done
Jul 18 20:47:14 timesink /kernel: ahc0 at 0x5000-0x50ff irq 11 on eisa slot 5
Jul 18 20:47:14 timesink /kernel: ahc0: Probing channel A
Jul 18 20:47:14 timesink /kernel: ahc0 waiting for scsi devices to settle
Jul 18 20:47:15 timesink /kernel: ahc0: target 0 synchronous at 4.4MB/s, offset = 0xf
Jul 18 20:47:15 timesink /kernel: (ahc0:0:0): "IMPRIMIS 94601-15 1250" type 0 fixed SCSI 1
Jul 18 20:47:15 timesink /kernel: sd0(ahc0:0:0): Direct-Access 989MB (2026965 512 byte sectors)
Jul 18 20:47:15 timesink /kernel: (ahc0:1:0): "MAXTOR XT-4380S B5A" type 0 fixed SCSI 1
Jul 18 20:47:15 timesink /kernel: sd1(ahc0:1:0): Direct-Access 318MB (651630 512 byte sectors)
Jul 18 20:47:15 timesink /kernel: (ahc0:2:0): "ARCHIVE VIPER 150 20000 -000" type 1 removable SCSI 1
Jul 18 20:47:15 timesink /kernel: st0(ahc0:2:0): Sequential-Access st0: Archive Viper 150 is a known rogue
Jul 18 20:47:15 timesink /kernel: density code 0x0, drive empty
Jul 18 20:47:15 timesink /kernel: (ahc0:5:0): "EXABYTE EXB-8200 4.25" type 1 removable SCSI 1
Jul 18 20:47:15 timesink /kernel: st1(ahc0:5:0): Sequential-Access density code 0x0,
Jul 18 20:47:15 timesink /kernel: st1(ahc0:5:0): Target Busy
Jul 18 20:47:16 timesink /kernel:
Jul 18 20:47:16 timesink /kernel: st1(ahc0:5:0): Target Busy
Jul 18 20:47:16 timesink /kernel:
Jul 18 20:47:16 timesink /kernel: st1(ahc0:5:0): Target Busy
Jul 18 20:47:16 timesink /kernel: drive empty
Jul 18 20:47:16 timesink /kernel: (ahc0:6:0): "NEC CD-ROM DRIVE:501 2.2" type 5 removable SCSI 2
Jul 18 20:47:17 timesink /kernel: cd0(ahc0:6:0): CD-ROM cd present.[100146 x 2048 byte records]
Jul 18 20:47:17 timesink /kernel: ahc0: Probing Channel B
Jul 18 20:47:17 timesink /kernel: ahc0 waiting for scsi devices to settle
Jul 18 20:47:17 timesink /kernel: ahb0 not found
Jul 18 20:47:17 timesink /kernel: aha0 not found at 0x330
Jul 18 20:47:18 timesink /kernel: npx0 on motherboard
Jul 18 20:47:18 timesink /kernel: npx0: INT 16 interface
Jul 18 20:47:18 timesink /kernel: sb0 at 0x220 irq 5 drq 1 on isa
Jul 18 20:47:18 timesink /kernel: sb0:
Jul 18 20:47:18 timesink /kernel: opl0 at 0x388 on isa
Jul 18 20:47:18 timesink /kernel: opl0:
Jul 18 20:47:18 timesink /kernel: bpf: lo0 attached
Jul 18 20:47:18 timesink /kernel: bpf: ppp0 attached
Jul 18 20:47:18 timesink /kernel: bpf: sl0 attached
-- end of dmesg output