From owner-freebsd-stable Thu Dec 9 7:56:35 1999
Delivered-To: freebsd-stable@freebsd.org
Received: from thelab.hub.org (nat196.191.mpoweredpc.net [142.177.196.191])
        by hub.freebsd.org (Postfix) with ESMTP
        id AE8341570E; Thu, 9 Dec 1999 07:56:28 -0800 (PST)
        (envelope-from scrappy@hub.org)
Received: from localhost (scrappy@localhost)
        by thelab.hub.org (8.9.3/8.9.1) with ESMTP id LAA16335;
        Thu, 9 Dec 1999 11:56:26 -0400 (AST)
        (envelope-from scrappy@hub.org)
X-Authentication-Warning: thelab.hub.org: scrappy owned process doing -bs
Date: Thu, 9 Dec 1999 11:56:25 -0400 (AST)
From: The Hermit Hacker
To: Ben Speirs
Cc: freebsd-scsi@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: SCSI problem ... OS or just bus?
In-Reply-To: <384F3A52.23868C19@ix.netcom.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

As an update, so far...without turning news back on again, and after
upgrading the kernel and doing a make world 12hrs ago, things have been
stable *so far*...the first time this happened after we added the drives,
it took about 17hrs or so...subsequent ones generally took 2-4hrs...

I'm going to re-enable news this afternoon and see if adding that extra
thrashing to the system causes a repeat of the problem or not...

On Wed, 8 Dec 1999, Ben Speirs wrote:

> The Hermit Hacker wrote:
> >
> > I recently did two upgrades in the course of a few days...upgraded my
> > 3.3-STABLE to a more recent version, and added hard drives onto the
> > system...now I'm getting SCSI problems that make no sense :(
> >
> > The machine just hung once more, which it's doing every few hours...I can
> > get down to the debugger, but a 'trace' doesn't appear to show anything, so
> > I panic...
> >
> > ==========
> > (da4:ahc0:0:8:0): Other SCB Timeout
> > (da4:ahc0:0:8:0): SCB 0xeb - timed out in dataout phase, SEQADDR == 0x10f
> > (da4:ahc0:0:8:0): Other SCB Timeout
> > (da2:ahc0:0:5:0): SCB 0x24 - timed out in dataout phase, SEQADDR == 0x10f
> > (da2:ahc0:0:5:0): BDR message in message buffer
> > (da2:ahc0:0:5:0): SCB 0x92 - timed out in dataout phase, SEQADDR == 0x10f
> > (da2:ahc0:0:5:0): no longer in timeout, status = 34b
> > ahc0: Issued Channel A Bus Reset. 98 SCBs aborted
>
> Just another data point - A similar thing happened to me. I rebuilt the
> kernel and world back in September and my previously happy SCSI system
> started issuing the same type of messages. I saved the output of the
> system log. Portions of it are listed below:
>
> Copyright (c) 1992-1999 FreeBSD Inc.
> Copyright (c) 1982, 1986, 1989, 1991, 1993
>         The Regents of the University of California. All rights
> reserved.
> FreeBSD 3.3-STABLE #3: Fri Sep 24 21:00:39 PDT 1999
>     root@sloth:/usr/src/sys/compile/SLOTH
> [...trim...]
> ahc0: rev 0x00 int a irq 9 on pci0.9.0
> ahc0: aic7880 Wide Channel A, SCSI Id=7, 16/255 SCBs
> [...trim...]
> Waiting 8 seconds for SCSI devices to settle
> changing root device to da0s3a
> da0 at ahc0 bus 0 target 15 lun 0
> da0: Fixed Direct Access SCSI-2 device
> da0: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing
> Enabled
> da0: 4149MB (8498506 512 byte sectors: 255H 63S/T 529C)
> cd0 at ahc0 bus 0 target 0 lun 0
> cd0: Removable CD-ROM SCSI-2 device
> cd0: 10.000MB/s transfers (10.000MHz, offset 8)
> cd0: Attempt to query device size failed: NOT READY, Medium not present
> cd1 at ahc0 bus 0 target 1 lun 0
> cd1: Removable CD-ROM SCSI-2 device
> cd1: 3.300MB/s transfers
> cd1: Attempt to query device size failed: NOT READY, Medium not present
>
>
> Unexpected busfree. LASTPHASE == 0x1
> SEQADDR == 0x153
> ahc0:A:0: no active SCB for reconnecting target - issuing BUS DEVICE
> RESET
> SAVED_TCL == 0x0, ARG_1 == 0xff, SEQ_FLAGS == 0x0
> (cd0:ahc0:0:0:0): SCB 0x16 - timed out in datain phase, SEQADDR == 0x153
> (cd0:ahc0:0:0:0): Other SCB Timeout
> (da0:ahc0:0:15:0): SCB 0x3 - timed out in datain phase, SEQADDR == 0x153
> (da0:ahc0:0:15:0): BDR message in message buffer
> (da0:ahc0:0:15:0): SCB 0x3 - timed out in datain phase, SEQADDR == 0x153
> (da0:ahc0:0:15:0): no longer in timeout, status = 34b
> ahc0: Issued Channel A Bus Reset. 2 SCBs aborted
> fd0c: hard error reading fsbn 0 (No status)
>
>
> The problem occurred while accessing the da0 device and cd0 device at
> the same time. I could reproduce it at will, and almost instantly by
> copying a file from the CD-ROM to the hard drive. I could not reproduce
> the error with the older, slower NEC cd1 CD-ROM device. I rechecked all
> my termination and unplugged one device after another without any
> success. My guess was that the cd0 drive had gone goofy on me. The
> only thing I have not tried is replacing the cables. Since I had the
> other CD available my fix was to yank out the suspect device. It has
> been near the bottom of my 'things to do' list.
>
> Maybe we both got bit by the same "fix" that uncovered hidden hardware
> problems. Maybe not, it looks like you have problems with only Wide
> channel devices.
>
> --
> -Ben Speirs
>

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message