Date: Thu, 9 Dec 1999 11:56:25 -0400 (AST)
From: The Hermit Hacker <scrappy@hub.org>
To: Ben Speirs <igiveup@ix.netcom.com>
Cc: freebsd-scsi@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: SCSI problem ... OS or just bus?
Message-ID: <Pine.BSF.4.21.9912091154270.500-100000@thelab.hub.org>
In-Reply-To: <384F3A52.23868C19@ix.netcom.com>
As an update...without turning news back on again, and after upgrading
the kernel and doing a make world 12hrs ago, things have been stable
*so far*...the first hang after we added the drives took about 17hrs or
so to show up...subsequent ones generally took 2-4hrs...
I'm going to re-enable news this afternoon and see whether adding that
extra thrashing to the system causes a repeat of the problem or not...
On Wed, 8 Dec 1999, Ben Speirs wrote:
> The Hermit Hacker wrote:
> >
> > I recently did two upgrades in the course of a few days...upgraded my
> > 3.3-STABLE to a more recent version, and added hard drives onto the
> > system...now I'm getting SCSI problems that make no sense :(
> >
> > The machine just hung once more, which it's doing every few hours...I can
> > get down to the debugger, but a 'trace' doesn't appear to show anything, so
> > I panic...
> >
> > ==========
> > (da4:ahc0:0:8:0): Other SCB Timeout
> > (da4:ahc0:0:8:0): SCB 0xeb - timed out in dataout phase, SEQADDR == 0x10f
> > (da4:ahc0:0:8:0): Other SCB Timeout
> > (da2:ahc0:0:5:0): SCB 0x24 - timed out in dataout phase, SEQADDR == 0x10f
> > (da2:ahc0:0:5:0): BDR message in message buffer
> > (da2:ahc0:0:5:0): SCB 0x92 - timed out in dataout phase, SEQADDR == 0x10f
> > (da2:ahc0:0:5:0): no longer in timeout, status = 34b
> > ahc0: Issued Channel A Bus Reset. 98 SCBs aborted
>
> Just another data point - A similar thing happened to me. I rebuilt the
> kernel and world back in September and my previously happy SCSI system
> started issuing the same type of messages. I saved the output of the
> system log. Portions of it are listed below:
>
> Copyright (c) 1992-1999 FreeBSD Inc.
> Copyright (c) 1982, 1986, 1989, 1991, 1993
> The Regents of the University of California. All rights reserved.
> FreeBSD 3.3-STABLE #3: Fri Sep 24 21:00:39 PDT 1999
> root@sloth:/usr/src/sys/compile/SLOTH
> [...trim...]
> ahc0: <Adaptec 2940 Ultra SCSI adapter> rev 0x00 int a irq 9 on pci0.9.0
> ahc0: aic7880 Wide Channel A, SCSI Id=7, 16/255 SCBs
> [...trim...]
> Waiting 8 seconds for SCSI devices to settle
> changing root device to da0s3a
> da0 at ahc0 bus 0 target 15 lun 0
> da0: <FUJITSU M2954Q-512 0142> Fixed Direct Access SCSI-2 device
> da0: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
> da0: 4149MB (8498506 512 byte sectors: 255H 63S/T 529C)
> cd0 at ahc0 bus 0 target 0 lun 0
> cd0: <TOSHIBA CD-ROM XM-5701TA 3136> Removable CD-ROM SCSI-2 device
> cd0: 10.000MB/s transfers (10.000MHz, offset 8)
> cd0: Attempt to query device size failed: NOT READY, Medium not present
> cd1 at ahc0 bus 0 target 1 lun 0
> cd1: <NEC CD-ROM DRIVE:500 2.5> Removable CD-ROM SCSI-2 device
> cd1: 3.300MB/s transfers
> cd1: Attempt to query device size failed: NOT READY, Medium not present
>
>
> Unexpected busfree. LASTPHASE == 0x1
> SEQADDR == 0x153
> ahc0:A:0: no active SCB for reconnecting target - issuing BUS DEVICE RESET
> SAVED_TCL == 0x0, ARG_1 == 0xff, SEQ_FLAGS == 0x0
> (cd0:ahc0:0:0:0): SCB 0x16 - timed out in datain phase, SEQADDR == 0x153
> (cd0:ahc0:0:0:0): Other SCB Timeout
> (da0:ahc0:0:15:0): SCB 0x3 - timed out in datain phase, SEQADDR == 0x153
> (da0:ahc0:0:15:0): BDR message in message buffer
> (da0:ahc0:0:15:0): SCB 0x3 - timed out in datain phase, SEQADDR == 0x153
> (da0:ahc0:0:15:0): no longer in timeout, status = 34b
> ahc0: Issued Channel A Bus Reset. 2 SCBs aborted
> fd0c: hard error reading fsbn 0 (No status)
>
>
> The problem occurred while accessing the da0 device and cd0 device at
> the same time. I could reproduce it at will, and almost instantly by
> copying a file from the CD-ROM to the hard drive. I could not reproduce
> the error with the older, slower NEC cd1 CD-ROM device. I rechecked all
> my termination and unplugged one device after another without any
> success. My guess was that the cd0 drive had gone goofy on me. The
> only thing I have not tried is replacing the cables. Since I had the
> other CD available my fix was to yank out the suspect device. It has
> been near the bottom of my 'things to do' list.
>
> Maybe we both got bit by the same "fix" that uncovered hidden hardware
> problems. Maybe not, it looks like you have problems with only Wide
> channel devices.
>
> --
> -Ben Speirs
>
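For anyone comparing logs, the "timed out in ... phase" lines quoted above can be tallied per device to see which targets are involved most often. This is only a rough sketch: the regular expression is inferred from the handful of messages in this thread, not from the ahc driver source, and the names `TIMEOUT_RE`, `tally_timeouts`, and `SAMPLE` are made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the messages in this thread (an assumption), e.g.:
#   (da4:ahc0:0:8:0): SCB 0xeb - timed out in dataout phase, SEQADDR == 0x10f
TIMEOUT_RE = re.compile(
    r"\((?P<dev>\w+):(?P<ctrl>\w+):\d+:(?P<target>\d+):(?P<lun>\d+)\): "
    r"SCB 0x[0-9a-fA-F]+ - timed out in (?P<phase>\w+) phase"
)


def tally_timeouts(lines):
    """Count SCB timeout messages per (device, bus phase) pair."""
    counts = Counter()
    for line in lines:
        # search() rather than match(), so mail quoting ("> ") is tolerated
        m = TIMEOUT_RE.search(line)
        if m:
            counts[(m.group("dev"), m.group("phase"))] += 1
    return counts


# Sample taken from the first log excerpt in this thread
SAMPLE = """\
(da4:ahc0:0:8:0): Other SCB Timeout
(da4:ahc0:0:8:0): SCB 0xeb - timed out in dataout phase, SEQADDR == 0x10f
(da4:ahc0:0:8:0): Other SCB Timeout
(da2:ahc0:0:5:0): SCB 0x24 - timed out in dataout phase, SEQADDR == 0x10f
(da2:ahc0:0:5:0): BDR message in message buffer
(da2:ahc0:0:5:0): SCB 0x92 - timed out in dataout phase, SEQADDR == 0x10f
""".splitlines()

if __name__ == "__main__":
    for (dev, phase), n in sorted(tally_timeouts(SAMPLE).items()):
        print(f"{dev} {phase}: {n}")
```

Feeding it a saved copy of the system log instead of `SAMPLE` would give the same kind of per-device count; "Other SCB Timeout" and "BDR message" lines are deliberately not counted.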
Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
