From owner-freebsd-scsi Mon Apr 20 08:26:34 1998
From: Raul Zighelboim
To: "'scsi@freebsd.org'"
Subject: RE: Help ! Scsi buss going down !
Date: Mon, 20 Apr 1998 10:23:46 -0500
Sender: owner-freebsd-scsi@FreeBSD.ORG

This system seems to have suffered a massive stroke!

After a lot of testing, we reached the same conclusion yesterday: revision E
has problems under load (run iozone and watch the system freeze). I am
running 3 revision D cards, but maybe one of them is defective. We will keep
replacing cards.

On an unrelated note, every time we reboot the server we get an error
message. It does not matter how clean the shutdown was:

    (sync; sync; sync; /sbin/umount -a; /sbin/shutdown -h now)

fsck still complains at reboot:

    Cannot alloc 3317710 bytes for blockmap
    Cannot check file system ....

Running fsck manually shows a clean fs.

Any idea how I can fix this?

==================================================
Raul Zighelboim
rzig@verio.net

> -----Original Message-----
> From: Greg Rowe [SMTP:greg@uswest.net]
> Sent: Monday, April 20, 1998 8:17 AM
> To: Raul Zighelboim; 'scsi@freebsd.org'
> Subject: Re: Help ! Scsi buss going down !
>
> Last week we discovered a problem with a number of our new systems that
> contained Adaptec 2940UW Revision E chips.
> All systems containing the E revision chips are failing under load with
> SCSI resets. The 'D' revision works fine in those systems. Our hardware
> vendor is trying to get some information out of Adaptec on this problem,
> but it almost looks like they discovered a bug in that version and yanked
> it. All Adaptec 2940's coming out of the channels now seem to be Revision
> 'D'. You can check dmesg to determine what version you have.
>
> ahc0 rev 1 int a irq 9 on pci0:13
>
> The "rev 1" is the E revision and "rev 0" is D. Dropping the transfer rate
> to 10mbs/sec on all your drives will also correct (hide) the problem.
>
> Greg
>
>
> On Apr 18, 11:32am, Raul Zighelboim wrote:
> > Subject: Help ! Scsi buss going down !
> >
> > Hello there; I have replaced the drives, and I have replaced the
> > controller. I will replace the external cable, and switch from
> > 'external active termination' to 'drive built in termination' for the
> > SCSI bus. It cannot be a driver/software issue: there are two buses
> > involved, and it is always the same one with the problem.
> >
> > Is there something I am missing?
> >
> > I keep getting this on the console:
> >
> > sd3(ahc0:2:0): SCB 0x1 - timed out in dataout phase, SCSISIGI == 0xe6
> > SEQADDR = 0x12e SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
> > Ordered Tag queued
> > sd3(ahc0:2:0): SCB 0xc timedout while recovery in progress
> > sd3(ahc0:2:0): SCB 0x5 timedout while recovery in progress
> > sd3(ahc0:2:0): SCB 0xd timedout while recovery in progress
> > sd4(ahc0:4:0): SCB 0xa timedout while recovery in progress
> > sd3(ahc0:2:0): SCB 0x1 - timed out in dataout phase, SCSISIGI == 0xe6
> > SEQADDR = 0x12e SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
> > sd3(ahc0:2:0): abort message in message buffer
> > sd3(ahc0:2:0): SCB 0x1 - timed out in dataout phase, SCSISIGI == 0xf6
> > SEQADDR = 0x12e SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
> > sd3(ahc0:2:0): no longer in timeout
> > ahc0: Issued Channel A Bus Reset. 10 SCBs aborted
> > sd0(ahc0:3:0): SCB 0x10 - timed out while idle, LASTPHASE == 0x1,
> > SCSISIGI == 0x0
> > SEQADDR = 0x17c SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
> > Ordered Tag queued
> > sd0(ahc0:3:0): SCB 0xf timedout while recovery in progress
> > sd3(ahc0:2:0): SCB 0xd timedout while recovery in progress
> > sd3(ahc0:2:0): SCB 0xc timedout while recovery in progress
> > sd4(ahc0:4:0): SCB 0xa timedout while recovery in progress
> > sd3(ahc0:2:0): SCB 0x5 timedout while recovery in progress
> > sd0(ahc0:3:0): SCB 0x4 timedout while recovery in progress
> > sd0(ahc0:3:0): SCB 0x2 timedout while recovery in progress
> > sd3(ahc0:2:0): SCB 0x1 timedout while recovery in progress
> > sd4(ahc0:4:0): SCB 0x0 timedout while recovery in progress
> > sd0(ahc0:3:0): SCB 0x10 - timed out while idle, LASTPHASE == 0x1,
> > SCSISIGI == 0x0
> > SEQADDR = 0x17c SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
> > sd0(ahc0:3:0): Queueing an Abort SCB
> > sd0(ahc0:3:0): SCB 0x10 - timed out while idle, LASTPHASE == 0x1,
> > SCSISIGI == 0x0
> > SEQADDR = 0x17c SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
> > sd0(ahc0:3:0): no longer in
> > timeout
> > ahc0: Issued Channel A Bus Reset. 11 SCBs aborted
> > sd0(ahc0:3:0): UNIT ATTENTION asc:29,0
> > sd0(ahc0:3:0): Power on, reset, or bus device reset occurred field
> > replaceable unit: 80
> > , retries:2
> > sd3(ahc0:2:0): UNIT ATTENTION asc:29,2 field replaceable unit: 2
> > , retries:2
> > sd9(ahc0:5:0): UNIT ATTENTION asc:29,2 field replaceable unit: 2
> > , retries:4
> > sd4(ahc0:4:0): UNIT ATTENTION asc:29,2 field replaceable unit: 2
> > , retries:2
> > sd1(ahc0:0:0): UNIT ATTENTION asc:29,2 field replaceable unit: 2
> > , retries:4
> > sd10(ahc0:6:0): UNIT ATTENTION asc:29,2 field replaceable unit: 2
> > , retries:4
> >
> > To Unsubscribe: send mail to majordomo@FreeBSD.org
> > with "unsubscribe freebsd-scsi" in the body of the message
> >-- End of excerpt from Raul Zighelboim
>
>
>
> --
> Greg Rowe US WEST - !NTERACT Internet Services
> "To err is human, to really foul up requires the root password."
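
For anyone who wants to automate the dmesg check Greg describes, here is a rough sketch. It assumes only what his message states: that the probe line for each controller starts with the unit name (ahc0, ahc1, ...) and contains "rev N", with rev 0 meaning revision D and rev 1 meaning revision E. The function name ahc_revisions and the regular expression are my own and purely illustrative; real dmesg output may carry extra text on the line, which the pattern tries to tolerate.

```python
import re

# Mapping taken from Greg's message: "rev 1" is revision E, "rev 0" is D.
# Any other value is reported as "unknown" rather than guessed.
REV_TO_LETTER = {0: "D", 1: "E"}

# Matches probe lines such as "ahc0 rev 1 int a irq 9 on pci0:13".
# The ".*?" tolerates extra text between the unit name and "rev".
AHC_LINE = re.compile(r"^(ahc\d+)\b.*?\brev\s+(\d+)\b")


def ahc_revisions(dmesg_text):
    """Return {controller unit: revision letter} for each ahc probe line."""
    found = {}
    for line in dmesg_text.splitlines():
        match = AHC_LINE.match(line.strip())
        if match:
            unit, rev = match.group(1), int(match.group(2))
            found[unit] = REV_TO_LETTER.get(rev, "unknown")
    return found


# The probe line quoted in this thread:
sample = "ahc0 rev 1 int a irq 9 on pci0:13"
print(ahc_revisions(sample))  # {'ahc0': 'E'}
```

To run it against a live system, feed the function the text of dmesg instead of the sample string. Lines like "ahc0: Issued Channel A Bus Reset" are skipped because they contain no "rev N" field.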