Date:      Mon, 20 Apr 1998 10:23:46 -0500
From:      Raul Zighelboim <rzig@verio.net>
To:        "'scsi@freebsd.org'" <scsi@FreeBSD.ORG>
Subject:   RE: Help ! Scsi buss going down !
Message-ID:  <A03CD00C69B1D01195AB00A024ECEB167F9DC6@kaori.communique.net>


This system seems to have suffered a massive stroke!

After a lot of testing, we reached the same conclusion yesterday:
revision E has problems under load (run iozone and watch the system
freeze).  I am running 3 revision D cards, but maybe one of them is
defective.  We will keep replacing cards.

On an unrelated note, every time we reboot the server, fsck reports an
error during boot.  It does not matter how clean the shutdown was
(sync; sync; sync; /sbin/umount -a; /sbin/shutdown -h now).

fsck complains at reboot:
Cannot alloc 3317710 bytes for blockmap
Cannot check file system
....
Running fsck manually afterwards shows a clean file system.

Any idea how I can fix this?

==================================================
Raul Zighelboim       		  rzig@verio.net

> -----Original Message-----
> From:	Greg Rowe [SMTP:greg@uswest.net]
> Sent:	Monday, April 20, 1998 8:17 AM
> To:	Raul Zighelboim; 'scsi@freebsd.org'
> Subject:	Re: Help ! Scsi buss going down !
> 
> Last week we discovered a problem with a number of our new systems that
> contained Adaptec 2940UW Revision E chips. All systems containing the E
> revision chips are failing under load with SCSI resets. The 'D' revision
> works fine in those systems. Our hardware vendor is trying to get some
> information out of Adaptec on this problem, but it almost looks like they
> discovered a bug in that version and yanked it. All Adaptec 2940's coming
> out of the channels now seem to be Revision 'D'. You can check dmesg to
> determine what version you have.
> 
>  ahc0 <Adaptec 2940 Ultra SCSI host adapter> rev 1 int a irq 9 on pci0:13
> 
> The "rev 1" is the E revision and "rev 0" is D. Dropping the transfer
> rate to 10mbs/sec on all your drives will also correct (hide) the problem.
> 
> Greg
> 
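
For anyone checking several machines, Greg's dmesg check can be scripted.
What follows is only a rough sketch: the rev-to-chip mapping is exactly the
one given above ("rev 0" is D, "rev 1" is E) and any other value is reported
as unknown. The function name, the regular expression, and the assumption
that every probe line looks like the one quoted above are mine, not anything
from the driver.

```python
import re

# Mapping taken from the note above: PCI "rev 1" is the E chip, "rev 0" is D.
# The thread says nothing about other values, so they map to "unknown".
REV_TO_CHIP = {0: "D", 1: "E"}

# Assumed shape of a probe line, based only on the single example quoted above:
#   ahc0 <Adaptec 2940 Ultra SCSI host adapter> rev 1 int a irq 9 on pci0:13
AHC_LINE = re.compile(r"^\s*(ahc\d+)\s+<[^>]*>\s+rev\s+(\d+)\b")


def chip_revisions(dmesg_text):
    """Return {controller name: chip revision letter} for each ahc probe line."""
    found = {}
    for line in dmesg_text.splitlines():
        match = AHC_LINE.match(line)
        if match:
            unit, rev = match.group(1), int(match.group(2))
            found[unit] = REV_TO_CHIP.get(rev, "unknown")
    return found


sample = " ahc0 <Adaptec 2940 Ultra SCSI host adapter> rev 1 int a irq 9 on pci0:13"
print(chip_revisions(sample))  # {'ahc0': 'E'}
```

Feed it the saved output of dmesg from each box; a controller that comes
back as "E" is a candidate for the failures described above.
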
> 
> On Apr 18, 11:32am, Raul Zighelboim wrote:
> > Subject: Help ! Scsi buss going down !
> >
> > Hello there; I have replaced the drives and I have replaced the
> > controller.
> > I will replace the external cable, and switch from 'external active
> > termination' to 'drive built in termination' for the SCSI bus.
> > It cannot be a driver/software issue: there are two buses involved,
> > and it is always the same one that has the problem.
> >
> > Is there something I am missing?
> >
> > I keep getting this on the console:
> >
> > sd3(ahc0:2:0): SCB 0x1 - timed out in dataout phase, SCSISIGI == 0xe6
> > SEQADDR = 0x12e SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
> > Ordered Tag queued
> > sd3(ahc0:2:0): SCB 0xc timedout while recovery in progress
> > sd3(ahc0:2:0): SCB 0x5 timedout while recovery in progress
> > sd3(ahc0:2:0): SCB 0xd timedout while recovery in progress
> > sd4(ahc0:4:0): SCB 0xa timedout while recovery in progress
> > sd3(ahc0:2:0): SCB 0x1 - timed out in dataout phase, SCSISIGI == 0xe6
> > SEQADDR = 0x12e SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
> > sd3(ahc0:2:0): abort message in message buffer
> > sd3(ahc0:2:0): SCB 0x1 - timed out in dataout phase, SCSISIGI == 0xf6
> > SEQADDR = 0x12e SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
> > sd3(ahc0:2:0): no longer in timeout
> > ahc0: Issued Channel A Bus Reset. 10 SCBs aborted
> > sd0(ahc0:3:0): SCB 0x10 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
> > SEQADDR = 0x17c SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
> > Ordered Tag queued
> > sd0(ahc0:3:0): SCB 0xf timedout while recovery in progress
> > sd3(ahc0:2:0): SCB 0xd timedout while recovery in progress
> > sd3(ahc0:2:0): SCB 0xc timedout while recovery in progress
> > sd4(ahc0:4:0): SCB 0xa timedout while recovery in progress
> > sd3(ahc0:2:0): SCB 0x5 timedout while recovery in progress
> > sd0(ahc0:3:0): SCB 0x4 timedout while recovery in progress
> > sd0(ahc0:3:0): SCB 0x2 timedout while recovery in progress
> > sd3(ahc0:2:0): SCB 0x1 timedout while recovery in progress
> > sd4(ahc0:4:0): SCB 0x0 timedout while recovery in progress
> > sd0(ahc0:3:0): SCB 0x10 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
> > SEQADDR = 0x17c SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
> > sd0(ahc0:3:0): Queueing an Abort SCB
> > sd0(ahc0:3:0): SCB 0x10 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
> > SEQADDR = 0x17c SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
> > sd0(ahc0:3:0): no longer in timeout
> > ahc0: Issued Channel A Bus Reset. 11 SCBs aborted
> > sd0(ahc0:3:0): UNIT ATTENTION asc:29,0
> > sd0(ahc0:3:0):  Power on, reset, or bus device reset occurred field replaceable unit: 80
> > , retries:2
> > sd3(ahc0:2:0): UNIT ATTENTION asc:29,2  field replaceable unit: 2
> > , retries:2
> > sd9(ahc0:5:0): UNIT ATTENTION asc:29,2  field replaceable unit: 2
> > , retries:4
> > sd4(ahc0:4:0): UNIT ATTENTION asc:29,2  field replaceable unit: 2
> > , retries:2
> > sd1(ahc0:0:0): UNIT ATTENTION asc:29,2  field replaceable unit: 2
> > , retries:4
> > sd10(ahc0:6:0): UNIT ATTENTION asc:29,2  field replaceable unit: 2
> > , retries:4
> >
> > To Unsubscribe: send mail to majordomo@FreeBSD.org
> > with "unsubscribe freebsd-scsi" in the body of the message
> >-- End of excerpt from Raul Zighelboim
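
The console excerpt above is easier to compare across reboots as a
per-drive tally of SCB timeouts. A minimal sketch, assuming the messages
always look like the ones quoted here; the regular expression and the
function name are my own, and the pattern has only been checked against the
lines in this thread.

```python
import re
from collections import Counter

# Matches both spellings seen in the excerpt:
#   sd3(ahc0:2:0): SCB 0x1 - timed out in dataout phase, ...
#   sd3(ahc0:2:0): SCB 0xc timedout while recovery in progress
TIMEOUT = re.compile(r"^(sd\d+)\(ahc\d+:\d+:\d+\): SCB 0x[0-9a-fA-F]+ (?:- )?timed ?out")


def timeouts_per_device(log_text):
    """Count SCB timeout messages per sd device in a console log."""
    counts = Counter()
    for line in log_text.splitlines():
        # Drop mail quoting such as "> > " so pasted excerpts also work.
        line = re.sub(r"^(?:>\s*)+", "", line)
        match = TIMEOUT.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


excerpt = """\
> > sd3(ahc0:2:0): SCB 0x1 - timed out in dataout phase, SCSISIGI == 0xe6
> > sd3(ahc0:2:0): SCB 0xc timedout while recovery in progress
> > sd4(ahc0:4:0): SCB 0xa timedout while recovery in progress
> > ahc0: Issued Channel A Bus Reset. 10 SCBs aborted
"""
print(dict(timeouts_per_device(excerpt)))  # {'sd3': 2, 'sd4': 1}
```

A drive that dominates the tally on every run is worth swapping or
re-cabling first; an even spread points more toward the controller or
termination.
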
> 
> 
> 
> -- 
> Greg Rowe <greg@uswest.net>   US WEST - !NTERACT Internet Services
>  "To err is human, to really foul up requires the root password."
