Date:      Fri, 5 Dec 1997 14:42:08 -0800 (PST)
From:      "Bryn Wm. Moslow" <bryn@nwlink.com>
To:        freebsd-isp@freebsd.org
Cc:        jayk@nwlink.com
Subject:   Adaptec 2940/Seagate Failures
Message-ID:  <Pine.GSO.3.95.971205120024.28222B-100000@utah>

Hello, sorry that this is a bit windy but I'm desperate:

I'm still having big problems with FreeBSD, the Adaptec 2940UW, and
Seagate drives. When the system gets heavily loaded (e.g. 65-75 sendmail
processes and 20 or so poppers), it often comes to a complete stop, and sure
enough I can get to the console and discover just about the same thing
every time (which is at the bottom of this message along with my dmesg
output for informational purposes). I've tried both FreeBSD 2.2.2 and
2.2.5, sendmail 8.8.5, 8.8.6, 8.8.7, and 8.8.8, and qpopper 2.3 and 2.4. I
would like to note that disk I/O was much smoother and system load was
significantly lower with 2.2.5, but it was only two hours under load before
it pooped the first time, as opposed to a couple of days under 2.2.2. In
fact, it was odd because the load was 0.7 and iostat was averaging about
3800 sps on sd2 when the most recent death (see 'The hell' below) occurred.

I've been reading the long debate about the 2940 and FreeBSD for some time
and just today went through my whole archive of freebsd-isp and noted some
things. What especially stands out is the number of people saying that
they have no problem, "it works great," and then noting that they're not
really using it or only have a tape attached, etc., literally in the same
breath. The people who DO seem to be having problems are, like me, running
the 2940 under heavy load conditions and having to power cycle servers at
horrible times. If you have an archive of freebsd-isp do a search on
"Adaptec 2940" and you'll see what I'm talking about. Just an observation
and opinion: I'm not trying to PO anyone but I think there has to be more
attention paid to the stability of the SCSI subsystem, specifically under
heavy loads. Once again: I love you all, I love Chuck, please help me ;).

Notes:  

- I've tried every possible combination of AHC_TAGENABLE,
AHC_SCBPAGING_ENABLE, and AHC_ALLOW_MEMIO in the kernel, and each one, on
its own or combined with the others, eventually brings down the system.

- We've broken out the mail spool for local mail to a directory structure
based on the first letters of username such as: /var/mail/u/us/username.
This has helped overall but iostat still hits the roof when people get
lots of mail (lists, spam) and pop3 is yanking down large mail files.

- Per advice from other FreeBSD users, non-FreeBSD users, and an
electrical engineer, the narrow drive is on a separate controller from the
wide drives.

- The drives are all internal, the bus is terminated and the cable is only
0.5m. 

- The controller is in Ultra mode and I would like to keep it there if
possible. I've tried it with Ultra mode off, to no avail. I'm quite sure
Ultra mode should not be a problem, as I have a BSDI 3.0 box running news
that does at least ten times the I/O per day at a higher load on two Adaptec
2940s and 10 drives in Ultra mode with a 4-disk ccd (sp0 in BSDI) with the
same CPU, but I want to believe in the power of FreeBSD. :)
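
For readers unfamiliar with the hashed spool layout mentioned in the notes above (/var/mail/u/us/username), here is a minimal Python sketch of how such a path could be derived. The function name spool_path and the handling of one-character usernames are assumptions made for illustration; the message does not say how the actual delivery agent or popper was configured.

```python
import posixpath


def spool_path(username, root="/var/mail"):
    """Build a spool path of the form <root>/<1st letter>/<1st two letters>/<username>.

    Hypothetical helper: the two-level split mirrors the example path in the
    message, but the real setup may differ (case folding, short names, etc.).
    """
    if not username:
        raise ValueError("username must not be empty")
    # A one-character name simply repeats that character at the second
    # level (an assumption, the message does not cover this case).
    return posixpath.join(root, username[0], username[:2], username)


if __name__ == "__main__":
    print(spool_path("username"))
```

Spreading mailboxes over two directory levels keeps each directory small, which shortens directory lookups, but it does not reduce the I/O needed to read or rewrite a large mailbox file.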

The hell: (This particular kernel was with AHC_SCBPAGING_ENABLE but I get 
similar results with the other options and ultimately a bus failure
and/or lockup and/or panic. I don't have as much trouble with 
no extra ahc options but the system gets VERY s-l-o-w under load.)

SSTAT1 = 0x3
sd2(ahc1:2:0): abort message in message buffer
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): no longer in timeout
ahc1: Issued Channel A Bus Reset. 4 SCBs aborted
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): abort message in message buffer
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): no longer in timeout
ahc1: Issued Channel A Bus Reset. 4 SCBs aborted
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): abort message in message buffer
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): no longer in timeout
ahc1: Issued Channel A Bus Reset. 4 SCBs aborted
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): abort message in message buffer
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): no longer in timeout
ahc1: Issued Channel A Bus Reset. 4 SCBs aborted
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): abort message in message buffer
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): no longer in timeout
ahc1: Issued Channel A Bus Reset. 4 SCBs aborted
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): abort message in message buffer
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): no longer in timeout
ahc1: Issued Channel A Bus Reset. 4 SCBs aborted
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): abort message in message buffer
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): no longer in timeout
ahc1: Issued Channel A Bus Reset. 4 SCBs aborted
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): abort message in message buffer
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): no longer in timeout
ahc1: Issued Channel A Bus Reset. 4 SCBs aborted
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): abort message in message buffer
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): no longer in timeout
ahc1: Issued Channel A Bus Reset. 4 SCBs aborted
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): abort message in message buffer
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): no longer in timeout
ahc1: Issued Channel A Bus Reset. 4 SCBs aborted
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): abort message in message buffer
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): no longer in timeout
ahc1: Issued Channel A Bus Reset. 4 SCBs aborted
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): abort message in message buffer
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): no longer in timeout
ahc1: Issued Channel A Bus Reset. 4 SCBs aborted
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): abort message in message buffer
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): no longer in timeout
ahc1: Issued Channel A Bus Reset. 4 SCBs aborted
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): abort message in message buffer
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): no longer in timeout
ahc1: Issued Channel A Bus Reset. 4 SCBs aborted
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): abort message in message buffer
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): no longer in timeout
ahc1: Issued Channel A Bus Reset. 4 SCBs aborted
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): abort message in message buffer
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): no longer in timeout
ahc1: Issued Channel A Bus Reset. 4 SCBs aborted
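
Since the console output above is one cycle repeated many times, a small script can summarize it. The following is a minimal Python sketch that counts timeouts and bus resets per device and decodes the SCSISIGI values. The regular expressions are written against the exact line shapes shown above. The bit names in SCSISIGI_BITS are an assumption based on the aic7xxx register layout and should be checked against the driver's register definitions before being relied upon.

```python
import re
from collections import Counter

# Assumed SCSISIGI bit layout (verify against the aic7xxx register header).
SCSISIGI_BITS = [
    (0x80, "CD"), (0x40, "IO"), (0x20, "MSG"), (0x10, "ATN"),
    (0x08, "SEL"), (0x04, "BSY"), (0x02, "REQ"), (0x01, "ACK"),
]

TIMEOUT_RE = re.compile(
    r"^(\w+)\(ahc\d+:\d+:\d+\): SCB 0x[0-9a-f]+ - timed out in .+ phase, "
    r"SCSISIGI == (0x[0-9a-f]+)"
)
RESET_RE = re.compile(r"^(ahc\d+): Issued Channel \w Bus Reset\. (\d+) SCBs aborted")


def decode_scsisigi(value):
    """Return the names of the bus signals set in a SCSISIGI value."""
    return [name for mask, name in SCSISIGI_BITS if value & mask]


def summarize(lines):
    """Count timeouts per disk, resets per controller, and SCSISIGI values."""
    timeouts, resets, signals = Counter(), Counter(), Counter()
    for line in lines:
        m = TIMEOUT_RE.match(line)
        if m:
            timeouts[m.group(1)] += 1
            signals[int(m.group(2), 16)] += 1
            continue
        m = RESET_RE.match(line)
        if m:
            resets[m.group(1)] += 1
    return timeouts, resets, signals


if __name__ == "__main__":
    sample = [
        "sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6",
        "ahc1: Issued Channel A Bus Reset. 4 SCBs aborted",
        "sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6",
    ]
    timeouts, resets, signals = summarize(sample)
    print(timeouts, resets)
    for value in sorted(signals):
        print(hex(value), decode_scsisigi(value))
```

Under the assumed layout, 0xf6 and 0xe6 differ only in the ATN bit, and both have CD, IO, and MSG set, which is consistent with the "message in phase" wording in the log.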

dmesg output: (No ahc kernel options)

FreeBSD 2.2.2-RELEASE #0: Wed Nov 19 13:31:09 PST 1997
    bryn@alabama.nwlink.com:/usr/src/sys/compile/ALABAMA
CPU: Pentium Pro (199.43-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x619  Stepping=9
  Features=0xfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,<b11>,MTRR,PGE,MCA,CMOV>
real memory  = 268435456 (262144K bytes)
avail memory = 257245184 (251216K bytes)
Probing for devices on PCI bus 0:
chip0 <Intel 82440FX (Natoma) PCI and memory controller> rev 2 on pci0:0
chip1 <Intel 82371SB PCI-ISA bridge> rev 1 on pci0:1:0
chip2 <Intel 82371SB IDE interface> rev 0 on pci0:1:1
vx0 <3COM 3C905 Fast Etherlink XL PCI> rev 0 int a irq 12 on pci0:9
mii[*mii*] address 00:60:08:0a:42:32
ahc0 <Adaptec 2940 Ultra SCSI host adapter> rev 0 int a irq 10 on pci0:10
ahc0: aic7880 Wide Channel, SCSI Id=7, 16 SCBs
ahc0 waiting for scsi devices to settle
(ahc0:0:0): "SEAGATE ST52160N 0285" type 0 fixed SCSI 2
sd0(ahc0:0:0): Direct-Access 2069MB (4238282 512 byte sectors)
vga0 <VGA-compatible display device> rev 0 on pci0:11
ahc1 <Adaptec 2940 Ultra SCSI host adapter> rev 0 int a irq 11 on pci0:12
ahc1: aic7880 Wide Channel, SCSI Id=7, 16 SCBs
ahc1 waiting for scsi devices to settle
(ahc1:1:0): "SEAGATE ST34572W 0718" type 0 fixed SCSI 2
sd1(ahc1:1:0): Direct-Access 4340MB (8888924 512 byte sectors)
(ahc1:2:0): "SEAGATE ST34572W 0784" type 0 fixed SCSI 2
sd2(ahc1:2:0): Direct-Access 4340MB (8888924 512 byte sectors)
Probing for devices on the ISA bus:
sc0 at 0x60-0x6f irq 1 on motherboard
sc0: VGA color <16 virtual consoles, flags=0x0>
sio0 at 0x3f8-0x3ff irq 4 on isa
sio0: type 16550A
sio1 at 0x2f8-0x2ff irq 3 on isa
sio1: type 16550A
lpt0 at 0x378-0x37f irq 7 on isa
lpt0: Interrupt-driven port
lp0: TCP/IP capable interface
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: NEC 72065B
fd0: 1.44MB 3.5in
npx0 flags 0x1 on motherboard
npx0: INT 16 interface
WARNING: / was not properly dismounted.

Thanks for your time,
Bryn



