Date: Fri, 5 Dec 1997 14:42:08 -0800 (PST)
From: "Bryn Wm. Moslow" <bryn@nwlink.com>
To: freebsd-isp@freebsd.org
Cc: jayk@nwlink.com
Subject: Adaptec 2940/Seagate Failures
Message-ID: <Pine.GSO.3.95.971205120024.28222B-100000@utah>
Hello, sorry that this is a bit windy but I'm desperate:

I'm still having big problems with FreeBSD, the Adaptec 2940UW, and Seagate drives. When the system gets heavily loaded (i.e. 65-75 sendmail processes and 20 or so poppers), it often comes to a complete stop, and sure enough I can get to the console and discover just about the same thing every time (it is at the bottom of this message, along with my dmesg output for informational purposes).

I've tried both FreeBSD 2.2.2 and 2.2.5; sendmail 8.8.5, 8.8.6, 8.8.7, and 8.8.8; and qpopper 2.3 and 2.4. I would like to note that disk I/O was much smoother and system load was significantly lower with 2.2.5, but it was only two hours under load before it pooped the first time, as opposed to a couple of days under 2.2.2. In fact, it was odd, because the load was 0.7 and iostat was averaging about 3800 sps on sd2 when the most recent death (see 'The hell' below) occurred.

I've been reading the long debate about the 2940 and FreeBSD for some time, and just today I went through my whole archive of freebsd-isp and noted some things. What especially stands out is the number of people saying that they have no problem, "it works great," and then noting, literally in the same breath, that they're not really using it or only have a tape attached, etc. The people who DO seem to be having problems are running the 2940 under heavy load and, like me, having to power cycle servers at horrible times. If you have an archive of freebsd-isp, do a search on "Adaptec 2940" and you'll see what I'm talking about.

Just an observation and opinion: I'm not trying to PO anyone, but I think more attention has to be paid to the stability of the SCSI subsystem, specifically under heavy load. Once again: I love you all, I love Chuck, please help me ;).
Notes:

- I've tried every possible combination of AHC_TAGENABLE, AHC_SCBPAGING_ENABLE, and AHC_ALLOW_MEMIO in the kernel, and each one, alone or together with the others, eventually brings down the system.

- We've broken out the mail spool for local mail into a directory structure based on the first letters of the username, such as: /var/mail/u/us/username. This has helped overall, but iostat still hits the roof when people get lots of mail (lists, spam) and pop3 is yanking down large mail files.

- Per advice from other FreeBSD users, non-FreeBSD users, and an electrical engineer, the narrow drive is on a separate controller from the wide drives.

- The drives are all internal, the bus is terminated, and the cable is only 0.5m.

- The controller is in Ultra mode and I would like to keep it there if possible. I've tried it without, to no avail anyway. I'm quite sure this should not be a problem, as I have a BSDI 3.0 box running news that does at least ten times the I/O per day, at a higher load, on two Adaptec 2940's and 10 drives in Ultra mode with a 4-disk ccd (sp0 in BSDI) and the same CPU, but I want to believe in the power of FreeBSD. :)

The hell:

(This particular kernel was built with AHC_SCBPAGING_ENABLE, but I get similar results with the other options, and ultimately a bus failure and/or lockup and/or panic. I don't have as much trouble with no extra ahc options, but the system gets VERY s-l-o-w under load.)

SSTAT1 = 0x3
sd2(ahc1:2:0): abort message in message buffer
sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6
SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
sd2(ahc1:2:0): no longer in timeout
ahc1: Issued Channel A Bus Reset.
4 SCBs aborted sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): abort message in message buffer sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): no longer in timeout ahc1: Issued Channel A Bus Reset. 4 SCBs aborted sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): abort message in message buffer sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): no longer in timeout ahc1: Issued Channel A Bus Reset. 4 SCBs aborted sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): abort message in message buffer sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): no longer in timeout ahc1: Issued \M^?\^OA Bus Reset. 4 SCBs aborted sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): abort message in message buffer sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): no longer in timeout ahc1: Issued Channel A Bus Reset. 4 SCBs aborted sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): abort message in message buffer sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): no longer in timeout ahc1: Issued Channel A Bus Reset. 
4 SCBs aborted sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): abort message in message buffer sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): no longer in timeout ahc1: Issued Channel A Bus Reset. 4 SCBs aborted sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): abort message in message buffer sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): no longer in timeout ahc1: Issued Channel A Bus Reset. 4 SCBs aborted sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): abort message in message buffer sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): no longer in timeout ahc1: Issued Channel A Bus Reset. 4 SCBs aborted sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): abort message in message buffer sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): no longer in timeout ahc1: Issued Channel A Bus Reset. 4 SCBs aborted sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): abort message in message buffer sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): no longer in timeout ahc1: Issued Channel A Bus Reset. 
4 SCBs aborted sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): abort message in message buffer sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): no longer in timeout ahc1: Issued Channel A Bus Reset. 4 SCBs aborted sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): abort message in message buffer sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): no longer in timeout ahc1: Issued Channel A Bus Reset. 4 SCBs aborted sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): abort message in message buffer sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): no longer in timeout ahc1: Issued Channel A Bus Reset. 4 SCBs aborted sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): abort message in message buffer sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): no longer in timeout ahc1: Issued Channel A Bus Reset. 4 SCBs aborted sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): abort message in message buffer sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6 SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3 sd2(ahc1:2:0): no longer in timeout ahc1: Issued Channel A Bus Reset. 
4 SCBs aborted

dmesg output: (No ahc kernel options)

FreeBSD 2.2.2-RELEASE #0: Wed Nov 19 13:31:09 PST 1997
bryn@alabama.nwlink.com:/usr/src/sys/compile/ALABAMA
CPU: Pentium Pro (199.43-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0x619 Stepping=9
Features=0xfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,<b11>,MTRR,PGE,MCA,CMOV>
real memory = 268435456 (262144K bytes)
avail memory = 257245184 (251216K bytes)
Probing for devices on PCI bus 0:
chip0 <Intel 82440FX (Natoma) PCI and memory controller> rev 2 on pci0:0
chip1 <Intel 82371SB PCI-ISA bridge> rev 1 on pci0:1:0
chip2 <Intel 82371SB IDE interface> rev 0 on pci0:1:1
vx0 <3COM 3C905 Fast Etherlink XL PCI> rev 0 int a irq 12 on pci0:9
mii[*mii*] address 00:60:08:0a:42:32
ahc0 <Adaptec 2940 Ultra SCSI host adapter> rev 0 int a irq 10 on pci0:10
ahc0: aic7880 Wide Channel, SCSI Id=7, 16 SCBs
ahc0 waiting for scsi devices to settle
(ahc0:0:0): "SEAGATE ST52160N 0285" type 0 fixed SCSI 2
sd0(ahc0:0:0): Direct-Access 2069MB (4238282 512 byte sectors)
vga0 <VGA-compatible display device> rev 0 on pci0:11
ahc1 <Adaptec 2940 Ultra SCSI host adapter> rev 0 int a irq 11 on pci0:12
ahc1: aic7880 Wide Channel, SCSI Id=7, 16 SCBs
ahc1 waiting for scsi devices to settle
(ahc1:1:0): "SEAGATE ST34572W 0718" type 0 fixed SCSI 2
sd1(ahc1:1:0): Direct-Access 4340MB (8888924 512 byte sectors)
(ahc1:2:0): "SEAGATE ST34572W 0784" type 0 fixed SCSI 2
sd2(ahc1:2:0): Direct-Access 4340MB (8888924 512 byte sectors)
Probing for devices on the ISA bus:
sc0 at 0x60-0x6f irq 1 on motherboard
sc0: VGA color <16 virtual consoles, flags=0x0>
sio0 at 0x3f8-0x3ff irq 4 on isa
sio0: type 16550A
sio1 at 0x2f8-0x2ff irq 3 on isa
sio1: type 16550A
lpt0 at 0x378-0x37f irq 7 on isa
lpt0: Interrupt-driven port
lp0: TCP/IP capable interface
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: NEC 72065B
fd0: 1.44MB 3.5in
npx0 flags 0x1 on motherboard
npx0: INT 16 interface
WARNING: / was not properly dismounted.

Thanks for your time,
Bryn
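
P.S. For anyone who wants to try the same spool layout, here is a minimal Python sketch of the mapping described in the notes (/var/mail/u/us/username). The function name spool_path, the root argument, and the handling of one-character usernames are assumptions made for illustration only; the actual changes to the delivery agent and popper on this server are not shown in this message.

```python
import posixpath


def spool_path(username, root="/var/mail"):
    """Map a username to a two-level hashed spool path.

    Hypothetical helper: the first directory level is the first letter
    of the username, the second level is the first two letters.
    """
    if not username or "/" in username:
        raise ValueError("username must be non-empty and contain no '/'")
    first = username[:1]   # e.g. "u"
    second = username[:2]  # e.g. "us" (just "a" for a one-letter name)
    return posixpath.join(root, first, second, username)


if __name__ == "__main__":
    print(spool_path("username"))
```

The point of the split is to keep any single directory from holding every mailbox, so directory lookups stay short; it does nothing for the cost of reading one large mail file, which matches the iostat behaviour described above.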