Date: Wed, 23 May 2001 19:49:01 -0400 (EDT)
From: Joe Clarke <marcus@miami.edu>
To: Matt Dillon <dillon@earth.backplane.com>
Cc: "Justin T. Gibbs" <gibbs@scsiguy.com>, stable@FreeBSD.ORG
Subject: Re: Continuing ahc problems - also cause fxp failure
Message-ID: <Pine.OSF.4.31.0105231947350.28125-100000@jaguar.ir.miami.edu>
In-Reply-To: <200105220925.f4M9Paf00409@earth.backplane.com>

Matt,

I know you've probably heard a lot of responses to this, but I have a
system very similar to yours that I use in production with vinum striping
across 12 18 GB drives hooked up to an Adaptec U2W. I ran into this
problem, and I upgraded the BIOS on the card. As soon as I did that,
everything runs beautifully.

Joe Clarke

On Tue, 22 May 2001, Matt Dillon wrote:

> This is getting weirder and weirder.
>
> 4.2 or 4.3-RC
>     AHC failure once or twice a month (as previously posted last month)
>     FXP Ethernet (appeared to be) working perfectly
>
> RELENG_4 (After Justin's adaptec fix)
>     FXP failure one week (old FXP driver)
>     FXP failure the next week (old FXP driver)
>     FXP *and* AHC failures tonight (new FXP driver)
>
>
> What I got tonight was basically a system lockup with the kernel
> generating console messages every few seconds from both the FXP
> and the AHC drivers. I *was* able to break into the debugger, but
> with ahc dead I couldn't generate a core. I think the system itself
> is fine and the problem is somewhere in the AHC or FXP drivers.
>
> I had failures with the old FXP driver as well as the new, and the
> old driver hasn't changed in months so the problem is either a PCI
> bug (cycle timer issues?) or there are still AHC bugs.
>
> Note the time. Not fun, but at least I managed to play with the console
> before someone else came in and rebooted the system :-)
>
> dmesg output is at the end. Here is what I was seeing on the console:
>
> fxp0: SCB timeout: 0xe0, 0, 0x90, 0x400
> (other SCB timeout messages)
> fxp0: DMA timeout
> fxp0: command queue timeout
> fxp0: device timeout
> ... various repetitions
>
> ahc0: issued channel A bus reset, 4 SCB's aborted
> pci error interrupt at seqaddr 2
> scb 0x40 timed out while IDLE seqaddr 0x181
>
> stack 0x17e, e, e, e
> SXFRCTL0 = 0x80
> Dumping card state: SCSISEQ = 0x12, SBLKCTL = 0xA, SSTAT0 = 0x0,
> SCB Count = 250
>
> Kernel NEXTQSCB = 17
> Card NEXTQSCB = 64
>
> (I squiggled this down from the console so it is not an
> exact representation, but I think I got the meat).
>
> As I said, I was able to break into the debugger and apart from ahc
> and fxp being completely failed, nothing else was wrong.
>
> The failure occured during the nightly dump. The network was
> under a medium load (the backup is running over a T1) and the hard
> drives were probably under a heavy load. All previous failures seemed
> to have occured in the wee hours of the morning during our nightly
> dumps. The disks do not have an appreciable load during the day.
>
> -Matt
>
> Here is my dmesg output:
>
> Copyright (c) 1992-2001 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> The Regents of the University of California. All rights reserved.
> FreeBSD 4.3-STABLE #2: Fri May 18 11:36:08 PDT 2001
> dillon@ns1.backplane.com:/usr/src/sys/compile/EARTH
> Timecounter "i8254" frequency 1193182 Hz
> CPU: Pentium III/Pentium III Xeon/Celeron (531.65-MHz 686-class CPU)
> Origin = "GenuineIntel" Id = 0x681 Stepping = 1
> Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
> real memory = 536862720 (524280K bytes)
> avail memory = 519012352 (506848K bytes)
> Preloaded elf kernel "kernel" at 0xc0350000.
> Pentium Pro MTRR support enabled
> md0: Malloc disk
> npx0: <math processor> on motherboard
> npx0: INT 16 interface
> pcib0: <ServerWorks NB6635 3.0LE host to PCI bridge> on motherboard
> pci0: <PCI bus> on pcib0
> pcib1: <PCI to PCI bridge (vendor=8086 device=0962)> at device 2.0 on pci0
> pci1: <PCI bus> on pcib1
> ahc0: <Adaptec aic7890/91 Ultra2 SCSI adapter> port 0xfc00-0xfcff mem 0xfcfff000-0xfcffffff irq 14 at device 4.0 on pci1
> aic7890/91: Wide Channel A, SCSI Id=7, 32/255 SCBs
> ahc1: <Adaptec aic7880 Ultra SCSI adapter> port 0xf800-0xf8ff mem 0xfcffe000-0xfcffefff irq 10 at device 6.0 on pci1
> aic7880: Single Channel A, SCSI Id=7, 16/255 SCBs
> fxp0: <Intel Pro 10/100B/100+ Ethernet> port 0xecc0-0xecff mem 0xfe000000-0xfe0fffff,0xfe101000-0xfe101fff irq 11 at device 8.0 on pci0
> fxp0: Ethernet address 00:b0:d0:22:fb:03
> inphy0: <i82555 10/100 media interface> on miibus0
> inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> pci0: <ATI model 4759 graphics accelerator> at 14.0
> isab0: <ServerWorks IB6566 PCI to ISA bridge> at device 15.0 on pci0
> isa0: <ISA bus> on isab0
> pcib2: <ServerWorks NB6635 3.0LE host to PCI bridge> on motherboard
> pci2: <PCI bus> on pcib2
> fxp1: <Intel Pro 10/100B/100+ Ethernet> port 0xdcc0-0xdcff mem 0xf6100000-0xf61fffff,0xf6201000-0xf6201fff irq 5 at device 6.0 on pci2
> fxp1: Ethernet address 00:d0:b7:7e:75:c3
> inphy1: <i82555 10/100 media interface> on miibus1
> inphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> fxp2: <Intel Pro 10/100B/100+ Ethernet> port 0xdc80-0xdcbf mem 0xf6000000-0xf60fffff,0xf6200000-0xf6200fff irq 14 at device 8.0 on pci2
> fxp2: Ethernet address 00:d0:b7:7e:77:31
> inphy2: <i82555 10/100 media interface> on miibus2
> inphy2: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
> fdc0: FIFO enabled, 8 bytes threshold
> fd0: <1440-KB 3.5" drive> on fdc0 drive 0
> atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
> atkbd0: <AT Keyboard> irq 1 on atkbdc0
> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
> sc0: <System console> on isa0
> sc0: VGA <16 virtual consoles, flags=0x200>
> sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
> sio0: type 16550A
> sio1 at port 0x2f8-0x2ff irq 3 on isa0
> sio1: type 16550A
> IP packet filtering initialized, divert enabled, rule-based forwarding enabled, default to deny, logging disabled
> IPsec: Initialized Security Association Processing.
> IP Filter: v3.4.16 initialized. Default = pass all, Logging = disabled
> Waiting 5 seconds for SCSI devices to settle
> pass4 at ahc0 bus 0 target 6 lun 0
> pass4: <DELL 1x4 U2W SCSI BP 5.35> Fixed Processor SCSI-2 device
> pass4: 3.300MB/s transfers
> da2 at ahc0 bus 0 target 2 lun 0
> da2: <QUANTUM ATLAS V 36 SCA 0201> Fixed Direct Access SCSI-3 device
> da2: 80.000MB/s transfers (40.000MHz, offset 63, 16bit), Tagged Queueing Enabled
> da2: 34732MB (71132998 512 byte sectors: 255H 63S/T 4427C)
> da3 at ahc0 bus 0 target 3 lun 0
> da3: <QUANTUM ATLAS V 9 SCA 0201> Fixed Direct Access SCSI-3 device
> da3: 80.000MB/s transfers (40.000MHz, offset 63, 16bit), Tagged Queueing Enabled
> da3: 8683MB (17783249 512 byte sectors: 255H 63S/T 1106C)
> da0 at ahc0 bus 0 target 0 lun 0
> da0: <SEAGATE ST336704LC 0004> Fixed Direct Access SCSI-3 device
> da0: 80.000MB/s transfers (40.000MHz, offset 63, 16bit), Tagged Queueing Enabled
> da0: 34732MB (71132960 512 byte sectors: 255H 63S/T 4427C)
> da1 at ahc0 bus 0 target 1 lun 0
> da1: <SEAGATE ST336704LC 0004> Fixed Direct Access SCSI-3 device
> da1: 80.000MB/s transfers (40.000MHz, offset 63, 16bit), Tagged Queueing Enabled
> da1: 34732MB (71132960 512 byte sectors: 255H 63S/T 4427C)
> cd0 at ahc1 bus 0 target 5 lun 0
> cd0: <NEC CD-ROM DRIVE:466 1.06> Removable CD-ROM SCSI-2 device
> cd0: 20.000MB/s transfers (20.000MHz, offset 15)
> cd0: Attempt to query device size failed: NOT READY, Medium not present
> Mounting root from ufs:/dev/da0s1a
> WARNING: / was not properly dismounted
>
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-stable" in the body of the message
>
>

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message