Date: Sat, 8 Aug 98 12:11 PDT
From: wojo@veritas.com (Jack Woychowski)
To: AIC7xxx@FreeBSD.ORG
CC: wojo@veritas.com
Subject: Timeouts and Resets with 5.X.X drivers - what to do? (newbie - sorry)
Yow!-Zippy-Says: Now, let's SEND OUT for QUICHE!!

Ok, I've been hanging back for a few weeks reading over the mail on this list and searching around for answers, but haven't really seen any answers that seem to address this problem. My apologies as a newbie for probably asking something that's already being addressed; I think I joined the mailing list just after these started. (Flames for being lame can be addressed directly to me. :-)

Details of system: HP Vectra XA w/64M memory, running linux 2.0.35 (lately) with pre-6 patch. I've got two Adaptec AHA-294X Ultra SCSI controllers, one wide, the other narrow. (My wide one is new and currently unused; it's part of a Firewire card. Gory details of attached disks, etc. from /proc attached below.) Note that I also alternate-boot this system with NT 4.0 SP3 (yeah, I know, but it's for work :-); NT experiences no similar problems.

I've been experiencing the same problems in all linux versions since 2.0.32 (and as a result have spent most of the time running 2.0.29). I believe this makes it a problem with the 5.X.X aic7xxx driver vs the 4.1.1 (from straight linux 2.0.29), which doesn't seem to have the same problem.

Along with others, I've been experiencing the 'aborting due to timeout', resetting bus, 'trying harder' problems. For example, from dmesg (this occurred during a kernel build):

scsi : aborting command due to timeout : pid 11051, scsi0, channel 0, id 3, lun 0 0x0a 16 55 3b 02 00
SCSI host 0 channel 0 reset (pid 11049) timed out - trying harder
SCSI bus is being reset for host 0 channel 0.
(scsi0:0:2:0) Synchronous at 20.0 Mbyte/sec, offset 15.
(scsi0:0:3:0) Synchronous at 10.0 Mbyte/sec, offset 15.
scsi : aborting command due to timeout : pid 11051, scsi0, channel 0, id 3, lun 0 0x0a 16 55 3b 02 00
SCSI host 0 abort (pid 11049) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
(scsi0:0:2:0) Synchronous at 20.0 Mbyte/sec, offset 15.
(scsi0:0:3:0) Synchronous at 10.0 Mbyte/sec, offset 15.
SCSI host 0 abort (pid 11051) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
SCSI host 0 abort (pid 11049) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
(scsi0:0:2:0) Synchronous at 20.0 Mbyte/sec, offset 15.
(scsi0:0:3:0) Synchronous at 10.0 Mbyte/sec, offset 15.
SCSI host 0 channel 0 reset (pid 11051) timed out - trying harder
SCSI bus is being reset for host 0 channel 0.
(scsi0:0:2:0) Synchronous at 20.0 Mbyte/sec, offset 15.
(scsi0:0:3:0) Synchronous at 10.0 Mbyte/sec, offset 15.
[..]

This seems to happen only under higher I/O loads - during fsck cycles during boot, or during kernel builds and the like.
It seems to get cumulatively worse; for example, it "backs up" on kernel builds: the build runs along fine for a while, then there's one reset, ten seconds later another, two seconds later another, then they pile up on one another and the system is effectively totally bogged down.

This problem originally showed up on only one disk which was my Linux root (target 0 - a really old, slow disk). More recently, I moved the root to a more modern drive (target 3) and now the problem shows up on targets 2 and 3, so I guess it's not just my slow (<4M/sec - bleck) drive. (Target 0 is now totally unused, but still attached.)

So, just from being on the list a few weeks, I've seen others with this problem, or at least the same output; I just haven't seen a diagnosis, or at least not one that seems to apply to me. I don't believe I've got conflicting addresses or the like, or conflicting interrupts, or such. None of the "try this" advice that has passed by me via email has seemed applicable.

I guess the newbie silly-rabbit-trix-are-for-kids question is: are things like tagged queueing and greater queue depths s'posed to help this problem? I've been avoiding them (not knowing exactly what they are or what they might do for me - yikes) but if they help, I'll try 'em. (BTW, feel free to (indignantly :-) point me to something to read to learn, rather than waste your time explaining them to me - I'm certainly willing to do that on my own, I just haven't found a source of info as of yet.)

My problem is that I need to be running 2.0.34 or higher to start playing around with a firewire driver, but the timeout/abort/reset problems pretty much stop me (can't build the darn kernel). I guess one solution would be to use the 4.1.1 driver, but that doesn't seem to be supported for 2.0.30+ kernels.

So: What do I do now? Any further debuggering I could be doing? Should I start reading source? :-) (actually, I have, and don't understand a whole lot of it - gotta find a design paper somewhere.
:-)

--
Woof
******************************************************************************
* Jack Woychowski                 \\\     _     //      Kernel Hound         *
* VERITAS Software                 \\\   //\   //       wojo@veritas.com     *
* 1600 Plymouth Street              \\\ // \\ // oof!!  VOICE: (650)335-8533 *
* Mountain View, California 94043    \\\/   \\/         FAX: (650)335-8050   *
* -------------------------------------------------------------------------- *
*  Join me in the League For Programming Freedom.  Questions?  Just ask me.  *
******************************************************************************
* Yow!-Zippy-Says: I'm a GENIUS!  I want to dispute sentence                 *
*                  structure with SUSAN SONTAG!!                             *
******************************************************************************

GORY SYSTEM DETAILS
-------------------

wahya:/lhome/wojo$ cat /proc/pci
PCI devices found:
  Bus 0, device 12, function 0:
    SCSI storage controller: Adaptec AIC-7881U (rev 0).
      Medium devsel. Fast back-to-back capable. IRQ 10. Master Capable. Latency=64. Min Gnt=8.Max Lat=8.
      I/O at 0xfc00.
      Non-prefetchable 32 bit memory at 0xfedfb000.
  Bus 1, device 5, function 0:
    FireWire (IEEE 1394): Adaptec AIC-5800 (rev 16).
      Medium devsel. IRQ 9. Master Capable. Latency=64.
      Non-prefetchable 32 bit memory at 0xfecfec00.
  Bus 1, device 4, function 0:
    SCSI storage controller: Adaptec AIC-7881U (rev 1).
      Medium devsel. Fast back-to-back capable. IRQ 11. Master Capable. Latency=64. Min Gnt=8.Max Lat=8.
      I/O at 0xec00.
      Non-prefetchable 32 bit memory at 0xfecff000.
  Bus 0, device 10, function 0:
    PCI bridge: DEC DC21152 (rev 2).
      Medium devsel. Fast back-to-back capable. Master Capable. Latency=64. Min Gnt=4.Max Lat=2.
  Bus 0, device 6, function 0:
    VGA compatible controller: Matrox Millennium (rev 1).
      Medium devsel. Fast back-to-back capable. IRQ 9.
      Non-prefetchable 32 bit memory at 0xfedfc000.
      Prefetchable 32 bit memory at 0xfe000000.
  Bus 0, device 4, function 1:
    IDE interface: Intel 82371SB PIIX3 IDE (rev 0).
      Medium devsel. Fast back-to-back capable. Master Capable. Latency=32.
      I/O at 0x580.
  Bus 0, device 4, function 0:
    ISA bridge: Intel 82371SB PIIX3 ISA (rev 1).
      Medium devsel. Fast back-to-back capable. Master Capable. No bursts.
  Bus 0, device 0, function 0:
    Host bridge: Intel 82441FX Natoma (rev 2).
      Medium devsel. Fast back-to-back capable. Master Capable. Latency=32.

----------------------------------------------------------------------

wahya:/lhome/wojo$ cat /proc/scsi/aic7xxx/0
Adaptec AIC7xxx driver version: 5.1.0pre6/3.2.4
Compile Options:
  AIC7XXX_RESET_DELAY    : 5
  AIC7XXX_TAGGED_QUEUEING: Adapter Support Enabled
                             Check below to see which
                             devices use tagged queueing
  AIC7XXX_PAGE_ENABLE    : Enabled (This is no longer an option)
  AIC7XXX_PROC_STATS     : Enabled

Adapter Configuration:
           SCSI Adapter: Adaptec AHA-294X Ultra SCSI host adapter
                           Ultra Narrow Controller
    PCI MMAPed I/O Base: 0xfedfb000
      Adaptec SCSI BIOS: Enabled
                    IRQ: 10
                   SCBs: Active 0, Max Active 2,
                         Allocated 15, HW 16, Page 255
             Interrupts: 16598
      BIOS Control Word: 0x19b6
   Adapter Control Word: 0x001b
   Extended Translation: Enabled
Disconnect Enable Flags: 0x00ff
     Ultra Enable Flags: 0x00e7
 Tag Queue Enable Flags: 0x0000
Ordered Queue Tag Flags: 0x0000
Default Tag Queue Depth: 8
    Tagged Queue By Device array for aic7xxx host instance 0:
      {255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255}
    Actual queue depth per device for aic7xxx host instance 0:
      {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}

Statistics:

(scsi0:0:0:0)
  Device using Narrow/Sync transfers at 20.0 MByte/sec, offset 15
  Total transfers 3 (3 read;0 written)
    blks(512) rd=5; blks(512) wr=0
             < 512  512-1K    1-2K    2-4K    4-8K   8-16K  16-32K  32-64K 64-128K   >128K
   Reads:        0       1       2       0       0       0       0       0       0       0
  Writes:        0       0       0       0       0       0       0       0       0       0

(scsi0:0:1:0)
  Device using Narrow/Async transfers.
  Total transfers 2 (2 read;0 written)
    blks(512) rd=3; blks(512) wr=0
             < 512  512-1K    1-2K    2-4K    4-8K   8-16K  16-32K  32-64K 64-128K   >128K
   Reads:        0       1       1       0       0       0       0       0       0       0
  Writes:        0       0       0       0       0       0       0       0       0       0

(scsi0:0:2:0)
  Device using Narrow/Sync transfers at 20.0 MByte/sec, offset 15
  Total transfers 7637 (6523 read;1114 written)
    blks(512) rd=68195; blks(512) wr=5864
             < 512  512-1K    1-2K    2-4K    4-8K   8-16K  16-32K  32-64K 64-128K   >128K
   Reads:        0       1    1748     526    1828    2243     146      28       3       0
  Writes:        0       0     848     143      65      31      14       8       5       0

(scsi0:0:3:0)
  Device using Narrow/Sync transfers at 10.0 MByte/sec, offset 15
  Total transfers 8329 (2558 read;5771 written)
    blks(512) rd=21299; blks(512) wr=14936
             < 512  512-1K    1-2K    2-4K    4-8K   8-16K  16-32K  32-64K 64-128K   >128K
   Reads:        0       1    1227      46     636     629       7       7       5       0
  Writes:        0       0    4668    1068      30       1       0       0       4       0

(scsi0:0:4:0)
  Device using Narrow/Sync transfers at 10.0 MByte/sec, offset 15
  Total transfers 2 (2 read;0 written)
    blks(512) rd=3; blks(512) wr=0
             < 512  512-1K    1-2K    2-4K    4-8K   8-16K  16-32K  32-64K 64-128K   >128K
   Reads:        0       1       1       0       0       0       0       0       0       0
  Writes:        0       0       0       0       0       0       0       0       0       0

----------------------------------------------------------------------

wahya:/lhome/wojo$ cat /proc/scsi/aic7xxx/1
Adaptec AIC7xxx driver version: 5.1.0pre6/3.2.4
Compile Options:
  AIC7XXX_RESET_DELAY    : 5
  AIC7XXX_TAGGED_QUEUEING: Adapter Support Enabled
                             Check below to see which
                             devices use tagged queueing
  AIC7XXX_PAGE_ENABLE    : Enabled (This is no longer an option)
  AIC7XXX_PROC_STATS     : Enabled

Adapter Configuration:
           SCSI Adapter: Adaptec AHA-294X Ultra SCSI host adapter
                           Wide Controller
    PCI MMAPed I/O Base: 0xfecff000
      Adaptec SCSI BIOS: Enabled
                    IRQ: 11
                   SCBs: Active 0, Max Active 1,
                         Allocated 15, HW 16, Page 255
             Interrupts: 30
      BIOS Control Word: 0x18b6
   Adapter Control Word: 0x005d
   Extended Translation: Enabled
Disconnect Enable Flags: 0xffff
 Tag Queue Enable Flags: 0x0000
Ordered Queue Tag Flags: 0x0000
Default Tag Queue Depth: 8
    Tagged Queue By Device array for aic7xxx host instance 1:
      {255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255}
    Actual queue depth per device for aic7xxx host instance 1:
      {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}

----------------------------------------------------------------------

wahya:/lhome/wojo$ cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: QUANTUM  Model: VIKING 2.3 NSE   Rev: 8808
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: CDC      Model: 94181-15         Rev: 0293
  Type:   Direct-Access                    ANSI SCSI revision: 01 CCS
Host: scsi0 Channel: 00 Id: 02 Lun: 00
  Vendor: SEAGATE  Model: ST32171N         Rev: 0338
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 03 Lun: 00
  Vendor: HP       Model: C3324A           Rev: 5020
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 04 Lun: 00
  Vendor: iomega   Model: jaz 1GB          Rev: J.83
  Type:   Direct-Access                    ANSI SCSI revision: 02