Date: Sat, 15 Nov 1997 22:10:52 -0700
From: "Justin T. Gibbs" <gibbs@plutotech.com>
To: harold barker Hbarker <hbarker@rhiannon.sm.dsms.com>
Cc: hackers@FreeBSD.org, scsi@FreeBSD.org, aic7xxx@FreeBSD.org
Subject: Re: AHC / SCSI UPDATE
Message-ID: <199711160511.WAA24691@pluto.plutotech.com>
Sorry for not responding sooner, but I don't read this list regularly
anymore...

>If the person responsible for the code in question will email me, I will
>ship/open for login a machine that exhibits the problem.

That would be me, but I do believe that I have a system here that
exhibits the same problem you are having. When I have a fix for this
machine, I might take you up on your offer if the fix doesn't seem to
work with your equipment.

Here's a little info about what we (Ken Merry and myself) have
determined about the problem so far.

System:
    P6-233 256k cache
    2940UW (SCSI ID 7)
    1 x PLEXTOR CD-ROM PX-4XCS 1.04 (SCSI ID 4)
    2 x QUANTUM XP34550W LXY4 (SCSI IDs 0 and 1)

How to repeat:
    Run concurrent I/O to all 3 devices at the same time.

Symptom:
    After a varying period of time, disk 0 or 1 stops performing
    reselections for its outstanding I/O. This eventually results in
    a timeout, usually with the controller in an "idle" state.

Using a SCSI bus analyzer, we've looked at the transactions on the bus
that lead up to this state. No protocol errors were discovered. What we
did find, however, was a disturbing pattern of disconnections and
reconnections from the CD-ROM drive. The Plextor seems to disconnect
"often enough" to allow other targets to perform a reselection, but
unfortunately it also seems to take part in the very next arbitration
phase whenever it has a task to continue. Since the arbitration
algorithm breaks "ties" based on the SCSI ID (from highest to lowest
priority 7->0, 15->8), and the CD drive at ID 4 outranks the disks at
IDs 0 and 1, this effectively gives the CD drive the bus for as long as
it wants it.

Since the CD drive only handles a single task at a time, one would
think that there would be plenty of time when the CD was idle and not
wanting the bus. Unfortunately, it seems that the SCSI system/aic7xxx
driver is fast enough to process a command completion for the CD drive,
set up a new command to send, and participate in the next arbitration
phase.
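
To make the tie-breaking rule concrete, here is a minimal sketch in C of
how arbitration picks a winner among the IDs contending for the bus.
This is an illustration only, not code from the aic7xxx driver; the
function names arb_rank and arb_winner are made up for this example.

```c
#include <stdint.h>

/*
 * Rank of a SCSI ID during arbitration; a higher rank wins.
 * Priority from highest to lowest is 7..0, then 15..8.
 * (Hypothetical helper, for illustration only.)
 */
int arb_rank(int id)
{
    return (id < 8) ? id + 8 : id - 8;
}

/*
 * Given a bitmask of contending IDs (bit n set means ID n is
 * arbitrating), return the ID that wins, or -1 if nobody contends.
 */
int arb_winner(uint16_t contenders)
{
    int winner = -1;

    for (int id = 0; id < 16; id++) {
        if ((contenders & (1u << id)) == 0)
            continue;
        if (winner < 0 || arb_rank(id) > arb_rank(winner))
            winner = id;
    }
    return winner;
}
```

With the test system above, a mask with IDs 7, 4, 1 and 0 set yields 7
(the controller). Without the controller, ID 4 (the CD drive) wins. The
disks at IDs 0 and 1 only get the bus in a round where neither of the
higher IDs is contending.
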
As the controller has the highest priority ID on the bus, this again
"starves" the drives and opens the possibility for the CD drive to
start requesting the bus.

In the end, what I believe is happening is that the starved disk drive
exhausts its "reconnect attempt" count and decides not to attempt to
contact the initiator again. In the case of an Atlas II, if the
initiator selects the drive (say to send an abort or abort tag
message), the drive starts making reconnection attempts again and the
wedge is cleared. Other drives may not behave as nicely.

So, what can be done about this? I'm currently looking through the
SCSI II and III specs to determine what the standard has to say about
reconnect attempt failures and how to properly deal with them. It may
be that the SCSI layer/Adaptec driver can take actions that will work
on most devices. For a more immediate fix, I suggest experimenting
with:

1) Swapping the IDs on your devices so that hard drives have higher
   arbitration priority on the bus. The Adaptec BIOS will still find
   your disks in the proper order for you to boot even if you move
   your CD-ROM or tape drive to IDs below those of the hard disks.

2) Playing with the settings in the Disconnect-Reconnect mode page
   (page 0x02). Try setting the "Disconnect Time Limit" variable to
   something other than 0. This is the time, in units of 100
   microseconds, that the device waits after disconnecting before it
   participates in arbitration again.

For many of you, I would expect solution 1 to work just fine. For those
of you with lots of disks on a single chain (even if you don't have a
tape or CD-ROM drive), you will probably have to try solution 2.
Remember that it's not really the type of device that matters, but the
possibility of starvation. If you have lots of concurrent I/O going on
to multiple disks on a single chain, you can still experience this
problem (Hi Satoshi!).

More information when it becomes available.

--
Justin