From owner-freebsd-scsi  Wed Mar 26 10:13:54 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by freefall.freebsd.org (8.8.5/8.8.5) id KAA03970
          for freebsd-scsi-outgoing; Wed, 26 Mar 1997 10:13:54 -0800 (PST)
Received: from pluto.plutotech.com (root@pluto.plutotech.com [206.168.67.1])
          by freefall.freebsd.org (8.8.5/8.8.5) with ESMTP id KAA03955
          for <freebsd-scsi@FreeBSD.ORG>; Wed, 26 Mar 1997 10:13:49 -0800 (PST)
Received: from narnia.plutotech.com (narnia.plutotech.com [206.168.67.130]) by pluto.plutotech.com (8.8.5/8.8.3) with ESMTP id LAA27397; Wed, 26 Mar 1997 11:13:39 -0700 (MST)
Message-Id: <199703261813.LAA27397@pluto.plutotech.com>
X-Mailer: exmh version 2.0beta 12/23/96
To: "Roy M. Hooper" <rhooper@toybox.ottawa.on.ca>
cc: freebsd-scsi@FreeBSD.ORG
Subject: Re: AHA2940 bug(s) still exist in 2.2.1 
In-reply-to: Your message of "Wed, 26 Mar 1997 11:17:08 EST."
             <199703261617.LAA17954@toybox.ottawa.on.ca> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 26 Mar 1997 11:14:00 -0700
From: "Justin T. Gibbs" <gibbs@plutotech.com>
Sender: owner-freebsd-scsi@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

>
>It would appear that the bugs in the 2940 drivers are still there in 2.2.1.
>We had the same kind of crash as usual, except the machine didn't come back
>up this time.
>
>We received several "timeout" messages and then the machine froze.

I need the exact timeout messages, information about what AHC options you
have in your kernel config file, and the dmesg output listing the drives 
you are using.

I will say this, though.  Using tagged queueing can still be somewhat 
dangerous with the Quantum Atlas II drives.  For one thing, even with only
8 tags outstanding, they can return QUEUE FULL status, which the generic 
FreeBSD code simply does not handle very well.  The kernel requeues the 
transaction over and over, with no delay between retries, until it 
succeeds.  This can often cause the drive to simply "give up" and return 
BUSY status indefinitely.  The proper fix for this is in the works, but it 
will come only once we convert to my new CAM SCSI framework, probably a 
month or two down the line.
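
To make the retry problem concrete, here is a toy simulation in Python.
It is only a sketch under stated assumptions: ToyDrive, run(), and the
tick-based timing are hypothetical names invented for illustration, not
FreeBSD or CAM code, and the "throttled" policy (stop submitting after
QUEUE FULL until a command completes) is just one plausible way to back
off, not necessarily what the eventual fix will do.

```python
from collections import deque

QUEUE_FULL = "QUEUE FULL"
GOOD = "GOOD"


class ToyDrive:
    """Hypothetical drive model that holds at most `depth` tagged commands."""

    def __init__(self, depth):
        self.depth = depth
        self.active = deque()
        self.rejections = 0  # how many times QUEUE FULL was returned

    def submit(self, cmd):
        if len(self.active) >= self.depth:
            self.rejections += 1
            return QUEUE_FULL
        self.active.append(cmd)
        return GOOD

    def complete_one(self):
        # Finish the oldest active command, if there is one.
        return self.active.popleft() if self.active else None


def run(n_commands, depth, ticks_per_completion, throttle):
    """Push n_commands through the toy drive; return (completed, rejections).

    throttle=False: resubmit on every tick, even right after QUEUE FULL.
    throttle=True:  after QUEUE FULL, hold off until a command completes.
    """
    drive = ToyDrive(depth)
    pending = deque(range(n_commands))
    completed = []
    frozen = False
    tick = 0
    while pending or drive.active:
        tick += 1
        # Assumption: the drive finishes one command every few ticks.
        if tick % ticks_per_completion == 0:
            finished = drive.complete_one()
            if finished is not None:
                completed.append(finished)
                frozen = False  # a slot opened up, so submitting is safe again
        if pending and not frozen:
            if drive.submit(pending[0]) == GOOD:
                pending.popleft()
            elif throttle:
                frozen = True
    return completed, drive.rejections


if __name__ == "__main__":
    for throttle in (False, True):
        completed, rejections = run(20, depth=4, ticks_per_completion=5,
                                    throttle=throttle)
        print("throttle=%s: %d completed, %d QUEUE FULL responses"
              % (throttle, len(completed), rejections))
```

Both policies finish every command in this model, but the unthrottled
one hits the drive with far more rejected submissions, which is the
pattern that appears to push a real drive into returning BUSY.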

Even if the drive doesn't return QUEUE FULL, it is very possible that you 
are experiencing "tag starvation".  The driver currently uses "Simple 
Queue" tags for all transactions, which allows the drive to reorder 
the transactions in any way it sees fit so long as "write followed by read"
consistency is maintained.  This means that a transaction for a location 
far from the current head position can be starved by a continuous stream 
of transactions that don't require large seeks.  The faster and larger the 
drive, the easier it is to make this happen.  I saw it on a Quantum Atlas 
II last night during two concurrent copies over 100Bt ethernet.  The 
driver does attempt to handle this condition by first attempting to queue 
an Ordered Tagged transaction to the disk.  This should force the drive to 
finish all pending transactions before starting any others.  The current 
timeout for the Ordered Tagged transaction to be successful is only 1 
second, which perhaps isn't really long enough.  Something that may help 
this problem is to perform ordered writes for all synchronous write 
operations, which will be possible with the new SCSI code.
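
The starvation effect and the Ordered tag remedy can be illustrated with
another small Python model.  Again, this is a sketch built on
assumptions: service_order() is a hypothetical function, the drive is
assumed to pick the request with the shortest seek among those it may
reorder, and all requests are assumed to be queued up front rather than
arriving as a continuous stream.  Real drive firmware is certainly more
complicated.

```python
SIMPLE, ORDERED = "simple", "ordered"


def service_order(requests, head=0):
    """Return request names in the order a toy drive would service them.

    requests: list of (name, block, tag) tuples in arrival order.
    Simple-tagged requests may be reordered freely (shortest seek first).
    An ordered-tagged request acts as a barrier: everything queued before
    it must finish first, and nothing queued after it may start earlier.
    """
    queue = list(requests)
    served = []
    while queue:
        # Position of the first ordered-tagged request still queued, if any.
        barrier = next((i for i, r in enumerate(queue) if r[2] == ORDERED),
                       None)
        if barrier == 0:
            choice = queue[0]  # everything ahead of the barrier is done
        else:
            eligible = queue if barrier is None else queue[:barrier]
            choice = min(eligible, key=lambda r: abs(r[1] - head))
        queue.remove(choice)
        head = choice[1]
        served.append(choice[0])
    return served


if __name__ == "__main__":
    far = ("far", 9000, SIMPLE)
    near = [("near%d" % i, 10 + i, SIMPLE) for i in range(6)]

    # All simple tags: the far request keeps losing to the short seeks.
    print(service_order([far] + near))

    # An ordered-tagged request queued after the first two near requests
    # forces the far request to complete before later ones may start.
    marker = ("ordered", 12, ORDERED)
    print(service_order([far] + near[:2] + [marker] + near[2:]))
```

In the first run the far request is serviced last; in the second it
completes before any request queued behind the ordered one.  With a
continuous stream of nearby requests, "last" becomes "never", which is
the starvation case.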

--
Justin T. Gibbs
===========================================
  FreeBSD: Turning PCs into workstations
===========================================