Date:      Mon, 11 Aug 1997 00:09:20 -0500
From:      Doug Ledford <dledford@dialnet.net>
To:        "Daniel M. Eischen" <deischen@iworks.InterWorks.org>
Cc:        welbon@bga.com, aic7xxx@FreeBSD.ORG
Subject:   Re: aic7xxx / AHA2940 worries... anyone? 
Message-ID:  <199708110509.AAA11039@dledford.dialnet.net>
In-Reply-To: Your message of "Sun, 10 Aug 1997 23:30:18 EDT." <33EE86B7.41C67EA6@iworks.InterWorks.org> 

--------

> We really need to see error messages from the failure.  Any in
> the logs (from the latest driver, right?)?

> > CONFIG_SCSI_AIC7XXX=m
> > CONFIG_AIC7XXX_TAGGED_QUEUEING=y
> > CONFIG_AIC7XXX_CMDS_PER_LUN=16
> 
> You should try lowering this to 8.  I have had no problems on my 
> IBM DORS-32160W at 16 commands/lun, but it could be they have a
> firmware bug under really heavy load.

If I recall correctly, the original post that Ed made to linux-kernel had a 
message in the logs like:

aic7xxx: CMDCMPLT near QFULLCNT: QOUTCNT = 16

Actually, I thought I saw several of those.  This could be the cause of the 
problem: losing a command on the out side of things, one that finished but 
we didn't know it because it rolled out of the fifo.  That would then 
trigger abort/reset stuff, and as we all know, that code isn't the most 
reliable in the world right now.  (I've been waiting to work on it until 
there is a distribution out using the current code, so any patches that may 
need testing can be made against the current source without causing too much 
grief for people.)  In any case, the reduction in the commands per lun could 
very well solve the problem, but not necessarily due to any bugs in the 
driver firmware.  It may just be that under the kind of load 9 drives can 
create on a controller, we are losing commands and getting hosed.  By 
lowering the number of outstanding commands, you would reduce the chances of 
this happening.
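
To make the arithmetic concrete, here is a toy model of the overflow 
described above.  It is a sketch only: every name in it is made up for 
illustration and none of it comes from the aic7xxx driver or sequencer 
code.  It assumes a 16-entry completion fifo (from the QOUTCNT = 16 message) 
and assumes that a completion posted to a full fifo is simply dropped, which 
is not confirmed anywhere in this thread.  It models the worst case, where 
every outstanding command completes before the host drains the fifo.

```c
#include <assert.h>
#include <stddef.h>

#define QOUT_DEPTH 16  /* assumed fifo depth, from the QOUTCNT = 16 message */

/* Toy model only: these names are hypothetical, not driver identifiers. */
struct toy_qout {
    unsigned char tag[QOUT_DEPTH]; /* tags of completed commands */
    size_t count;                  /* entries waiting for the host */
    size_t lost;                   /* completions dropped on overflow */
};

/* Sequencer side: post a completion without checking for room. */
static void toy_post(struct toy_qout *q, unsigned char tag)
{
    if (q->count == QOUT_DEPTH) {
        q->lost++;   /* assumption: the host never hears about this one */
        return;
    }
    q->tag[q->count++] = tag;
}

/* Host side: drain everything queued; returns how many were drained. */
static size_t toy_drain(struct toy_qout *q)
{
    size_t n = q->count;
    q->count = 0;
    return n;
}

/* Worst case: all drives * per_lun commands complete before one drain. */
static size_t toy_lost_in_burst(size_t drives, size_t per_lun)
{
    struct toy_qout q = { {0}, 0, 0 };
    size_t i;

    for (i = 0; i < drives * per_lun; i++)
        toy_post(&q, (unsigned char)(i & 0xff));
    assert(q.count <= QOUT_DEPTH);
    (void)toy_drain(&q);
    return q.lost;
}
```

In this model toy_lost_in_burst(9, 16) has 144 completions competing for 16 
slots, and toy_lost_in_burst(9, 8) has 72.  Lowering the commands per lun 
shrinks the worst-case burst but does not remove the overflow by itself.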

In any case, Justin, is there any word on the possibility of getting the 
sequencer to check the qoutfifo's status before putting a command on there, 
and spin locking if the queue is full (any elegant ideas on how to do this 
if we can't read the QOUTCNT register inside the sequencer)?  It seems to me 
that this problem is going to be worse in scenarios like the bonnie test 
where you have large numbers of closely grouped writes where the drive could 
download a write's data, cache it, get the next write's data, cache it, 
write several of these out in a single spindle pass, then come back to the 
controller with multiple COMMAND_COMPLETES at very short intervals, 
especially with lots of drives in use.  For things like this we very well 
could need to add this item back into the sequencer.  One final thing, 
Justin, do you happen to know for certain what happens with the QOUTCNT 
register if the QOUTFIFO does overflow?  Does it stay at 16 (on the 2940 
chips anyway), or does it wrap back to 0 or what?
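
The check-before-post idea asked about above can be sketched in C as 
follows.  This is a host-side toy model, not sequencer code: all names are 
hypothetical, the fifo depth of 16 is assumed, and the "spin" is simulated 
in a single thread by letting the host pop one entry per iteration.  Whether 
the real sequencer can read QOUTCNT at all is the open question in this 
message, so treat this purely as an illustration of the intended behaviour.

```c
#include <stddef.h>

#define QOUT_DEPTH 16  /* assumed fifo depth */

/* Toy model only: these names are hypothetical, not driver identifiers. */
struct toy_fifo {
    unsigned char tag[QOUT_DEPTH];
    size_t head;       /* index of the oldest entry */
    size_t count;      /* entries currently queued */
    size_t delivered;  /* completions the host has picked up */
    size_t spins;      /* times the poster had to wait for room */
};

static int toy_fifo_full(const struct toy_fifo *f)
{
    return f->count == QOUT_DEPTH;
}

/* Host side: pop one completion if there is one. */
static int toy_host_pop(struct toy_fifo *f, unsigned char *tag)
{
    if (f->count == 0)
        return 0;
    *tag = f->tag[f->head];
    f->head = (f->head + 1) % QOUT_DEPTH;
    f->count--;
    f->delivered++;
    return 1;
}

/* Poster side: wait while the fifo is full, then post. */
static void toy_post_checked(struct toy_fifo *f, unsigned char tag)
{
    unsigned char done;

    while (toy_fifo_full(f)) {
        f->spins++;
        (void)toy_host_pop(f, &done);  /* stands in for the host catching up */
    }
    f->tag[(f->head + f->count) % QOUT_DEPTH] = tag;
    f->count++;
}

/* Post n completions; return how many are accounted for afterwards. */
static size_t toy_burst_accounted(size_t n, size_t *spins)
{
    struct toy_fifo f = { {0}, 0, 0, 0, 0 };
    size_t i;

    for (i = 0; i < n; i++)
        toy_post_checked(&f, (unsigned char)(i & 0xff));
    if (spins)
        *spins = f.spins;
    return f.delivered + f.count;
}
```

With the check in place, every completion in a burst is either delivered or 
still queued, at the cost of the poster stalling whenever the fifo is full.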
-- 
*****************************************************************************
* Doug Ledford                      *   Unix, Novell, Dos, Windows 3.x,     *
* dledford@dialnet.net    873-DIAL  *     WfW, Windows 95 & NT Technician   *
*   PPP access $14.95/month         *****************************************
*   Springfield, MO and surrounding * Usenet news, e-mail and shell account.*
*   communities.  Sign-up online at * Web page creation and hosting, other  *
*   873-9000 V.34                   * services available, call for info.    *
*****************************************************************************




