Date:      Mon, 01 Jun 1998 12:15:17 -0400 (EDT)
From:      Simon Shapiro <shimon@simon-shapiro.org>
To:        Mark Gregory Salyzyn <mark@bohica.net>
Cc:        freebsd-scsi@FreeBSD.ORG, tcobb@staff.circle.net
Subject:   Re: DPT Redux
Message-ID:  <XFMail.980601121517.shimon@simon-shapiro.org>
In-Reply-To: <9805311428.AA11443@deathstar.deathstar.dpt.com>


On 31-May-98 Mark Gregory Salyzyn wrote:

...  Some excellent suggestions deleted ...

> 2) Simon, you may want to consider what happens when the controller is
>    indicating busy, do you perform a timeout on the busy bit of the
>    auxiliary status register, and if you do, what do you show to the OS
>    (failed command? spawn a separate command issue thread to try again
>    later? spin forever waiting for ready?).

Depends on what the O/S is doing:

a.  If we are booting, then I poll the commands one at a time, sequentially.
b.  If we are shutting down, I immediately fail the command.
c.  In all other cases, I put the command in a queue, tell the O/S the
    command has been queued and check the busy bit.  If the busy bit is
    set, I (essentially) sleep on the (driver's private) waiting commands
    queue until a command completes.  When a command completes, I scan the
    wait queue and try to submit a command.  Before submitting, I check the
    busy bit.  If the busy bit is set, I (essentially) sleep until another
    command completes.
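
A minimal sketch of the three cases above, not the actual driver code.  All
names (accept_cmd, on_completion, struct hba and its fields) are hypothetical,
and the busy flag is only a stand-in for the hardware busy bit:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this only models the policy in the text. */
enum sys_state { SYS_BOOTING, SYS_RUNNING, SYS_SHUTDOWN };
enum accept_result { CMD_POLLED, CMD_FAILED, CMD_QUEUED };

struct cmd { struct cmd *next; };

struct hba {
    bool busy;               /* stand-in for the aux status busy bit */
    struct cmd *wait_head;   /* driver-private wait queue (FIFO) */
    struct cmd *wait_tail;
    int submitted;           /* commands currently handed to the hardware */
};

static void wait_enqueue(struct hba *h, struct cmd *c)
{
    c->next = NULL;
    if (h->wait_tail != NULL)
        h->wait_tail->next = c;
    else
        h->wait_head = c;
    h->wait_tail = c;
}

/* Move the oldest waiting command to the hardware unless it is busy. */
static bool try_submit_one(struct hba *h)
{
    struct cmd *c = h->wait_head;

    if (c == NULL || h->busy)
        return false;        /* nothing to do, or wait for a completion */
    h->wait_head = c->next;
    if (h->wait_head == NULL)
        h->wait_tail = NULL;
    h->submitted++;
    return true;
}

static enum accept_result accept_cmd(struct hba *h, struct cmd *c,
                                     enum sys_state s)
{
    if (s == SYS_SHUTDOWN)
        return CMD_FAILED;   /* case b: fail immediately */
    if (s == SYS_BOOTING)
        return CMD_POLLED;   /* case a: caller polls it, one at a time */
    wait_enqueue(h, c);      /* case c: queue, then look at the busy bit */
    (void)try_submit_one(h);
    return CMD_QUEUED;       /* the O/S is told "queued" either way */
}

/* Called when the hardware reports a completed command. */
static void on_completion(struct hba *h)
{
    if (h->submitted > 0)
        h->submitted--;
    (void)try_submit_one(h); /* re-checks the busy bit before submitting */
}
```

Note that nothing in the wait queue carries a deadline; a waiting command
only moves when a completion arrives and the busy flag is clear.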

There is no timeout on any command in the wait queue.  The reason is that
there is no reasonable value to use.
Commands that have been submitted to the DPT can be timed out.  To do that,
you need the options DPT_HANDLE_TIMEOUTS and possibly DPT_TIMEOUT_FACTOR
defined in the kernel configuration.  In this scenario, we check the age of
each transaction already committed to the DPT. If it timed out, we
abort it and return to the OS a failure status.
Please note that this timeout mechanism only works on commands in the
submitted queue.
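
As a rough illustration of the age check on the submitted queue (not the
driver's real code): the structure, the function name and
SKETCH_TIMEOUT_FACTOR are all hypothetical, and treating the factor as a
simple multiplier on the requested timeout is an assumption, since how
DPT_TIMEOUT_FACTOR is actually applied is not spelled out here.

```c
#include <stddef.h>

/* Assumed multiplier, for illustration only; not the real kernel option. */
#define SKETCH_TIMEOUT_FACTOR 4UL

enum ccb_state { CCB_ACTIVE, CCB_ABORTED };

/* Hypothetical record for a command already committed to the hardware. */
struct ccb {
    unsigned long submit_time;  /* tick at which it was submitted */
    unsigned long timeout;      /* timeout requested by the OS, in ticks */
    enum ccb_state state;
};

/*
 * Walk the submitted queue and abort every active command whose age
 * exceeds its (scaled) timeout.  Returns the number of commands aborted;
 * the caller would report a failure status to the OS for each of them.
 */
static size_t reap_timed_out(struct ccb *q, size_t n, unsigned long now)
{
    size_t aborted = 0;

    for (size_t i = 0; i < n; i++) {
        if (q[i].state != CCB_ACTIVE)
            continue;
        unsigned long age = now - q[i].submit_time;
        if (age > q[i].timeout * SKETCH_TIMEOUT_FACTOR) {
            q[i].state = CCB_ABORTED;
            aborted++;
        }
    }
    return aborted;
}
```

Commands still sitting in the wait queue never reach this scan, which
matches the note above that only submitted commands can time out.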

>    The BSDi BSD/OS driver, for example, simply `locks' waiting for the
>    controller to get out of busy, which is the simplest approach to deal
>    with, what should be a transitory situation. Also, you may wish to
>    limit the number of outstanding commands to the controller (the
>    UNIXWARE driver uses the lock on wait, and 32 CCB limit to reduce the
>    chances of this problem affecting performance). The highest
>    performance DPT driver in a Networking operating system (NETWARE) does
>    the `spawn an issuing task' approach to allow processing of network
>    card interrupts while waiting for the controller to come free. This
>    may be your best approach considering you will no doubt be issuing
>    `next' commands to the controller while in the context of the
>    controller interrupt service routine.

This is essentially what is being done.  As long as the DPT driver can
malloc memory for requests, it takes them from the OS.  If commands cannot
be submitted to the DPT hardware, they simply wait.  The assumption is that
if the DPT is too busy, it is too busy.  The other assumption is that every
time a command completes, the DPT hardware is a bit less busy.  We then try
to submit a request.  We do check if the controller is busy before
submitting a command.  Now, if the DPT became too busy between our looking
at the external busy bit and completing the transfer of a command, this
would result in a corrupted transfer.  AFAIK, from discussing it with you,
once the DPT is marked not-busy and the transfer of a command has started,
the DPT will not clobber the partial command.
> 
>    My assumption is that you timeout and send a fail up to the OS, which
>    may explain the 0MB read capacity result shown in the log?

(Conditionally) Correct assumption.  The report capacity bit is done during
boot.  At this point the DPT driver is in polled mode.  In polled mode, if
the number of commands submitted but not completed is less than the
hardware queue depth, we assume that the hardware should not be busy and
try to send a command.  If this fails (due to the aux status busy bit being
set), we report failure to the O/S.  If, in polled mode, we have submitted
as many commands to the hardware as its reported queue depth, and none has
completed (stalled HBA), we immediately fail.

If we submitted a command successfully, we then wait up to 50us times the
timeout value the OS indicated (in xs->timeout).  Once a command completion
is indicated (or we have timed out), we process the command completion and
return to the OS, using normal processing.
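
The polled-mode path can be sketched roughly as follows.  This is a
simulation under stated assumptions, not the driver source: the structure,
its fields and polled_submit are hypothetical names, the hardware is modelled
by two plain fields, and leaving a timed-out command counted as outstanding
is a guess at how a stalled HBA ends up being detected.

```c
#include <stdbool.h>

enum poll_result { POLL_DONE, POLL_TIMEOUT, POLL_BUSY, POLL_STALLED };

/* Hypothetical model of the controller as seen in polled mode. */
struct polled_hba {
    int outstanding;           /* submitted but not yet completed */
    int queue_depth;           /* hardware queue depth, as reported */
    bool busy;                 /* simulated aux status busy bit */
    unsigned long done_after;  /* simulated: polls until completion shows */
};

/* timeout plays the role of xs->timeout; each loop pass stands for 50us. */
static enum poll_result polled_submit(struct polled_hba *h,
                                      unsigned long timeout)
{
    if (h->outstanding >= h->queue_depth)
        return POLL_STALLED;   /* queue full, nothing completed: fail now */
    if (h->busy)
        return POLL_BUSY;      /* should not be busy, yet it is: fail */

    h->outstanding++;          /* command handed to the hardware */
    for (unsigned long i = 0; i < timeout; i++) {
        if (i >= h->done_after) {   /* completion indicated */
            h->outstanding--;
            return POLL_DONE;
        }
        /* a real driver would delay 50us here before polling again */
    }
    return POLL_TIMEOUT;       /* assumption: still counted as outstanding */
}
```

In this model every result other than POLL_DONE would be reported to the
O/S as a failed command, which is how a slow command during boot can show
up as a failure.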

From this description, it is clear that if a command takes inordinately
long during boot, it may time out and fail.  Please remember that this is
ONLY TRUE DURING O/S BOOT.  I am reluctant to change the timeout logic as
there is no telling what the correct timeout is.  Too short and we will
time out good but busy builds.  Too long and the system will stall on problems.

I am adding a bit more error reporting to the boot section.  A patch
against 3.0-current and against 2.2-releng will be submitted today, and will
be available on my ftp server in about an hour or two.

The extent of testing on these patches will be to verify that they compile
and that a normal system, equipped with these patches boots and operates
normally.  Later on I will perform complete regression testing on the code.

> I hope this helps -- Sincerely -- Mark Salyzyn

Thanx Mark.

Simon


---


Sincerely Yours, 

Simon Shapiro                                           Shimon@Simon-Shapiro.ORG
                                                        770.265.7340



