Date:      Mon, 07 Jul 1997 23:29:34 -0700 (PDT)
From:      Simon Shapiro <Shimon@i-Connect.Net>
To:        Josh Tiefenbach <josh@ican.net>
Cc:        scsi@freebsd.org
Subject:   RE: More on the DPT hangs/errors
Message-ID:  <XFMail.970707232934.Shimon@i-Connect.Net>
In-Reply-To: <19970707230647.52460@ican.net>

Hi Josh Tiefenbach;  On 08-Jul-97 you wrote: 
> 
> Excerpts from console log:
> 
> dpt0: BAD (0) CCB in SP (status = 1110 0000 ).
> dpt0: Marking 4048 (Read (10) [6.1.5]) on c0b0t1l0 as late after 11763232usec
> dpt0: Destroying stale 4048 (Read (10) [6.1.5]) on c0b0t1l0 (21763232)
> dpt0: BAD (0) CCB in SP (status = 0000 0000 ).
> dpt0: Marking 10097 (Write (10) [6.1.18]) on c0b0t2l0 as late after 17962817usec
> dpt0: Destroying stale 10097 (Write (10) [6.1.18]) on c0b0t2l0 (27962814)

This is exactly how I wanted the timeouts to behave:  Wait as long as sd.c
wants, multiplied by a ``business factor''.  If the command is still there
after twice as long, destroy it and tell sd.c ``sorry''.  If the command
somehow completes before destruction, it will be salvaged.  If it arrives
after destruction, the log will tell you that too.
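
The policy above can be modelled roughly as follows.  This is only a sketch
in Python, not the driver's code (the driver itself is C).  The names
CcbState, classify, on_completion and load_factor are made up for
illustration, and reading ``twice as long'' as twice the scaled sd.c timeout
is an assumption.

```python
from enum import Enum


class CcbState(Enum):
    PENDING = "pending"      # still within the allowed time
    LATE = "late"            # past the allowed time, not yet given up on
    DESTROYED = "destroyed"  # given up on; sd.c has been told "sorry"


def classify(elapsed_usec, sd_timeout_usec, load_factor=1):
    """Classify an outstanding command by its age (hypothetical model).

    allowed = the timeout sd.c asked for, scaled by load_factor.
    The command is marked late at `allowed` and destroyed at 2 * allowed.
    """
    allowed = sd_timeout_usec * load_factor
    if elapsed_usec >= 2 * allowed:
        return CcbState.DESTROYED
    if elapsed_usec >= allowed:
        return CcbState.LATE
    return CcbState.PENDING


def on_completion(state):
    """A completion before destruction is salvaged; a later one is only logged."""
    if state is CcbState.DESTROYED:
        return "logged: completion arrived after destruction"
    return "salvaged"
```

With a 10-second timeout and a factor of 1, a command would be marked late
at 10 seconds and destroyed at 20 in this model.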

In your case the command actually completed (with bad status), so it will
never complete again.

> Note: first occurrence during massive writes to non-RAIDED disks, second
> occurrence during a newfs of the RAIDed disks.

Make SURE you have ``options DPT_SINTR_SPLHIGH'' in your kernel.
Justin has suggested a better (read: correct :-) way of doing it.  As soon
as his patch arrives here, I will integrate it and get rid of this flag.

> In both occurrences, things `hung' at the time corresponding to the `BAD
> CCB', and `unhung' at the time corresponding to the `Destroying stale...'
> message.

Not the whole system, just the program going to disk, I presume (this is
what I see here).  This is normal:

Your program issues read or write syscalls.  These eventually translate
into calls to sd.c.  In the case of a raw device (newfs), the syscall
actually waits for the I/O to complete.  Since the DPT has completed the
command, but the driver could not make sense of the completion, the request
``never'' completes.  The timeout mechanism will get tired of this request
and abort it.  Your application will get an I/O error and all is (almost)
well.  This is a crude way of describing things, but you get the point.
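
From the application's side, the sequence above looks like a read or write
that blocks for a long time and then fails with an I/O error (EIO).  Here is
a minimal sketch in Python of catching that case.  The helper name
read_or_report and the wording of its message are hypothetical.

```python
import errno


def read_or_report(read_fn, nbytes):
    """Call read_fn(nbytes), which may block until the kernel returns.

    Returns (data, None) on success, or (None, message) if the read
    failed with EIO.  Any other OSError is re-raised unchanged.
    """
    try:
        return read_fn(nbytes), None
    except OSError as exc:
        if exc.errno == errno.EIO:
            return None, "I/O error: the request failed or was aborted"
        raise
```

For a raw device already opened with os.open(), read_fn could be something
like `lambda n: os.read(fd, n)`.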

Simon

P.S.

As you may have gathered, there are some problems with DPT controllers on
certain motherboards.  This is being worked on.


