Date: Mon, 07 Jul 1997 23:29:34 -0700 (PDT)
From: Simon Shapiro <Shimon@i-Connect.Net>
To: Josh Tiefenbach <josh@ican.net>
Cc: scsi@freebsd.org
Subject: RE: More on the DPT hangs/errors
Message-ID: <XFMail.970707232934.Shimon@i-Connect.Net>
In-Reply-To: <19970707230647.52460@ican.net>

Hi Josh Tiefenbach;  On 08-Jul-97 you wrote:

> Excerpts from console log:
>
> dpt0: BAD (0) CCB in SP (status = 1110 0000 ).
> dpt0: Marking 4048 (Read (10) [6.1.5]) on c0b0t1l0 as late after 11763232usec
> dpt0: Destroying stale 4048 (Read (10) [6.1.5]) on c0b0t1l0 (21763232)
> dpt0: BAD (0) CCB in SP (status = 0000 0000 ).
> dpt0: Marking 10097 (Write (10) [6.1.18]) on c0b0t2l0 as late after 17962817usec
> dpt0: Destroying stale 10097 (Write (10) [6.1.18]) on c0b0t2l0 (27962814)

This is exactly how I wanted the timeouts to behave:  Wait as long as
sd.c wants, multiplied by a ``business factor''.  If the command is still
there after twice as long, destroy it and tell sd.c ``sorry''.  If the
command somehow completes before destruction, it will be salvaged.  If it
arrives after destruction, the log will tell you that too.  In your case
the command actually completed (with bad status), so it will never
complete again.

> Note: first occurrence during massive writes to non-RAIDED disks, second
> occurrence during a newfs of the RAIDed disks.

Make SURE you have ``options DPT_SINTR_SPLHIGH'' in your kernel.  Justin
has suggested a better (read correct:-) way of doing it.  As soon as his
patch arrives here, I will integrate it and get rid of this flag.

> In both occurrences, things `hung' at the time corresponding to the `BAD
> CCB', and `unhung' at the time corresponding to the `Destroying stale...'
> message.

Not the whole system, just the program going to disk, I presume (this is
what I see here).  This is normal:  Your program issues read or write
syscalls.  These eventually translate into calls to sd.c.  In the case of
a raw device (newfs), the syscall will actually wait for the I/O to
complete.  Since the DPT has completed the command, but the driver could
not make sense of the completion, the request ``never'' completes.  The
timeout mechanism will get tired of this request and abort it.  Your
application will get an I/O error and all is (almost) well.

This is a crude way of describing things but you get the point.

Simon

P.S.  As you may have gathered, there are some problems with DPT
controllers on certain motherboards.  This is being worked on.
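
For readers following the log excerpts, the timeout policy described above (mark a command late once it exceeds the sd.c timeout times the factor, destroy it once it exceeds twice that) can be sketched roughly as below.  This is only an illustration, not the driver's code: the names classify_ccb, ccb_verdict, sd_timeout_usec and busy_factor are all hypothetical, and the 10-second timeout used in the note after the code is an assumption, not something stated in the log.

```c
#include <stdint.h>

/* Hypothetical verdicts for an outstanding command (not driver names). */
enum ccb_verdict { CCB_ON_TIME, CCB_LATE, CCB_DESTROY };

/*
 * Sketch of the policy from the message above:
 *   - a command becomes "late" once its age exceeds the timeout that
 *     sd.c asked for, multiplied by a factor;
 *   - it is destroyed once its age exceeds twice that threshold.
 * All parameter names are assumptions made for this illustration.
 */
static enum ccb_verdict
classify_ccb(uint64_t age_usec, uint64_t sd_timeout_usec, unsigned busy_factor)
{
    uint64_t late_after = sd_timeout_usec * busy_factor;

    if (age_usec > 2 * late_after)
        return CCB_DESTROY;   /* give up and report an error to sd.c */
    if (age_usec > late_after)
        return CCB_LATE;      /* still salvageable if it completes now */
    return CCB_ON_TIME;
}
```

If one assumes a 10-second sd.c timeout and a factor of 1, the first command in the log would be classified as late at 11763232 usec and as due for destruction at 21763232 usec, which matches the order of the two log messages.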