From owner-freebsd-stable  Tue May 26 18:52:38 1998
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id SAA12321
          for freebsd-stable-outgoing; Tue, 26 May 1998 18:52:38 -0700 (PDT)
          (envelope-from owner-freebsd-stable@FreeBSD.ORG)
Received: from dingo.cdrom.com (dingo.cdrom.com [204.216.28.145])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id SAA12212
          for <freebsd-stable@FreeBSD.ORG>; Tue, 26 May 1998 18:52:22 -0700 (PDT)
          (envelope-from mike@dingo.cdrom.com)
Received: from dingo.cdrom.com (localhost [127.0.0.1])
	by dingo.cdrom.com (8.8.8/8.8.5) with ESMTP id RAA02455;
	Tue, 26 May 1998 17:46:05 -0700 (PDT)
Message-Id: <199805270046.RAA02455@dingo.cdrom.com>
X-Mailer: exmh version 2.0zeta 7/24/97
To: Michael Robinson <robinson@public.bta.net.cn>
cc: mike@smith.net.au, nate@mt.sri.com, freebsd-stable@FreeBSD.ORG
Subject: Re: Bug in wd driver 
In-reply-to: Your message of "Wed, 27 May 1998 09:26:18 +0800."
             <199805270126.JAA20637@public.bta.net.cn> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Tue, 26 May 1998 17:46:05 -0700
From: Mike Smith <mike@smith.net.au>
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk

> Mike Smith <mike@smith.net.au> writes:
> >I'm sorry Nate, but if it was a bad spot the error register would be
> >nonzero.  Please check the originally quoted diagnostic for the actual
> >status/error register values.  Also note that DRQ would not be set if
> >the timeout had occurred too soon.
> 
> This is from the originally quoted diagnostic:
> 
>     wd0: interrupt timeout
>     wd0: status 50<rdy,seekdone> error 1<no_dam>
> 
> I am not an expert, but it looks to me like a nonzero error code.

ERR is not set in the status register.  As far as I can tell, ATA-4 (T13/
1153D revision 16, which is the only reference I have to hand), clause 
7.15.6.6, says "When the ERR bit is cleared to zero at the end of a 
command: a) the content of the error register shall be ignored by the 
host."

There are other conditions that could cause ERR to be cleared, and 
there are other anomalies as well.  The error report I was referring to 
when I was studying the reference earlier gave the status as 0x58 and 
the error as zero.  This indicates data ready to be transferred (DRDY, 
DSC and DRQ set) and no error.  0x50 (DRDY and DSC set, DRQ clear) 
indicates no data pending, the device ready for a command, and no error.
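To make the register arithmetic concrete, here is a small C sketch that 
decodes the status byte and applies the clause 7.15.6.6 rule (only look 
at the error register when ERR is set).  The macro and function names 
are my own, made up for illustration, not the wd driver's; the bit 
values are the standard ATA status register layout.

```c
#include <string.h>

/* ATA status register bits (names are illustrative, not wd(4)'s). */
#define ATA_S_BSY   0x80    /* busy */
#define ATA_S_DRDY  0x40    /* device ready */
#define ATA_S_DF    0x20    /* device fault */
#define ATA_S_DSC   0x10    /* seek complete ("seekdone" in wd output) */
#define ATA_S_DRQ   0x08    /* data request: data ready to transfer */
#define ATA_S_CORR  0x04    /* corrected data */
#define ATA_S_IDX   0x02    /* index */
#define ATA_S_ERR   0x01    /* error register contents are valid */

/* Hypothetical helper: write the names of the set status bits into buf
 * as a space-separated list.  Assumes len >= 1. */
static void ata_status_names(unsigned char status, char *buf, size_t len)
{
    static const struct { unsigned char bit; const char *name; } bits[] = {
        { ATA_S_BSY, "BSY" },   { ATA_S_DRDY, "DRDY" },
        { ATA_S_DF, "DF" },     { ATA_S_DSC, "DSC" },
        { ATA_S_DRQ, "DRQ" },   { ATA_S_CORR, "CORR" },
        { ATA_S_IDX, "IDX" },   { ATA_S_ERR, "ERR" },
    };
    size_t i;

    buf[0] = '\0';
    for (i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
        if (status & bits[i].bit) {
            if (buf[0] != '\0')
                strncat(buf, " ", len - strlen(buf) - 1);
            strncat(buf, bits[i].name, len - strlen(buf) - 1);
        }
    }
}

/* Hypothetical helper: the error value the host should act on.  When
 * ERR is clear the error register must be ignored, so report 0. */
static unsigned char ata_effective_error(unsigned char status,
    unsigned char error)
{
    return (status & ATA_S_ERR) ? error : 0;
}
```

With this, 0x50 decodes to "DRDY DSC" and 0x58 to "DRDY DSC DRQ", and 
for status 0x50 the 0x01 ("no_dam") error value is discarded because 
ERR is clear.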


> >Actually, it eventually gives up.  (Check the source if you don't
> >believe me.)
> 
> I looked at the source, and it gives up after five errors.  Unfortunately,
> the driver only gets to the second retry before it wedges itself.

I was actually referring to Nate's problem here.  He threw in a 
relatively unrelated situation where he had a normal error in a 
critical disk region.

> As for unwedging itself, this seems to be pretty suspicious:

Yes.  If the disk fails to interrupt, and continues to fail to 
interrupt, the caller will remain wedged.  This is unarguably a defect, 
and if you want to rework this, please talk with Soren 
(sos@freebsd.org) so that you can coordinate your efforts.
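One possible shape for a fix, sketched in C under stated assumptions: 
count interrupt timeouts against a retry limit the same way errors are 
counted, and complete the request with an error once the limit is hit 
instead of waiting forever.  Everything here (struct req, 
on_interrupt_timeout, and reusing the five-error limit as the timeout 
limit) is hypothetical and is not the actual wd driver code.

```c
/* Assumption: reuse the five-error retry limit for interrupt timeouts. */
#define MAX_TIMEOUT_RETRIES 5

enum req_state { REQ_PENDING, REQ_DONE, REQ_FAILED };

/* Hypothetical per-request bookkeeping. */
struct req {
    int timeouts;               /* interrupt timeouts seen so far */
    enum req_state state;
};

/* Called when the watchdog fires and no interrupt has arrived.  Below
 * the limit the caller would reset the drive and reissue the command;
 * at the limit the request is failed so the caller is released. */
static enum req_state on_interrupt_timeout(struct req *r)
{
    r->timeouts++;
    if (r->timeouts >= MAX_TIMEOUT_RETRIES)
        r->state = REQ_FAILED;
    else
        r->state = REQ_PENDING;
    return r->state;
}

/* Simulate a disk that never interrupts: count the timeouts taken
 * before the request is released with an error. */
static int timeouts_until_release(void)
{
    struct req r = { 0, REQ_PENDING };

    while (on_interrupt_timeout(&r) == REQ_PENDING)
        ;                       /* reset and reissue would happen here */
    return r.timeouts;
}
```

The point of the design is that the timeout path terminates: a drive 
that never interrupts costs at most MAX_TIMEOUT_RETRIES watchdog periods 
before the caller gets an error back.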

-- 
\\  Sometimes you're ahead,       \\  Mike Smith
\\  sometimes you're behind.      \\  mike@smith.net.au
\\  The race is long, and in the  \\  msmith@freebsd.org
\\  end it's only with yourself.  \\  msmith@cdrom.com

