From owner-freebsd-questions  Wed Jun 10 12:39:22 1998
Return-Path: <owner-freebsd-questions@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id MAA23598
          for freebsd-questions-outgoing; Wed, 10 Jun 1998 12:39:22 -0700 (PDT)
          (envelope-from owner-freebsd-questions@FreeBSD.ORG)
Received: from pop.asahi-net.or.jp (pop.asahi-net.or.jp [202.224.39.6])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id MAA23498;
          Wed, 10 Jun 1998 12:38:52 -0700 (PDT)
          (envelope-from tfuruya@dilemma.tf.or.jp@ppp129162.asahi-net.or.jp)
Received: from galois.tf.or.jp (ppp129162.asahi-net.or.jp [202.213.129.162])
	by pop.asahi-net.or.jp (8.8.8/3.6W) with ESMTP id EAA37326;
	Thu, 11 Jun 1998 04:43:49 +0900
Received: from dilemma.tf.or.jp (dilemma.tf.or.jp [192.168.1.3])
	by galois.tf.or.jp (8.8.8/3.6W-ht5t-fry@asahi-net-98042218) with ESMTP id EAA09238;
	Thu, 11 Jun 1998 04:38:15 +0900 (JST)
Received: from dilemma.tf.or.jp (localhost [127.0.0.1])
	by dilemma.tf.or.jp (8.8.8/3.6W-CF3.6W-dilemma-tf.or.jp-9806) with ESMTP id EAA11696;
	Thu, 11 Jun 1998 04:41:11 +0900 (JST)
Message-Id: <199806101941.EAA11696@dilemma.tf.or.jp>
To: mike@smith.net.au
Cc: robinson@public.bta.net.cn, freebsd-stable@FreeBSD.ORG,
        freebsd-questions@FreeBSD.ORG, Tetsuro FURUYA <tfu@ff.iij4u.or.jp>
Subject: Re: Bug in wd driver 
From: Tetsuro FURUYA <ht5t-fry@asahi-net.or.jp>
Reply-To: Tetsuro FURUYA <ht5t-fry@asahi-net.or.jp>
In-Reply-To: Your message of "Thu, 28 May 1998 12:57:14 -0700"
References: <199805281957.MAA01309@dingo.cdrom.com>
X-Mailer: Mew version 1.54 on Emacs 19.28.1, Mule 2.3
X-fingerprint: F1 BA 5F C1 C2 48 1D C7  AE 5F 16 ED 12 17 75 38
X-URL: http://sodan.komaba.ecc.u-tokyo.ac.jp/~tfuruya/
Mime-Version: 1.0
Content-Type: Multipart/Signed;
	protocol="application/pgp-signature";
	micalg="pgp-md5";
	boundary="--Security_Multipart(Thu_Jun_11_04:40:53_1998)--"
Date: Thu, 11 Jun 1998 04:41:08 +0900
Sender: owner-freebsd-questions@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

----Security_Multipart(Thu_Jun_11_04:40:53_1998)--
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit


I wrote, 

In Message-Id: <199805272026.FAA16850@dilemma.tf.or.jp>
In Message-Id: <199805281508.AAA04056@dilemma.tf.or.jp>

>I have encountered the same faults using a Panasonic AL-N1
>and FreeBSD-2.2.2.

>bad144 also hung.
>But I have found out how to get bad144, fsck, or badsect to complete.

>My kernel has kernel-debugger ddb(4) installed in it.
>                              ^^^^^^
>So, listen to the humming sound of the wd0 drive, and when the drive
>hangs, invoke the kernel debugger by typing ctrl-alt-ESC.
>                                            ^^^^^^^^^^^^
>A while after disk access stops, type 'c' or 'continue',
>and go back to bad144 or fsck.                ^^^^^^^^
>Several attempts may complete the identification of bad clusters.
>On my machine, this worked.

And you pointed out that,
> > In Message-ID: <199805272101.OAA01902@dingo.cdrom.com>
> > Mike Smith <mike@smith.net.au> wrote:

> > >fsck /usr
> > >.....
> > >wd0: interrupt timeout:
> > >wd0: status 50<rdy,seekdone> error 0
> > >wd0: interrupt timeout:
> > >wd0: status 50<rdy,seekdone> error 1<no_dam>
> > 
> > >===> hang up
> > >===> type 'cntrl-alt-esc'
> 
> This defers the interrupt timeout...
> 
> > >db>wd0s1f: hard error reading fsbn 1152850 of 1152850-1152851(wd0s1 bn
> > >1279826; cn 317 tn 26 sn 44)
> > >wd0: status 59<rdy,seekdone,drq,err> error 40<uncorr>
> 
> ... but not the interrupt, which finally arrives and contains real 
> error information.  Note that the interrupt timeouts in your case 
> *don't* have DRQ set.  Are you running in multi-block mode?
> 
> > As for wd.c source, I will try to experiment :)
> 
> Please do.  It looks like your information may lead to a result here.  

It may be late to reply on the mailing list, but this seems important
to notebook users, so I would like to report the result of my experiment
with the patch to /usr/src/sys/i386/isa/wd.c that Mr. Mike Smith suggested.

In Message-Id: <199805272101.OAA01902@dingo.cdrom.com>
Mike Smith <mike@smith.net.au> writes:

>This would tend to imply that the timeout value is too short.
>
>Can you try increasing the timeout counter and provoking your disk?
>
>In sys/i386/isa/wd.c, in this section:
>
>        /*
>         * Schedule wdtimeout() to wake up after a few seconds.  Retrying
>         * unmarked bad blocks can take 3 seconds!  Then it is not good that
>         * we retry 5 times.
>         *
>         * On the first try, we give it 10 seconds, for drives that may need
>         * to spin up.
>         *
>         * XXX wdtimeout() doesn't increment the error count so we may loop
>         * forever.  More seriously, the loop isn't forever but causes a
>         * crash.
>         *
>         * TODO fix b_resid bug elsewhere (fd.c....).  Fix short but positive
>         * counts being discarded after there is an error (in physio I
>         * think).  Discarding them would be OK if the (special) file offset
>         * was not advanced.
>         */
>        if (wdtab[ctrlr].b_errcnt == 0)
>                du->dk_timeout = 1 + 10;
>        else
>                du->dk_timeout = 1 + 3;   <---- Only this line.
>
>
>Increase the 10 and 3 values (first and subsequent timeouts).  Try 
>raising them lots, then come down slowly.

Unfortunately, my /usr/src/sys/i386/isa/wd.c differs from the source
code above: it contains only the last line, the one marked above.

So I rewrote only that line, increasing the 3 to 50. (Is this OK?)
So far I have not experienced any disk crash, any cannot-mount-root
problem, or anything else bad.
And the system now recovers successfully from a bad-sector read.
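
To illustrate why the larger value helps, here is a minimal user-space
sketch of a countdown watchdog. It assumes that the driver decrements
du->dk_timeout once per second and reports an interrupt timeout when the
counter reaches zero; that behaviour is inferred from the quoted comment
("wake up after a few seconds"), not checked against the source. The
struct disk_model, the helper names, and the 10-second drive delay used
in the note below are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's per-drive state; only the
 * field discussed in this thread is modelled. */
struct disk_model {
    int dk_timeout;   /* seconds left before the watchdog fires; 0 = disarmed */
};

/* Arm the watchdog in the same "1 + N" form as the patched line. */
void arm_timeout(struct disk_model *du, int seconds)
{
    assert(du != NULL && seconds >= 0);
    du->dk_timeout = 1 + seconds;
}

/* One watchdog tick (assumed to happen once per second).
 * Returns true when the counter expires on this tick. */
bool tick(struct disk_model *du)
{
    return du->dk_timeout != 0 && --du->dk_timeout == 0;
}

/* Returns true if a drive whose interrupt arrives after `drive_delay`
 * seconds is declared timed out before that interrupt is seen. */
bool times_out(int timeout_seconds, int drive_delay)
{
    struct disk_model du;

    arm_timeout(&du, timeout_seconds);
    for (int t = 1; t <= drive_delay; t++) {
        if (tick(&du))
            return true;    /* watchdog fired first */
    }
    return false;           /* the interrupt arrived first */
}
```

Under this model, times_out(3, 10) is true and times_out(50, 10) is
false: a drive that needs 10 seconds to report a bad sector is abandoned
with the old value but gets to deliver its real error with the new one.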
This time, the only error messages are the following:

>wd0s1f: hard error reading fsbn 1152850 of 1152850-1152851(wd0s1 bn
>1279826; cn 317 tn 26 sn 44)
>wd0: status 59<rdy,seekdone,drq,err> error 40<uncorr>

or,

>Jun  8 12:17:03 dilemma pccardd[37]: pccardd started
>Jun  8 12:30:59 dilemma /kernel: wd0s1f: hard error reading
> fsbn 1215577 of 1215576-1215579 (wd0s1 bn 1342553; cn 332 tn 62 sn 23)
>wd0: status 59<rdy,seekdone,drq,err> error 10<no_id>
>Jun  8 12:31:08 dilemma /kernel: wd0s1f: hard error reading
> fsbn 1215577 of 1215576-1215579 (wd0s1 bn 1342553; cn 332 tn 62 sn 23)
>wd0: status 59<rdy,seekdone,drq,err> error 10<no_id>
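
The hexadecimal status and error values in these messages can be decoded
bit by bit. The sketch below uses the usual ATA register bit layout. The
names rdy, seekdone, drq, err, uncorr, no_id and no_dam are the ones
printed in the messages in this thread; the remaining status names are
only illustrative labels, and the error table lists only the three bits
that occur here. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct bit_name {
    unsigned mask;
    const char *name;
};

/* Status register, highest bit first. */
static const struct bit_name status_bits[] = {
    {0x80, "busy"},    {0x40, "rdy"},     {0x20, "wrtflt"}, {0x10, "seekdone"},
    {0x08, "drq"},     {0x04, "ecc_cor"}, {0x02, "index"},  {0x01, "err"},
};

/* Error register: only the bits seen in this thread. */
static const struct bit_name error_bits[] = {
    {0x40, "uncorr"}, {0x10, "no_id"}, {0x01, "no_dam"},
};

/* Write the names of all set bits, comma-separated, into `out`. */
void decode_bits(unsigned value, const struct bit_name *table, size_t n,
                 char *out, size_t outlen)
{
    assert(out != NULL && outlen > 0);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (value & table[i].mask) {
            size_t used = strlen(out);
            /* snprintf truncates safely if the buffer is too small */
            snprintf(out + used, outlen - used, "%s%s",
                     used ? "," : "", table[i].name);
        }
    }
}

void decode_status(unsigned status, char *out, size_t outlen)
{
    decode_bits(status, status_bits,
                sizeof status_bits / sizeof status_bits[0], out, outlen);
}

void decode_error(unsigned error, char *out, size_t outlen)
{
    decode_bits(error, error_bits,
                sizeof error_bits / sizeof error_bits[0], out, outlen);
}
```

For example, decode_status(0x59, buf, sizeof buf) yields
"rdy,seekdone,drq,err", while 0x50 yields "rdy,seekdone": the timeout
reports lack drq, which is the difference Mike pointed out.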

So, the bug in the wd.c device driver seems to be gone ^^)
The other problem, the system locking up after wd hangs, seems to be
related to an indefinite wait in swap_pager. (This is serious for X.)
But this defect does not appear when the wd device driver can recover
from a disk access error.

You wrote:
>raising them lots, then come down slowly.

Is there any drawback when the du->dk_timeout value is very large?
What happens if the du->dk_timeout value is too large?
And what exactly is du->dk_timeout?
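
As a rough way to reason about a large value: if the counter is a
per-second countdown, a drive that is truly wedged goes unnoticed for
about 1 + N seconds per timeout, so a bigger N means slower detection of
a real hang. The sketch below only does that arithmetic. It assumes the
one-tick-per-second behaviour, which is not verified against this
version of wd.c, and stall_seconds with its `timeouts` parameter is a
hypothetical helper.

```c
#include <assert.h>

/* Seconds spent waiting if the watchdog has to fire `timeouts` times
 * in a row, with the counter armed as "1 + timeout_seconds" each time
 * (assumption: one decrement per second). */
int stall_seconds(int timeout_seconds, int timeouts)
{
    assert(timeout_seconds >= 0 && timeouts >= 0);
    return timeouts * (1 + timeout_seconds);
}
```

With the original value, two consecutive timeouts (as in the earlier
fsck output) cost stall_seconds(3, 2) = 8 seconds; with 50, a single
timeout on a wedged drive already costs stall_seconds(50, 1) = 51
seconds under these assumptions.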

I have just tried 'cd /usr; badsect BAD 1152850 1215577' followed by
'fsck /dev/rwd0s1f', but 'bad144 -s -v /dev/wd0' should work fine too.
(I used to use bad144 often, but wd0 now has too many bad sectors
for bad144 :( )
badsect and fsck do not take care of the swap area;
nevertheless they are working fine now :)

So, thank you, Mr. Mike Smith!

========================================================================
TEL: 048-852-3520    FAX: 048-858-1597
E-Mail:
     ht5t-fry@asahi-net.or.jp
     tfu@ff.iij4u.or.jp
pgp-fingerprint:
     pub  Tetsuro FURUYA <ht5t-fry@asahi-net.or.jp>
      Key fingerprint = F1 BA 5F C1 C2 48 1D C7  AE 5F 16 ED 12 17 75 38
=========================================================================

----Security_Multipart(Thu_Jun_11_04:40:53_1998)--
Content-Type: Application/Pgp-Signature
Content-Transfer-Encoding: 7bit

-----BEGIN PGP MESSAGE-----
Version: 2.6.3i

iQCVAwUANX7hSjzkiNBZ20qpAQGRfgP/Ws9puO32Jc4cxOZTE+TXDcYnBWhJV8vV
DeOuhMrf4Pozd+Y6LPgQ1FFXJHPwdU9ZR4vxUSn1VmBN/Hps/cA/UAFu1MG9p2oB
HfQqWrYFjE0zscm1Xja569jnICj2WVl5iPhmIDAXhvaCJrhLj1FF7ctcF8ZWeX0W
Sna/x38TJ0s=
=Zczd
-----END PGP MESSAGE-----

----Security_Multipart(Thu_Jun_11_04:40:53_1998)----
