From: Tetsuro FURUYA
To: mike@smith.net.au
Cc: robinson@public.bta.net.cn, freebsd-stable@FreeBSD.ORG,
    freebsd-questions@FreeBSD.ORG, Tetsuro FURUYA
Subject: Re: Bug in wd driver
Date: Thu, 11 Jun 1998 04:41:08 +0900
Message-Id: <199806101941.EAA11696@dilemma.tf.or.jp>
In-Reply-To: Your message of "Thu, 28 May 1998 12:57:14 -0700"
References: <199805281957.MAA01309@dingo.cdrom.com>

I
wrote, in Message-Id: <199805272026.FAA16850@dilemma.tf.or.jp>
and in Message-Id: <199805281508.AAA04056@dilemma.tf.or.jp>:

>I have encountered the same problem using a Panasonic AL-N1
>and FreeBSD-2.2.2, and bad144 hung.
>But I have found out how to keep bad144, fsck, or badsect going.
>My kernel has the kernel debugger ddb(4) installed in it.
>                                  ^^^^^^
>So, listen to the humming sound of the wd0 drive, and when the drive
>hangs, invoke the kernel debugger by typing ctrl-alt-ESC.
>                                            ^^^^^^^^^^^^
>A while after disk access stops, type 'c' or 'continue',
>                                              ^^^^^^^^
>and go back to bad144 or fsck.
>Several attempts may complete the identification of bad clusters.
>On my machine, this worked.

And you pointed out that,

> > In Message-ID: <199805272101.OAA01902@dingo.cdrom.com>
> > Mike Smith wrote:
> > >fsck /usr
> > >.....
> > >wd0: interrupt timeout:
> > >wd0: status 50 error 0
> > >wd0: interrupt timeout:
> > >wd0: status 50 error 1
> > >
> > >===> hang up
> > >===> type 'cntrl-alt-esc'
> > This defers the interrupt timeout...
> >
> > >db>wd0s1f: hard error reading fsbn 1152850 of 1152850-1152851(wd0s1 bn
> > >1279826; cn 317 tn 26 sn 44)
> > >wd0: status 59 error 40
> > ... but not the interrupt, which finally arrives and contains real
> error information.  Note that the interrupt timeouts in your case
> *don't* have DRQ set.  Are you running in multi-block mode?
>
> > As for wd.c source, I will try to experiment :)
>
> Please do.  It looks like your information may lead to a result here.

It may be too late to reply to the mailing list, but this seems
important for notebook users, so I will report the result of my
experiment with the patch to /usr/src/sys/i386/isa/wd.c that
Mr. Mike Smith suggested.

In Message-Id: <199805272101.OAA01902@dingo.cdrom.com>
Mike Smith writes:
>This would tend to imply that the timeout value is too short.
>
>Can you try increasing the timeout counter and provoking your disk?
>
>In sys/i386/isa/wd.c, in this section:
>
>        /*
>         * Schedule wdtimeout() to wake up after a few seconds.  Retrying
>         * unmarked bad blocks can take 3 seconds!  Then it is not good that
>         * we retry 5 times.
>         *
>         * On the first try, we give it 10 seconds, for drives that may need
>         * to spin up.
>         *
>         * XXX wdtimeout() doesn't increment the error count so we may loop
>         * forever.  More seriously, the loop isn't forever but causes a
>         * crash.
>         *
>         * TODO fix b_resid bug elsewhere (fd.c....).  Fix short but positive
>         * counts being discarded after there is an error (in physio I
>         * think).  Discarding them would be OK if the (special) file offset
>         * was not advanced.
>         */
>        if (wdtab[ctrlr].b_errcnt == 0)
>                du->dk_timeout = 1 + 10;
>        else
>                du->dk_timeout = 1 + 3;    <---- Only this line.
>
>
>Increase the 10 and 3 values (first and subsequent timeouts).  Try
>raising them lots, then come down slowly.

Unfortunately, my /usr/src/sys/i386/isa/wd.c differs from the source
above. Of the statements quoted, my wd.c has only the last line.
So I changed only that last line, increasing the 3 to 50. (Is this OK?)

So far I have not experienced any disk crash, any cannot-mount-root
problem, or anything else bad, and the system now recovers
successfully from a bad sector read.
This time, the only error messages are as follows:

>wd0s1f: hard error reading fsbn 1152850 of 1152850-1152851(wd0s1 bn
>1279826; cn 317 tn 26 sn 44)
>wd0: status 59 error 40

or:

>Jun  8 12:17:03 dilemma pccardd[37]: pccardd started
>Jun  8 12:30:59 dilemma /kernel: wd0s1f: hard error reading fsbn 1215577 of 1215576-1215579 (wd0s1 bn 1342553; cn 332 tn 62 sn 23) wd0: status 59 error 10
>Jun  8 12:31:08 dilemma /kernel: wd0s1f: hard error reading fsbn 1215577 of 1215576-1215579 (wd0s1 bn 1342553; cn 332 tn 62 sn 23) wd0: status 59 error 10

So the bug in the wd.c device driver seems to be gone ^^)

The other problem, the system locking up after wd hangs, seems to be
related to an indefinite wait in swap_pager. (This is serious for X.)
But this defect does not appear when the wd device driver can recover
from the disk access error.

You wrote:

>raising them lots, then come down slowly.

Is there any drawback when the du->dk_timeout value is very large?
What happens if it is too large? What exactly is du->dk_timeout?

I have just tried 'cd /usr; badsect BAD 1152850 1215577' and
'fsck /dev/rwd0s1f', but 'bad144 -s -v /dev/wd0' should also work fine.
(I often used bad144 before, but wd0 now has too many bad sectors
for bad144 :( )
badsect and fsck do not take care of the swap area; nevertheless,
they are working fine now :)

So, thank you, Mr. Mike Smith!
========================================================================
TEL: 048-852-3520   FAX: 048-858-1597
E-Mail: ht5t-fry@asahi-net.or.jp  tfu@ff.iij4u.or.jp
pgp-fingerprint:
pub Tetsuro FURUYA
Key fingerprint = F1 BA 5F C1 C2 48 1D C7  AE 5F 16 ED 12 17 75 38
========================================================================