Date: Thu, 09 Jul 1998 06:49:32 +0900
From: Tetsuro FURUYA <ht5t-fry@asahi-net.or.jp>
To: smarzloff@carif-idf.org
Cc: freebsd-stable@FreeBSD.ORG, Tetsuro FURUYA <tfu@ff.iij4u.or.jp>
Subject: Re: Disk problem.
Message-ID: <199807082149.GAA01464@galois.tf.or.jp>
In-Reply-To: Your message of "Wed, 8 Jul 1998 17:30:36 +0200"
References: <19980708173036.A14305@rafiki.intranet.carif.asso.fr>
Stephane Marzloff <smarzloff@carif-idf.org> wrote:
> Hi..
>
> I have a problem with 2.2.6-STABLE (6 Jul) on a PPro 200.
>
> Sometimes, when I launch an application (mutt, ls, vmstat..), there is no
> response for 10 seconds.
> I suspect a disk problem.
>
> The machine isn't loaded; the load average is constantly 0.00 (0.50 maximum).
> There is 18 MB of free RAM.
>
> And 5 minutes ago, I got this message on the console:
> Jul 8 17:07:46 rafiki /kernel: wd0: interrupt timeout:
> Jul 8 17:07:46 rafiki /kernel: wd0: interrupt timeout:
> Jul 8 17:07:46 rafiki /kernel: wd0: status 50<rdy,seekdone> error 0
> Jul 8 17:07:46 rafiki /kernel: wd0: status 50<rdy,seekdone> error 0
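The hexadecimal status byte in these wd0 messages can be decoded bit by bit. The sketch below is only an illustration and is not taken from wd.c: it uses the standard ATA status register bit positions, the names rdy, seekdone, drq and err are the ones that appear in the logs in this thread, and the remaining names (busy, wrtflt, ecc_cor, index) as well as the helper decode_status() are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/* One named bit of the ATA status register. */
struct bit_name {
    unsigned char mask;
    const char *name;
};

/*
 * Standard ATA status register layout, highest bit first.
 * "rdy", "seekdone", "drq" and "err" match the log output in this
 * thread; the other labels are assumptions chosen for illustration.
 */
static const struct bit_name status_bits[] = {
    {0x80, "busy"},     {0x40, "rdy"},     {0x20, "wrtflt"},
    {0x10, "seekdone"}, {0x08, "drq"},     {0x04, "ecc_cor"},
    {0x02, "index"},    {0x01, "err"},
};

/*
 * Hypothetical helper: write a comma-separated list of the bits set
 * in `value` into `out`. `outlen` must be at least 1.
 */
static void decode_status(unsigned char value, char *out, size_t outlen)
{
    size_t i;

    out[0] = '\0';
    for (i = 0; i < sizeof status_bits / sizeof status_bits[0]; i++) {
        if (value & status_bits[i].mask) {
            if (out[0] != '\0')
                strncat(out, ",", outlen - strlen(out) - 1);
            strncat(out, status_bits[i].name, outlen - strlen(out) - 1);
        }
    }
}
```

Calling decode_status(0x50, buf, sizeof buf) yields "rdy,seekdone", which matches the message above: the drive reports itself ready and idle, with neither the error bit nor the data-request bit set.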
Your IDE disk has a bad sector.
Try
bad144 -s -v /dev/wd0
or
badsect & fsck (this is rather difficult, so please read the man pages).
If the system hangs during disk access:
1) Compile the kernel debugger ddb into the kernel.
When the system hangs, type Ctrl-Alt-Esc to enter ddb,
and wait until the disk access stops, about 20-60 seconds (this depends
on the system).
Then type 'c' to continue bad144 or fsck.
2) Patch /usr/src/sys/i386/isa/wd.c.
See the following mail.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Message-Id: <199806102228.PAA00747@dingo.cdrom.com>
X-Mailer: exmh version 2.0zeta 7/24/97
To: Tetsuro FURUYA <ht5t-fry@asahi-net.or.jp>
cc: mike@smith.net.au,
robinson@public.bta.net.cn,
freebsd-stable@freebsd.org,
freebsd-questions@freebsd.org,
Tetsuro FURUYA <tfu@ff.iij4u.or.jp>
Subject: Re: Bug in wd driver
In-reply-to: Your message of "Thu, 11 Jun 1998 04:41:08 +0900."
<199806101941.EAA11696@dilemma.tf.or.jp>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 10 Jun 1998 15:28:29 -0700
From: Mike Smith <mike@smith.net.au>
Sender: owner-freebsd-stable@freebsd.org
X-Loop: FreeBSD.ORG
> > > >fsck /usr
> > > >.....
> > > >wd0: interrupt timeout:
> > > >wd0: status 50<rdy,seekdone> error 0
> > > >wd0: interrupt timeout:
> > > >wd0: status 50<rdy,seekdone> error 1<no_dam>
> > >
> > > >===> hang up
> > > >===> type 'cntrl-alt-esc'
> >
> > This defers the interrupt timeout...
> >
> > > >db>wd0s1f: hard error reading fsbn 1152850 of 1152850-1152851(wd0s1 bn
> > > >1279826; cn 317 tn 26 sn 44)
> > > >wd0: status 59<rdy,seekdone,drq,err> error 40<uncorr>
> >
> > ... but not the interrupt, which finally arrives and contains real
> > error information. Note that the interrupt timeouts in your case
> > *don't* have DRQ set. Are you running in multi-block mode?
> >
> > > As for wd.c source, I will try to experiment :)
> >
> > Please do. It looks like your information may lead to a result here.
>
> It may be too late to reply to the mailing list.
Not at all; better late than never!
> But this seems important to note-users, so I dare to report the result of
> my experiment with the patch to /usr/src/sys/i386/isa/wd.c
> that Mr. Mike Smith suggested:
...
> > 	if (wdtab[ctrlr].b_errcnt == 0)
> > 		du->dk_timeout = 1 + 10;
> > 	else
> > 		du->dk_timeout = 1 + 3; <---- Only this line.
> >
> >
> >Increase the 10 and 3 values (first and subsequent timeouts). Try
> >raising them lots, then come down slowly.
>
> Unfortunately, my /usr/src/sys/i386/isa/wd.c is different
> from the above source code.
> Only the last line is present in my wd.c.
>
> So I changed only this last line, increasing 3 to 50. (Is this OK?)
It's just a number, and you're in the best position to determine
whether it's big enough.
> So far, I have not experienced any disk crash, cannot-mount-root
> problem, or anything else bad.
Excellent! And thanks for confirming this. I hope that the original
plaintiff is in a position to try this themselves - I would be more
than happy to be completely wrong about the situation. 8)
> You have written that
> >raising them lots, then come down slowly.
>
> Is there any inconvenience when du->dk_timeout value is
> very large ?
> What if du->dk_timeout value is too large ?
The only inconvenience is in the case where the disk has truly failed
to generate an interrupt, and the delay involved before reporting the
failure.
> What is this du->dk_timeout ?
It determines how long a disk is allowed to take to complete a command.
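The trade-off described here can be modelled outside the kernel. The sketch below is not the wd.c code: struct disk_model and the helpers arm_timeout(), tick() and command_times_out() are hypothetical names for this example, and it assumes the driver counts the timeout down once per tick and reports "interrupt timeout" when the count reaches zero. Only the `1 + N` assignment and the b_errcnt test mirror the lines quoted earlier in this message.

```c
/* Hypothetical stand-in for the per-disk state; not the real wd.c struct. */
struct disk_model {
    int timeout_ticks; /* models du->dk_timeout; 0 means nothing pending */
};

/*
 * Arm the timeout the way the quoted fragment does: `first` ticks for a
 * first attempt (errcnt == 0), `retry` ticks for subsequent attempts.
 */
static void arm_timeout(struct disk_model *d, int errcnt, int first, int retry)
{
    if (errcnt == 0)
        d->timeout_ticks = 1 + first;
    else
        d->timeout_ticks = 1 + retry;
}

/* One timer tick (assumed periodic). Returns 1 when the timeout expires. */
static int tick(struct disk_model *d)
{
    return d->timeout_ticks != 0 && --d->timeout_ticks == 0;
}

/*
 * Simulate a command whose completion interrupt arrives after `latency`
 * ticks. Returns 1 if the timeout is reported before the interrupt,
 * 0 if the command completes in time.
 */
static int command_times_out(int errcnt, int first, int retry, int latency)
{
    struct disk_model d;
    int t;

    arm_timeout(&d, errcnt, first, retry);
    for (t = 1; ; t++) {
        if (t >= latency) {      /* interrupt arrives: disarm and finish */
            d.timeout_ticks = 0;
            return 0;
        }
        if (tick(&d))            /* timer expired first */
            return 1;
    }
}
```

Under these assumptions, a retry that needs 20 ticks to complete is reported as a timeout with the value 3 (command_times_out(1, 10, 3, 20) returns 1) but completes normally with the value 50 (command_times_out(1, 10, 50, 20) returns 0). The cost of the larger value is the one stated above: a drive that never interrupts is only reported after the full count has elapsed.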
> I've just tried 'cd /usr; badsect BAD 1152850 1215577' & 'fsck /dev/rwd0s1f',
> but 'bad144 -s -v /dev/wd0' should work fine.
> (I used to use bad144 often, but now wd0 has too many bad sectors
> for bad144 :( )
> badsect & fsck don't take care of swap area,
> nevertheless they are working fine now :)
>
> So, Thank you Mr. Mike Smith !
No, definitely this time the thanks are for you. I'll look at
increasing this timeout significantly for both -stable and -current, if
someone doesn't beat me to it.
--
\\ Sometimes you're ahead, \\ Mike Smith
\\ sometimes you're behind. \\ mike@smith.net.au
\\ The race is long, and in the \\ msmith@freebsd.org
\\ end it's only with yourself. \\ msmith@cdrom.com
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message
========================================================================
TEL: 048-852-3520 FAX: 048-858-1597
E-Mail:
ht5t-fry@asahi-net.or.jp
tfu@ff.iij4u.or.jp
pgp-fingerprint:
pub Tetsuro FURUYA <ht5t-fry@asahi-net.or.jp>
Key fingerprint = F1 BA 5F C1 C2 48 1D C7 AE 5F 16 ED 12 17 75 38
=========================================================================
