Date: Tue, 26 May 1998 05:43:42 -0700
From: Mike Smith <mike@smith.net.au>
To: Michael Robinson <robinson@public.bta.net.cn>
Cc: mike@smith.net.au, freebsd-stable@FreeBSD.ORG
Subject: Re: Bug in wd driver
Message-ID: <199805261243.FAA00386@antipodes.cdrom.com>
In-Reply-To: Your message of "Tue, 26 May 1998 18:23:58 +0800." <199805261023.SAA11951@public.bta.net.cn>

> Mike Smith writes:
> >> I wrote a message related to this problem to freebsd-questions
> >> yesterday, but upon further investigation, I have decided this is
> >> a bug, not a feature.
> >
> >Actually, it's almost certainly a hardware fault.
>
> Actually, the bug is that the driver does not recover gracefully from a
> recoverable hardware fault.  It instead goes into an infinite loop, taking
> significant pieces of the kernel with it.

Actually, an interrupt timeout is not a "recoverable hardware fault".
This is a basic failure in the driver:controller protocol on the part
of the drive.

> >> 1. Any I/O access to the affected sectors will cause the following
> >>    message:
> >>
> >>      wd0: interrupt timeout
> >>      wd0: status 58<rdy,seekdone,drq> error 0
> >
> >The disk has failed to respond to the access request.  You may be able
> >to recover by dd'ing zeroes over the whole partition (forcing a block
> >reallocation), however the disk may be damaged beyond repair.
>
> I repeat, any attempted access to the affected sectors locks up that
> process.  Unless dd has the ability to circumvent the wd driver, I don't
> see how I would be able to dd zeroes over the whole partition.

With the level of detail you provided, it was not possible to determine
whether "any access" referred to read or write operations.  If the disk
can recover the sector(s) involved, and is not required to read from
them first, a dd operation will put it in a position to do so.

The fault is fairly likely related to scribble which occurred when you
powered down during a write operation.  You may have damaged
non-recoverable metadata in the process, and the drive may not handle
this case well.

It is also possible that the drive is taking an inordinate amount of
time before returning an error, and the interrupt timeout is preempting
this return (that isn't actually likely, given the drive status above).
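
For reference, the status byte in the quoted kernel message can be decoded by hand.  What follows is only a minimal sketch in C, assuming the conventional ATA status-register bit layout and that the "58" is hexadecimal; the function name decode_status and the label strings are illustrative and are not taken from the wd driver source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Conventional ATA status-register bits, most significant first.
 * The labels imitate the style of the driver's message but are
 * illustrative, not copied from the driver. */
static const struct {
    unsigned char mask;
    const char *label;
} status_bits[] = {
    { 0x80, "busy" },
    { 0x40, "rdy" },
    { 0x20, "wrtflt" },
    { 0x10, "seekdone" },
    { 0x08, "drq" },
    { 0x04, "ecc_cor" },
    { 0x02, "index" },
    { 0x01, "err" },
};

/* Write a comma-separated list of the bits set in `status` into `buf`,
 * truncating silently if `buf` (of size `len`) is too small. */
static void decode_status(unsigned char status, char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (i = 0; i < sizeof status_bits / sizeof status_bits[0]; i++) {
        if (!(status & status_bits[i].mask))
            continue;
        if (buf[0] != '\0')
            strncat(buf, ",", len - strlen(buf) - 1);
        strncat(buf, status_bits[i].label, len - strlen(buf) - 1);
    }
}
```

Calling decode_status(0x58, buf, sizeof buf) leaves "rdy,seekdone,drq" in buf, which matches the message above: the drive reports ready with a data request pending, and neither the busy nor the err bit is set.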

The fact that an unrecoverable disk error locks other parts of the
kernel is understandable, if not desirable.  There isn't a lot that can
trivially be done about this though.

> What I will probably end up having to do is repartition around that track.
> However, this seems like an unnecessarily crude solution to me, considering
> how minor the damage is.

Disk metadata damage that causes the drive firmware to fail doesn't
strike me as "minor" in any common usage of the term.

-- 
\\  Sometimes you're ahead,       \\  Mike Smith
\\  sometimes you're behind.      \\  mike@smith.net.au
\\  The race is long, and in the  \\  msmith@freebsd.org
\\  end it's only with yourself.  \\  msmith@cdrom.com

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message