From owner-freebsd-stable  Tue Dec 14  9:56:00 1999
Delivered-To: freebsd-stable@freebsd.org
Received: from mass.cdrom.com (castles505.castles.com [208.214.165.69])
	by hub.freebsd.org (Postfix) with ESMTP id 6125814A2F
	for <freebsd-stable@freebsd.org>; Tue, 14 Dec 1999 09:55:57 -0800 (PST)
	(envelope-from msmith@mass.cdrom.com)
Received: from mass.cdrom.com (localhost [127.0.0.1])
	by mass.cdrom.com (8.9.3/8.9.3) with ESMTP id JAA03135;
	Tue, 14 Dec 1999 09:57:11 -0800 (PST)
	(envelope-from msmith@mass.cdrom.com)
Message-Id: <199912141757.JAA03135@mass.cdrom.com>
X-Mailer: exmh version 2.1.1 10/15/1999
To: dhesi@rahul.net (Rahul Dhesi)
Cc: freebsd-stable@freebsd.org
Subject: Re: how to rewrite the data field replaceable unit? 
In-reply-to: Your message of "Tue, 14 Dec 1999 04:00:34 PST."
             <19991214120034.10BC021C@waltz.rahul.net> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Tue, 14 Dec 1999 09:57:11 -0800
From: Mike Smith <msmith@freebsd.org>
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> Mike Smith <msmith@freebsd.org> writes:
> 
> >>    (da2:ahc0:0:2:0) Unrecovered read error - recommend rewrite the data field replaceable unit: 20 sks:80,a0
> ...
> 
> >That's two things, 'data' and 'field replaceable unit'.  The FRU code 
> >tells you which part of the drive is failing - in this case the vendor 
> >code for the broken bit is '20'.
> 
> Thanks.
> 
> A suggestion to the Device Driver Gods:  Please make error messages
> understandable to the rest of us.

That's kinda difficult, actually.  Error messages need to be accurate and 
understandable to people who understand what's really going on.  It's 
not the job of an error message to explain how the hardware works.
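For the curious, the pieces of that message come from fields in the drive's SCSI sense data.  The following is a minimal sketch, assuming the standard fixed-format sense layout (response codes 0x70/0x71) from the SCSI Primary Commands standard.  It is illustrative only and is not how CAM itself decodes sense data; the struct and function names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of interest in fixed-format SCSI sense data (response codes
 * 0x70 and 0x71), at the byte offsets given by the SCSI Primary
 * Commands standard.  Illustrative only; not CAM's own decoder. */
struct sense_summary {
    uint8_t sense_key;  /* byte 2, low 4 bits (0x3 = MEDIUM ERROR) */
    uint8_t asc;        /* byte 12: additional sense code */
    uint8_t ascq;       /* byte 13: additional sense code qualifier */
    uint8_t fru;        /* byte 14: field replaceable unit code (vendor-specific) */
    int     sks_valid;  /* byte 15, bit 7: SKSV, sense-key-specific bytes valid */
    uint8_t sks[3];     /* bytes 15-17: sense-key-specific information */
};

/* Decode an 18-byte (or longer) fixed-format sense buffer.
 * Returns 0 on success, -1 if the buffer is too short or uses
 * another format (e.g. descriptor-format sense data). */
int decode_fixed_sense(const uint8_t *buf, size_t len, struct sense_summary *out)
{
    uint8_t code;

    if (len < 18)
        return -1;
    code = buf[0] & 0x7f;           /* bit 7 is the VALID bit, not part of the code */
    if (code != 0x70 && code != 0x71)
        return -1;
    out->sense_key = buf[2] & 0x0f;
    out->asc       = buf[12];
    out->ascq      = buf[13];
    out->fru       = buf[14];
    out->sks_valid = (buf[15] & 0x80) != 0;
    out->sks[0]    = buf[15];
    out->sks[1]    = buf[16];
    out->sks[2]    = buf[17];
    return 0;
}
```

A made-up buffer with byte 14 set to 0x20 decodes to fru == 0x20.  What that value actually means is vendor-specific, so only the drive manufacturer's documentation can say which part of the drive it names.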

> >In order to have the block forwarded (replaced), you need to put the drive
> >in a situation where it can guarantee that no data loss will occur.  The 
> >easiest way to do this is to work out which file contains the bad block, 
> >and copy it somewhere else, then delete the original....
> 
> This confuses me.  Even if I have deleted the file that contains the bad
> block, how is the disk hardware to know that?

It doesn't.  But now the block is free, so nothing will read it until the 
filesystem allocates it again, and the next time it gets used it's going 
to be written to, which will give the drive a chance to forward it.
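A rough sketch of the copy-then-delete approach, assuming a POSIX system and a 512-byte device block.  The function name `copy_then_unlink` is hypothetical, and zero-filling chunks that fail with EIO is an added assumption (the same idea as `dd conv=noerror,sync`), since a plain copy would simply stop at the bad block.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define COPY_CHUNK 512  /* assumed device block size */

/* Hypothetical helper: copy `src` to a new file `dst`, then unlink `src`
 * so its blocks (including the bad one) go back on the free list.
 * A chunk whose read fails with EIO is written as zeros, so the readable
 * rest of the file survives.  Returns the number of unreadable chunks,
 * or -1 on any other error (in which case `src` is left in place). */
long copy_then_unlink(const char *src, const char *dst)
{
    char buf[COPY_CHUNK];
    struct stat st;
    off_t off = 0;
    long bad = 0;
    int in, out;

    if ((in = open(src, O_RDONLY)) < 0)
        return -1;
    if (fstat(in, &st) < 0 ||
        (out = open(dst, O_WRONLY | O_CREAT | O_EXCL, 0644)) < 0) {
        close(in);
        return -1;
    }
    while (off < st.st_size) {
        ssize_t n = pread(in, buf, COPY_CHUNK, off);
        if (n == 0)
            break;                          /* file shrank under us */
        if (n < 0) {
            if (errno != EIO)
                goto fail;
            memset(buf, 0, sizeof buf);     /* unreadable: substitute zeros */
            n = COPY_CHUNK;
            bad++;
        }
        if (pwrite(out, buf, (size_t)n, off) != n)
            goto fail;
        off += n;
    }
    /* Trim any zero padding written past the original end of file. */
    if (ftruncate(out, st.st_size) < 0 || fsync(out) < 0)
        goto fail;
    close(in);
    close(out);
    if (unlink(src) < 0)
        return -1;
    return bad;

fail:
    close(in);
    close(out);
    unlink(dst);                            /* remove the partial copy */
    return -1;
}
```

Working out which file owns the bad block in the first place is a separate step that this sketch does not attempt.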

> >You can also unmount the drive and write to the block manually (write a 
> >small program that seeks to the bad block and then writes to it).
> 
> This seems promising.  If I succeed in writing to that bad block,
> and later cause a read from it, and a recoverable error occurs during
> the read, the disk will likely remap the block.

Either that, or, if the block is so badly damaged that the write fails, 
it will be forwarded immediately.
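A minimal sketch of such a program, assuming POSIX `pwrite()` and a 512-byte device block.  It overwrites the block with zeros; that choice is an assumption, since the text only says to write to the block, and whatever the block held is lost either way.  `rewrite_block` and its parameters are hypothetical names, and the caller must supply the device path: the right node for the drive depends on the FreeBSD version and disk layout.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: overwrite one device block with zeros so the drive
 * gets a chance to forward (reassign) it.  DANGER: this destroys the
 * block's contents; use it only with the filesystem unmounted.
 * `path` is the disk device, `blkno` a block number in units of `blksize`
 * bytes from the start of that device.  Returns 0 on success, -1 on error. */
int rewrite_block(const char *path, off_t blkno, size_t blksize)
{
    char buf[8192];
    ssize_t n;
    int fd;

    if (blksize == 0 || blksize > sizeof buf)
        return -1;
    memset(buf, 0, blksize);
    if ((fd = open(path, O_WRONLY)) < 0)
        return -1;
    /* pwrite() does the "seek to the bad block" and "write to it" in one call. */
    n = pwrite(fd, buf, blksize, blkno * (off_t)blksize);
    if (n == (ssize_t)blksize && fsync(fd) == 0) {
        close(fd);
        return 0;
    }
    close(fd);
    return -1;
}
```

A call would look like rewrite_block(dev, blkno, 512), with blkno being the bad block number reported for that same device.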

> One more question:
> 
>    fsck reports some unreadable block numbers.  What is the block size
>    assumed by fsck when it reports that block n is bad?  Is it the same
>    as the block size of the filesystem (in this case 8192 bytes)?
>    And if so, then to rewrite the block, would I seek to
>    (block no. * 8192) and then write 8192 bytes?

No; to the best of my knowledge blocks are always reported in the device 
block size (typically 512 bytes).
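Put as arithmetic, the byte offset is the reported block number times the device block size, not the filesystem block size.  A small sketch, assuming a 512-byte device block and that the number counts from the start of the device fsck was checking; the function name is made up for illustration.

```c
#include <stdint.h>

#define DEV_BLOCK_SIZE 512  /* assumed device block (sector) size */

/* Byte offset of a block number reported in device-block units,
 * measured from the start of the device that was checked. */
int64_t report_block_to_offset(int64_t blkno)
{
    return blkno * DEV_BLOCK_SIZE;
}
```

For a reported block 12345 this gives byte 6,320,640 and a 512-byte write, whereas 12345 * 8192 = 101,130,240 would land 16 times too far into the device.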

> And yet another question:
> 
> When 'badsect' reports that a bad block cannot be attached because it is
> in a non-data area, what does this mean?

It means that the block is being used for metadata (e.g. a directory, 
cylinder group data, etc.).  Recovering from this sort of bad block is 
very difficult.

-- 
\\ Give a man a fish, and you feed him for a day. \\  Mike Smith
\\ Tell him he should learn how to fish himself,  \\  msmith@freebsd.org
\\ and he'll hate you for a lifetime.             \\  msmith@cdrom.com

