Date:      Fri, 06 Jul 2001 00:09:05 -0700
From:      Peter Wemm <peter@wemm.org>
To:        Poul-Henning Kamp <phk@critter.freebsd.dk>, cvs-committers@FreeBSD.ORG, cvs-all@FreeBSD.ORG
Subject:   Re: cvs commit: src/etc diskcheckd.conf 
Message-ID:  <20010706070905.7BC4D3809@overcee.netplex.com.au>
In-Reply-To: <20010706152526.C506@gsmx07.alcatel.com.au> 

Peter Jeremy wrote:
> On 2001-Jul-06 06:30:36 +0200, Poul-Henning Kamp <phk@critter.freebsd.dk> wrote:
> >diskcheckd is not designed or intended to do scrubbing, its goal has
> >been reached if it does detection.
> 
> The only problem is that soft errors won't be detected until they
> become hard errors - which is too late for the data on the disk.
> diskcheckd seems to be a slow (non I/O intensive) way to run
> "dd if=/dev/foo of=/dev/null bs=64k".  It would be useful if the
> disk drivers could report soft errors so that something like
> diskcheckd could detect that a disk was going bad whilst it was
> still readable.

The problem is that modern disks don't "go bad" as such.  They tend to get
one of two problems:
1: KABOOM!! (or some other mechanical failure)
2: A transient problem while writing leads to 'HARD READ' errors. IBM DTLA
drives do this with disturbing regularity. The inner tracks get scrambled
while being written and the sector in the middle of it is unrecoverable.  No
amount of retries can make up for the wrong data having been written.  This
is not a 'bad disk', just normal behavior. You need to admit defeat, rewrite
the sector and get on with life.  (And you don't swap one IBM DTLA for
another; the problems still happen.  The only swap that makes sense is to
a different model (DPTA is fine) or brand.)

IMHO, diskcheckd doesn't help with either of these cases.

In relatively rare cases drives start going marginal, and retries do
sometimes work if you retry enough.  Overheating is a common cause of this.
These are a small minority of cases if our 10,000-20,000 drive sample is
any indication.  A scrubber is more useful here, but that is a real
black art.  diskcheckd won't detect marginal sectors while they are
recoverable - not until it is virtually too late.
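
To make the detection point concrete, here is a minimal sketch of a
throttled read pass of the kind described above (a slow
"dd if=/dev/foo of=/dev/null bs=64k").  It is not diskcheckd's actual
code; the name slow_scan and its parameters are made up for illustration.
Note that it can only see reads the kernel reports as failed, so any
retries the drive handles internally stay invisible to it.

```python
import os
import time


def slow_scan(path, block_size=64 * 1024, delay=0.0, max_errors=16):
    """Read `path` front to back in block_size chunks (hypothetical helper).

    Sleeps `delay` seconds between reads to keep the I/O load low.
    Returns (bytes_read, bad_offsets), where bad_offsets lists the
    offsets of blocks whose read failed outright (hard errors only).
    """
    bad_offsets = []
    bytes_read = 0
    offset = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            try:
                chunk = os.pread(fd, block_size, offset)
            except OSError:
                # The read failed: record where, then skip past the block.
                bad_offsets.append(offset)
                offset += block_size
                if len(bad_offsets) >= max_errors:
                    break  # assumption: give up rather than loop forever
                continue
            if not chunk:
                break  # end of file or device
            bytes_read += len(chunk)
            offset += len(chunk)
            if delay:
                time.sleep(delay)
    finally:
        os.close(fd)
    return bytes_read, bad_offsets
```

Something like slow_scan("/dev/foo", delay=0.1) would walk the whole
device; a sector that only reads after several retries still comes back
as good data here, which is exactly the case that goes unnoticed.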

Cheers,
-Peter
--
Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au
"All of this is for nothing if we don't go to the stars" - JMS/B5

