From: Peter Wemm
Date: Fri, 06 Jul 2001 00:09:05 -0700
To: Poul-Henning Kamp, cvs-committers@FreeBSD.ORG, cvs-all@FreeBSD.ORG
Subject: Re: cvs commit: src/etc diskcheckd.conf
In-Reply-To: <20010706152526.C506@gsmx07.alcatel.com.au>
Message-Id: <20010706070905.7BC4D3809@overcee.netplex.com.au>

Peter Jeremy wrote:
> On 2001-Jul-06 06:30:36 +0200, Poul-Henning Kamp wrote:
> >diskcheckd is not designed or intended to do scrubbing, its goal has
> >been reached if it does detection.
>
> The only problem is that soft errors won't be detected until they
> become hard errors - which is too late for the data on the disk.
> diskcheckd seems to be a slow (non I/O intensive) way to run
> "dd if=/dev/foo of=/dev/null bs=64k".  It would be useful if the
> disk drivers could report soft errors so that something like
> diskcheckd could detect that a disk was going bad whilst it was
> still readable.

The problem is that modern disks don't "go bad" as such.  They tend to
get one of two problems:

1: KABOOM!! (or some other mechanical failure)
2: a transient problem during a write leads to 'HARD READ' errors.
IBM DTLA drives do this with disturbing regularity.  The inner tracks get
scrambled while being written and the sector in the middle of it is
unrecoverable.  No amount of retries can make up for the wrong data being
written.  This is not a 'bad disk', just normal behavior.  You need to
admit defeat, rewrite the sector and get on with life.  (And you don't
swap one IBM DTLA for another; the problems still happen.  The only swap
that makes sense is to a different model (DPTA is fine) or brand.)

IMHO, diskcheckd doesn't help with either of these cases.

In relatively rare cases drives start going marginal and retries do
sometimes work if you retry enough.  Overheating is a good cause of this.
These are a small minority of cases if our 10,000-20,000 drive sample is
any indication.  A scrubber is more useful here, but that is a really
black art.  diskcheckd won't detect marginal sectors while they are
recoverable - not until it is virtually too late.

Cheers,
-Peter
--
Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au
"All of this is for nothing if we don't go to the stars" - JMS/B5
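
For anyone reading this in the archive: the slow read-only pass that the quoted text compares to "dd if=/dev/foo of=/dev/null bs=64k" can be sketched roughly as below. This is only an illustration, not diskcheckd's actual code. The function name slow_scan, its parameters, and the use of a seek-to-end to find the size are all assumptions made for the example. Note that it can only see reads that fail outright, which is exactly the limitation discussed above: a sector that the drive recovers after internal retries looks like a normal read.

```python
import os
import time


def slow_scan(path, block_size=64 * 1024, delay=0.0):
    """Read `path` front to back in fixed-size blocks, like the quoted dd command.

    Returns (bytes_read, bad_offsets), where bad_offsets lists the start of
    every block whose read raised an OSError (a hard read error).
    """
    bad_offsets = []
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Assumption: seeking to the end reports the size. True for regular
        # files; device nodes may need a platform-specific query instead.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(block_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
                bytes_read += len(data)
            except OSError:
                # The drive gave up on this block; note it and move on.
                bad_offsets.append(offset)
            offset += want
            if delay:
                time.sleep(delay)  # throttle so the scan is not I/O intensive
    finally:
        os.close(fd)
    return bytes_read, bad_offsets
```

Calling slow_scan on a device path with a small delay gives the "slow dd" behavior. Any offsets it returns are already hard errors, so by then the only remedy is the one described above: rewrite the sector and move on.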