Message-Id: <199610242134.RAA13123@chai.plexuscom.com>
To: dg@Root.COM, dwhite@resnet.uoregon.edu, fenner@parc.xerox.com
Cc: questions@freebsd.org
Subject: Re: Is my disk going bad?
In-reply-to: Your message of "Thu, 24 Oct 1996 13:41:39 PDT." <199610242041.NAA06512@root.com>
Date: Thu, 24 Oct 1996 17:34:33 -0400
From: Bakul Shah

> >> I just noticed that I've been getting these for a while:
> >>
> >> sd1(ncr0:1:0): MEDIUM ERROR info:119a05 csi:6,a8,3,41 asc:11,43 field replaceable unit: 15 sks:80,40
> >>
> >> sd1 is a Quantum 1080S. I don't have the probe messages since the
> >> medium error messages have scrolled them away.
> >>
> >> I just yesterday turned on remapping:
> >>
> >> % scsi -f /dev/rsd1 -m 1
> >> AWRE (Auto Write Reallocation Enbld): 1
> >> ARRE (Auto Read Reallocation Enbld): 1
> >>
> >> but it's not remapping, it's still returning errors.
> >>
> >> Is this the disk going so bad that it can't reallocate to good blocks?
> >
> >How full is it? Once you've filled the disk then it can't reallocate
> >those bad sectors anywhere else.

> Uhh, gurp. Drives reserve spare tracks and blocks for use in reallocation.
> The space does not come from the filesystem free block pool.
If the MEDIUM ERROR was a `hard read error', one that cannot be corrected by the block's ECC, the disk is doing the *right thing* by not automatically remapping it. If a block with a hard read error were automatically remapped, the _next_ time that block is read you *won't* get a read error, but whatever data is in the new block is garbage, so now you have _silent_ data corruption.

Automatic remapping only makes sense on a write to a known bad block or on a *soft read error*. In the latter case the original data _was_ recovered thanks to the ECC and, on the chance that the block is going bad, the data is moved to a good block (and the old block number is mapped to the new block).

Another (remote) possibility is that the disk has run out of spares.
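
The rule described above can be sketched as a small decision function. This is only an illustrative model in Python, not how any particular drive's firmware is written; the names Outcome and remap_decision and the spares_left counter are made up for the example.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical classification of a single block access."""
    OK = "ok"
    SOFT_READ_ERROR = "soft read error"          # ECC recovered the data
    HARD_READ_ERROR = "hard read error"          # ECC could not recover the data
    WRITE_TO_BAD_BLOCK = "write to known bad block"


def remap_decision(outcome, spares_left):
    """Return (remap, report_error) for one block access.

    remap        -- move the block to a spare and update the mapping
    report_error -- surface an error (e.g. MEDIUM ERROR) to the host
    """
    if outcome is Outcome.OK:
        return (False, False)

    if outcome is Outcome.HARD_READ_ERROR:
        # The original data is lost. Remapping now would make the next
        # read succeed with garbage, so keep reporting the error instead.
        return (False, True)

    # Soft read error or write to a known bad block: valid data is in
    # hand, so it is safe to place it in a spare block if one exists.
    if spares_left > 0:
        return (True, False)

    # Out of spares: nothing to remap to, so the error must be reported.
    return (False, True)
```

In this model a hard read error is never remapped no matter how many spares remain, which matches the behaviour seen with ARRE turned on: the error keeps being reported until the block is rewritten with known data.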