From owner-freebsd-questions  Thu Oct 24 14:36:47 1996
Message-Id: <199610242134.RAA13123@chai.plexuscom.com>
To: dg@Root.COM, dwhite@resnet.uoregon.edu, fenner@parc.xerox.com
Cc: questions@freebsd.org
Subject: Re: Is my disk going bad? 
In-reply-to: Your message of "Thu, 24 Oct 1996 13:41:39 PDT."
             <199610242041.NAA06512@root.com> 
Date: Thu, 24 Oct 1996 17:34:33 -0400
From: Bakul Shah <bakul@plexuscom.com>

> >> I just noticed that I've been getting these for a while:
> >> 
> >>  sd1(ncr0:1:0): MEDIUM ERROR info:119a05 csi:6,a8,3,41 asc:11,43  field replaceable unit: 15 sks:80,40
> >> 
> >> sd1 is a Quantum 1080S.  I don't have the probe messages since the
> >> medium error messages have scrolled them away.
> >> 
> >> I just yesterday turned on remapping:
> >> 
> >> % scsi -f /dev/rsd1 -m 1
> >> AWRE (Auto Write Reallocation Enbld):  1 
> >> ARRE (Auto Read Reallocation Enbld):  1 
> >> 
> >> but it's not remapping, it's still returning errors.
> >> 
> >> Is this the disk going so bad that it can't reallocate to good blocks?
> >
> >How full is it?  Once you've filled the disk then it can't reallocate
> >those bad sectors anywhere else.

>    Uhh, gurp. Drives reserve spare tracks and blocks for use in reallocation.
> The space does not come from the filesystem free block pool.

If the MEDIUM ERROR was a `hard read error', one that cannot be
corrected by the block's ECC, the disk is doing the *right thing* by
not automatically remapping it.  If a block with a hard read error
were automatically remapped, the _next_ read of that block would
*not* return an error, but the data in the new block would be
garbage, so you would have _silent_ data corruption.
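
To make the failure mode concrete, here is a minimal toy model in
Python.  It is only a sketch: ToyDisk and remap_without_data are
made-up names for illustration, not any drive firmware or FreeBSD
interface, and the "spare" block is assumed to come back zero-filled.

```python
# Toy model only. ToyDisk and its methods are hypothetical names used
# for illustration; no real drive or driver works exactly like this.

class ToyDisk:
    def __init__(self):
        self.blocks = {}          # logical block number -> bytes
        self.hard_errors = set()  # blocks whose data ECC cannot recover

    def write(self, lbn, data):
        # A write supplies fresh data, so the block becomes readable again.
        self.blocks[lbn] = data
        self.hard_errors.discard(lbn)

    def read(self, lbn):
        # A hard read error is reported to the caller, not hidden.
        if lbn in self.hard_errors:
            raise IOError("MEDIUM ERROR on block %d" % lbn)
        return self.blocks[lbn]

    def remap_without_data(self, lbn):
        # Assumption: the spare block starts out zero-filled. The
        # original contents are gone, yet the error flag is cleared.
        self.blocks[lbn] = b"\x00" * len(self.blocks[lbn])
        self.hard_errors.discard(lbn)


if __name__ == "__main__":
    disk = ToyDisk()
    disk.write(7, b"payroll")
    disk.hard_errors.add(7)       # block 7 develops a hard read error
    try:
        disk.read(7)
    except IOError as err:
        print("before remap:", err)
    disk.remap_without_data(7)    # what a naive auto-remap would do
    print("after remap:", disk.read(7))  # no error, but not the old data
```

After the naive remap the read succeeds and hands back bytes that were
never written, which is exactly the silent corruption described above.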

Automatic remapping only makes sense on a write to a known bad block
or on a *soft read error*.  In the latter case the original data
_was_ recovered thanks to the ECC, and on the chance that the block
is going bad, the data is moved to a good block (and the old block
number is mapped to the new block).
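
The rule above can be written down as a small decision function.  This
is a sketch of the policy as I have described it, not a statement of
what any particular drive implements; should_auto_remap and its
parameters are hypothetical names.

```python
def should_auto_remap(op, block_is_bad, ecc_recovered=False):
    """Decide whether remapping a block automatically is safe.

    op            -- "read" or "write"
    block_is_bad  -- the drive saw an error on, or already flagged, this block
    ecc_recovered -- for reads: ECC reconstructed the original data
    """
    if op not in ("read", "write"):
        raise ValueError("op must be 'read' or 'write'")
    if not block_is_bad:
        return False          # nothing wrong, nothing to remap
    if op == "write":
        return True           # the new data is in hand and can go to a spare
    # Read path: safe only for a soft error, where the data was recovered.
    # A hard error must be reported instead of being papered over.
    return ecc_recovered
```

For example, should_auto_remap("read", True, ecc_recovered=False) is
False: a hard read error is returned to the host rather than remapped.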

Another (remote) possibility is that the disk has run out of spares.