From owner-freebsd-stable@FreeBSD.ORG Sat Aug 20 01:14:08 2011
Return-Path: 
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2EAD31065670
	for ; Sat, 20 Aug 2011 01:14:08 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta12.westchester.pa.mail.comcast.net
	(qmta12.westchester.pa.mail.comcast.net [76.96.59.227])
	by mx1.freebsd.org (Postfix) with ESMTP id CFDC28FC13
	for ; Sat, 20 Aug 2011 01:14:07 +0000 (UTC)
Received: from omta01.westchester.pa.mail.comcast.net ([76.96.62.11])
	by qmta12.westchester.pa.mail.comcast.net with comcast
	id NRCw1h0020EZKEL5CRE8ze; Sat, 20 Aug 2011 01:14:08 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta01.westchester.pa.mail.comcast.net with comcast
	id NRE61h0191t3BNj3MRE70T; Sat, 20 Aug 2011 01:14:08 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 4D774102C1A; Fri, 19 Aug 2011 18:14:05 -0700 (PDT)
Date: Fri, 19 Aug 2011 18:14:05 -0700
From: Jeremy Chadwick 
To: Kevin Oberman 
Message-ID: <20110820011405.GA20330@icarus.home.lan>
References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>
	<20110819235719.GA64220@night.db.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: 
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-stable@freebsd.org, Dan Langille 
Subject: Re: bad sector in gmirror HDD
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive: 
List-Post: 
List-Help: 
List-Subscribe: ,
X-List-Received-Date: Sat, 20 Aug 2011 01:14:08 -0000

On Fri, Aug 19, 2011 at 05:51:02PM -0700, Kevin Oberman wrote:
> On Fri, Aug 19, 2011 at 4:57 PM, Diane Bruce wrote:
> > On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
> >> System in question: FreeBSD 8.2-STABLE #3: Thu Mar  3 04:52:04 GMT 2011
> >>
> >> After a recent
> >> power failure, I'm seeing this in my logs:
> >>
> >> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors
> >
> > Personally, I'd replace that drive now.
> >
> >> Searching on that error message, I was led to believe that identifying
> >> the bad sector and running dd to read it would cause the HDD to
> >> reallocate that bad block.
> >
> > No, as otherwise mentioned (Hi Jeremy!) you need to read and write the
> > block. This could buy you a few more days or a few more weeks.
> > Personally, I would not wait. Your call.
>
> While I largely agree, it depends on several factors as to whether I'd
> replace the drive.
>
> First, what does SMART show other than these errors? If the reported
> statistics look generally good, and considering that you have a mirror
> with one "good" copy of the blocks in question, the impact is zero
> unless the other drive fails. That is why the blocks need to be
> re-written, so that they will be relocated on the drive.
>
> Second, how critical is the data? The mirror gives good integrity, but
> you also need good backups. If the data MUST be on-line with high
> reliability, buy a replacement drive. You need to look at cost-benefit
> (or really the cost of replacement vs. the cost of failure).
>
> It's worth mentioning that all drives have bad blocks. Most are hard
> bad blocks and are re-mapped before the drive is shipped, but marginal
> bad blocks can and do slip through to customers, and it is entirely
> possible that the drive is just fine for the most part and replacing
> it would really be a waste of money.
>
> Only you can make the call, but if further bad blocks show up in the
> near term, I'll go along with recommending replacement.

I can expand a bit on this.

With ATA/SATA and SCSI disks, there's a factory-default list of LBAs
which are bad (referred to as the "physical defect list"). Everyone by
now is familiar with this.
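As an aside, since Diane brought it up above: the read-then-rewrite procedure looks roughly like the sketch below. The device name, LBA, and sector size here are hypothetical placeholders -- substitute the values from your own smartd/kernel logs -- and the destructive dd invocation is deliberately left commented out.

```shell
# Hypothetical values -- substitute the device and LBA from your own logs.
DEV=/dev/ad2        # the disk reporting pending sectors (placeholder)
LBA=123456789       # hypothetical bad LBA; smartd/kernel messages report the real one
SECSZ=512           # sector size in bytes; `diskinfo -v $DEV` shows it on FreeBSD

# 1. Confirm the sector really is unreadable (non-destructive read):
#      dd if=$DEV of=/dev/null bs=$SECSZ skip=$LBA count=1
#
# 2. Overwrite that one sector so the drive can remap it.  This DESTROYS
#    the sector's contents, so resync from the good gmirror member after:
#      dd if=/dev/zero of=$DEV bs=$SECSZ seek=$LBA count=1

# Equivalent byte offset of the sector, if you prefer addressing it directly:
OFFSET=$((LBA * SECSZ))
echo "$OFFSET"
```

Note that dd's skip= operand skips input blocks and seek= skips output blocks, which is why the read uses skip= and the write uses seek=.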
With SCSI disks there are "grown defects": a drive-managed AND
user-managed list of LBAs which are considered bad. Whether these LBAs
were correctable (remapped) or not is tracked by SMART on SCSI. I can
provide many examples of this if people want to see what it looks like
(we have quite a collection of Fujitsu disks at my workplace; they're one
of a few vendors I more or less boycott).

With SCSI, you can clear the grown defect list with ease. Some drives
support clearing the physical defect list too, but doing that requires a
*true* low-level format to be done afterward. If you issue a SCSI FORMAT
command, any grown defects (as the drive encounters them) will be
"merged" into the physical defect list, and when the FORMAT is done, the
drive will report 0 grown defects. Again, I can confirm this exact
behaviour with our Fujitsu disks at my workplace; it's easy to get a list
of the physical and grown defects with SCSI.

With ATA/SATA disks it's a different story: it seems to vary from vendor
to vendor and model to model. The established theory is that the drive
has a list of spare LBAs for remappings, which is managed entirely by the
drive itself -- and not reported back to the user via SMART or any other
means. This happens entirely without user intervention, and (on
repetitive errors) might show up as the drive stalling on some I/O or
other oddities. These situations are not reported back to the OS either
-- it's 100% transparent to the user.

When an ATA/SATA disk begins reporting errors on certain LBA accesses,
whether via SMART or to the OS (e.g. an I/O error), the theory is that
the spare LBA list used by the drive internally has been exhausted, and
it will begin using a different spare list (or an extension of the
existing spares; I'm not sure).

What Diane's getting at (Hi Diane!)
is that since the drive has already reached the point of reporting errors
back to the OS and SMART, it has experienced problems (which it worked
around) prior to this point in time. Hence her recommendation to replace
the drive.

What I still have a bit of trouble stomaching these days is whether or
not the above theories still apply *today* in practice on SATA disks.
Part of me is inclined to believe that **any** errors are reported to
SMART and the OS, that remapping is reported via SMART, etc.; e.g.
there's no more "transparent" anything. The problem is that I don't have
a good way to confirm or deny this. Oh, what I'd give for good
engineering contacts within Western Digital and Seagate...

These days, I replace drives depending upon their age (Power_On_Hours)
combined with how many errors are seen and what kind of errors. For
example, if I have a drive that's been in operation for 20,000 hours and
it now has 2 bad LBAs, I can accept that. If I have a drive that's been
in operation for 48 hours and it has 30 errors, that drive is getting
RMA'd.

When I get new or RMA'd/refurbished drives, I test them before putting
them to use. I do a read-only surface scan using SMART ("smartctl -t
select,0-max /dev/XXX") and let that finish. Assuming no errors are
shown in the selective scan log, I then proceed with a full-disk zero
("dd if=/dev/zero of=/dev/XXX bs=64k"). When finished, I check SMART for
any errors. If there are any, I RMA the drive -- or if it's been RMA'd
already, I get angry at the vendor. :-)

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, US  |
| Making life hard for others since 1977.              PGP 4BD6C0CB  |
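P.S. For anyone wanting the burn-in procedure above in one place, here's a rough sketch. The device name is a placeholder, and since steps 2 onward are destructive, the function only *prints* the commands rather than running them -- drop the function wrapper and run each step by hand, waiting for the selective scan to finish before zeroing.

```shell
# Dry-run sketch of my new/RMA'd-drive burn-in.  /dev/ada9 is a
# placeholder; the function prints the steps instead of executing them,
# because the full-disk zero destroys everything on the drive.
burnin_plan() {
    dev=$1
    echo "smartctl -t select,0-max $dev"   # 1. read-only selective surface scan
    echo "smartctl -l selective $dev"      # 2. check the selective self-test log
    echo "dd if=/dev/zero of=$dev bs=64k"  # 3. full-disk zero -- DESTRUCTIVE
    echo "smartctl -A $dev"                # 4. re-check SMART attributes
    echo "smartctl -l error $dev"          # 5. re-check the SMART error log
}

burnin_plan /dev/ada9
```

Any new reallocated/pending sectors or error-log entries after step 5 mean the drive goes back to the vendor.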