From owner-freebsd-hackers@freebsd.org  Thu Jul  5 01:06:01 2018
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 30EC41031009
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Thu,  5 Jul 2018 01:06:01 +0000 (UTC)
 (envelope-from freebsd-rwg@pdx.rh.CN85.dnsmgr.net)
Received: from pdx.rh.CN85.dnsmgr.net (br1.CN84in.dnsmgr.net [69.59.192.140])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 958B27D09A
 for <freebsd-hackers@freebsd.org>; Thu,  5 Jul 2018 01:06:00 +0000 (UTC)
 (envelope-from freebsd-rwg@pdx.rh.CN85.dnsmgr.net)
Received: from pdx.rh.CN85.dnsmgr.net (localhost [127.0.0.1])
 by pdx.rh.CN85.dnsmgr.net (8.13.3/8.13.3) with ESMTP id w6515u6K045568;
 Wed, 4 Jul 2018 18:05:56 -0700 (PDT)
 (envelope-from freebsd-rwg@pdx.rh.CN85.dnsmgr.net)
Received: (from freebsd-rwg@localhost)
 by pdx.rh.CN85.dnsmgr.net (8.13.3/8.13.3/Submit) id w6515uRD045567;
 Wed, 4 Jul 2018 18:05:56 -0700 (PDT) (envelope-from freebsd-rwg)
From: "Rodney W. Grimes" <freebsd-rwg@pdx.rh.CN85.dnsmgr.net>
Message-Id: <201807050105.w6515uRD045567@pdx.rh.CN85.dnsmgr.net>
Subject: Re: Confusing smartd messages
In-Reply-To: <5B3D6975.2060508@grosbein.net>
To: Eugene Grosbein <eugen@grosbein.net>
Date: Wed, 4 Jul 2018 18:05:56 -0700 (PDT)
CC: George Mitchell <george+freebsd@m5p.com>,
 FreeBSD Hackers <freebsd-hackers@freebsd.org>
X-Mailer: ELM [version 2.4ME+ PL121h (25)]
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=US-ASCII
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.27
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 05 Jul 2018 01:06:01 -0000

> 05.07.2018 7:03, George Mitchell wrote:
> > Every thirty minutes, smartd is telling me:
> > 
> > Device: /dev/ada1, 2 Currently unreadable (pending) sectors
> > Device: /dev/ada1, 2 Offline uncorrectable sectors
> > 
> > smartctl -a /dev/ada1 seems to be reassuring me that everything is
> > fine (SMART overall-health self-assessment test result: PASSED),
> 
> If that said FAILED, you should replace the disk immediately.
> PASSED does not mean it has no problems, only that the problems are not fatal (yet).
> 
> > though it also says:
> > 
> > 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       2
> > 198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       2
> > 
> > which sounds like it confirms the log message above.  The disk is
> > part of a raidz pool whose "zpool status" also says everything is
> > okay.  What's the recommended action at this point?     -- George
> 
> You need to force the disk to rewrite those two bad sectors.
> There is a possibility they are just "soft bad" sectors, and in that case
> the problem will simply disappear without new remaps; that would be the best possible case.
> 
> Or the two sectors could be really bad, and a remap will "fix" (really hide) the problem;
> in that case you should be ready for a possibly increasing number of bad sectors
> and have a replacement handy.
> 
> The first step is to run zpool scrub, or even to replace the disk and run "dd if=/dev/zero of=/dev/ada1".

It would be really nice if we had a way to tell ZFS to
do the equivalent of:
	dd if=/dev/ada1 of=/dev/ada1 bs=128k conv=noerror,sync
in some type of low-level scrub operation, with proper locking,
which could easily repair most of these Pending Sector errors.
Such errors are actually fairly common in the first couple hundred
hours of drive operation.
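
For illustration only, here is a rough Python sketch of what that dd
invocation does block by block: read a block, then write the same bytes
back to the same offset so the drive gets a chance to rewrite (or remap)
the sector. The function name rewrite_in_place and its return value are
made up for this example. It does none of the locking a real ZFS-level
operation would need, it has only been exercised against regular files,
and finding the size with lseek on a raw device node is an assumption
that may not hold everywhere.

```python
import os


def rewrite_in_place(path, block_size=128 * 1024):
    """Read each block of `path` and write it back at the same offset.

    Hypothetical helper, loosely mirroring
    `dd if=X of=X bs=128k conv=noerror,sync`.
    Returns (blocks_written, blocks_unreadable).
    """
    written = 0
    unreadable = 0
    fd = os.open(path, os.O_RDWR)
    try:
        # Assumption: seeking to the end reports the usable size.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            length = min(block_size, size - offset)
            try:
                data = os.pread(fd, length, offset)
            except OSError:
                # Like conv=noerror: keep going after a read error.
                data = b""
                unreadable += 1
            if len(data) < length:
                # Like conv=sync: pad missing input with NUL bytes.
                # Note this zeroes the whole block, not just the bad sector.
                data += b"\0" * (length - len(data))
            os.pwrite(fd, data, offset)
            written += 1
            offset += length
        os.fsync(fd)
    finally:
        os.close(fd)
    return written, unreadable
```

On a healthy file or device the contents come back unchanged; the only
visible effect is that every block has been written once.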

It would also have been nice if the ATA standard had provided a way
to get the LBA of pending sectors so that a very quick rewrite attempt
could be done to fix them.  IIRC this info is available, but in a
vendor-specific way.

-- 
Rod Grimes                                                 rgrimes@freebsd.org