From owner-freebsd-hackers@freebsd.org  Thu Jul  5 17:43:57 2018
From: "Rodney W. Grimes" <freebsd-rwg@pdx.rh.CN85.dnsmgr.net>
Message-Id: <201807051743.w65HhsYb048743@pdx.rh.CN85.dnsmgr.net>
Subject: Re: Confusing smartd messages
In-Reply-To: <51eb8232-49a7-0b3a-2d0f-9882ebfbfa1d@FreeBSD.org>
To: lev@freebsd.org
Date: Thu, 5 Jul 2018 10:43:54 -0700 (PDT)
CC: George Mitchell <george+freebsd@m5p.com>,
 FreeBSD Hackers <freebsd-hackers@freebsd.org>
X-Mailer: ELM [version 2.4ME+ PL121h (25)]
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=US-ASCII

> On 05.07.2018 3:03, George Mitchell wrote:
> 
> > which sounds like it confirms the log message above.  The disk is
> > part of a zraid pool whose "zpool status" also says everything is
> > okay.  What's the recommended action at this point?     -- George
> 
>  In my experience it is the beginning of disk death, even if the overall
> status is PASSED. It could work for a month or maybe half a year after the
> first Offline_Uncorrectable is detected (it depends on load), but your best
> bet is to replace it ASAP and throw it away.

The appearance of pending or offline sector issues as a sign of
imminent death should be weighed against drive age.  If the drive
is young, say less than 100 to 200 hours, I would attribute
this to marginal sectors at birth of the drive that did not get
caught during manufacture, and just get them remapped
and move on.  Many drives have a special state when the
power-on hours are <100 in which any sector whose raw read has
more than N bits in error, before ECC is applied, is automatically
and silently added to the manufacturer's remap table.  A very
similar thing is used at drive manufacture time to create
the initial table, basically a "smartctl -t long" that has
tweaked parameters and logging turned off.

If the drive is older than this I would probably attribute
a count of only 2 to a one-time event such as an emergency
power-off retract, a marginal power situation, or shock or
vibration during a write, and not be too concerned.

If the drive grows additional pending/offline sectors I
would then start to be concerned.  Without any growth,
though, these are almost always one-off events with any
of many possible causes.
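
For anyone who wants to script the rule of thumb above, here is a
rough sketch in Python.  It is only my reading of the heuristic, not
anything smartd does: the names Sample and assess and the 200 hour
cutoff are made up for illustration, and the inputs are assumed to be
raw values read by hand from "smartctl -A" (attribute 9 for power-on
hours, 197 plus 198 for pending and offline uncorrectable sectors).

```python
from dataclasses import dataclass

# Assumed cutoff for a "young" drive, taken from the upper end of the
# "100 to 200 hours" range mentioned above. Adjust to taste.
YOUNG_DRIVE_HOURS = 200


@dataclass
class Sample:
    """One reading of a drive's SMART counters (hypothetical helper type)."""
    power_on_hours: int  # raw value of attribute 9
    bad_sectors: int     # raw values of attributes 197 + 198 added together


def assess(previous: Sample, current: Sample) -> str:
    """Classify two readings taken some time apart.

    Returns one of "ok", "growing", "early-life" or "one-off".
    """
    if current.bad_sectors == 0:
        return "ok"
    # Growth after the count was already nonzero is the worrying case.
    if previous.bad_sectors > 0 and current.bad_sectors > previous.bad_sectors:
        return "growing"
    # First appearance on a very young drive: likely marginal sectors
    # that were missed at manufacture. Remap and move on.
    if current.power_on_hours < YOUNG_DRIVE_HOURS:
        return "early-life"
    # Older drive, count not growing: most likely a one-time event.
    return "one-off"


if __name__ == "__main__":
    print(assess(Sample(5000, 2), Sample(5100, 2)))
```

Only the "growing" result would make me think about replacing the
drive; the other two nonzero results just mean keep an eye on the
counters.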


-- 
Rod Grimes                                                 rgrimes@freebsd.org