From owner-freebsd-hackers@freebsd.org  Thu Jul  5 17:50:45 2018
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2F9F0104249E
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Thu,  5 Jul 2018 17:50:45 +0000 (UTC)
 (envelope-from asomers@gmail.com)
Received: from mail-lf0-x235.google.com (mail-lf0-x235.google.com
 [IPv6:2a00:1450:4010:c07::235])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 888698C786;
 Thu,  5 Jul 2018 17:50:44 +0000 (UTC)
 (envelope-from asomers@gmail.com)
Received: by mail-lf0-x235.google.com with SMTP id y127-v6so7645666lfc.8;
 Thu, 05 Jul 2018 10:50:44 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
 h=mime-version:sender:in-reply-to:references:from:date:message-id
 :subject:to:cc;
 bh=EZM6wgG2g7aROon7FC4p27kgocy9FoMYF79FKhTiSHY=;
 b=RQrHXSADJqAFU3CLK+bQpBY7A4E9NHgc3foQlpoUPOxSiW1tmWpIoiO+M6Ui8+WB4L
 IumqMk/RB4upGaT+RHK4OHnJy9CzZzmsaXhITnFYxuZl7Vyx9fw/y6NqQ5GmdSQZDuYK
 AYiuaazzttvQW2JkVXEDg6exj0k4QGkO3/6JTjNsNKA8sai85aCfkNk12x8UMcDA1fpQ
 P8peD0VkZaXWypnOGAHIzTfPhkULDymD29eISv9PoruPCrF0QGFxB9usUuAleXEq5+A8
 t540Qi9gpQe8zfLC7lt47F5BFcGzWIieH+ULWkcprhvxVup9fAn3wKWKtvIpxhGWdGBT
 3jFQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:sender:in-reply-to:references:from
 :date:message-id:subject:to:cc;
 bh=EZM6wgG2g7aROon7FC4p27kgocy9FoMYF79FKhTiSHY=;
 b=hidZBx2JnkHpozHvYErH/v9MMoKRYk2PTshE7SGOBzSjOJsYwCdkoKyJTuywjpcV0L
 Y2QPIPafM/TTvSUmVPWHRxXLIOH/EPxpTXZI7oL+jU0tp8mc5xUZUpfj1Wr0jrxPEm1X
 oN3G0fUv3P+TBAftT5OO0dzIPtDY1mZDRqdbgA1q3Jv95B/RGXdkzsYS2G6lPISftaT7
 rwbnSgf6PgseDxLYNcjeYJziYAm9wJkp7T00YmPCvmOOxURGJwj93jUoMlQ1uSXpkZMh
 9s9MRb+6Sy5oAAVYzofryaqdd7g5H34thcz2wZ41TyOXS+0iaNWv8mH53XXtl8MiCZe8
 Pb4g==
X-Gm-Message-State: APt69E3JSZ7th6K5vmkZRtPhLC6gC9w570BKvRoTHmsQd1dEKsaJvAw+
 cjoIpX4mWRqpLKiIvi4ZF8K0NhkAXU7o+Gi3vDQ=
X-Google-Smtp-Source: AAOMgpd4X8ohztZfxdnwLdnKr+xjMXP2r+kFg82VhfXxEa9UXqIB7RpM05gRZV6EOfIstmIEorVwDE5RsqLuMIdvoUI=
X-Received: by 2002:a19:a417:: with SMTP id
 q23-v6mr4917625lfc.59.1530813042825; 
 Thu, 05 Jul 2018 10:50:42 -0700 (PDT)
MIME-Version: 1.0
Sender: asomers@gmail.com
Received: by 2002:ab3:1b91:0:0:0:0:0 with HTTP;
 Thu, 5 Jul 2018 10:50:41 -0700 (PDT)
In-Reply-To: <201807051743.w65HhsYb048743@pdx.rh.CN85.dnsmgr.net>
References: <51eb8232-49a7-0b3a-2d0f-9882ebfbfa1d@FreeBSD.org>
 <201807051743.w65HhsYb048743@pdx.rh.CN85.dnsmgr.net>
From: Alan Somers <asomers@freebsd.org>
Date: Thu, 5 Jul 2018 11:50:41 -0600
X-Google-Sender-Auth: qevRg8-uruFIJhPx2UQs9WE88NU
Message-ID: <CAOtMX2hWeUQLbfRgh_qAXMyQESKcf-ntRtW=M1huTWAQta9gKA@mail.gmail.com>
Subject: Re: Confusing smartd messages
To: "Rodney W. Grimes" <freebsd-rwg@pdx.rh.cn85.dnsmgr.net>
Cc: Lev Serebryakov <lev@freebsd.org>,
 FreeBSD Hackers <freebsd-hackers@freebsd.org>, 
 George Mitchell <george+freebsd@m5p.com>
Content-Type: text/plain; charset="UTF-8"
X-Content-Filtered-By: Mailman/MimeDel 2.1.27
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.27
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 05 Jul 2018 17:50:45 -0000

On Thu, Jul 5, 2018 at 11:43 AM, Rodney W. Grimes <
freebsd-rwg@pdx.rh.cn85.dnsmgr.net> wrote:

> > On 05.07.2018 3:03, George Mitchell wrote:
> >
> > > which sounds like it confirms the log message above.  The disk is
> > > part of a zraid pool whose "zpool status" also says everything is
> > > okay.  What's the recommended action at this point?     -- George
> >
> >  In my experience this is the beginning of the disk's death, even if the
> > overall status is PASSED. It could work for a month or maybe half a year
> > after the first Offline_Uncorrectable is detected (it depends on load), but
> > your best bet is to replace it ASAP and throw it away.
>
> The appearance of pending or offline sector issues as a sign of
> imminent death should be weighed against drive age.  If the drive
> is young, say less than 100 to 200 hours, I would attribute
> this to sectors that were marginal at the birth of the drive and
> did not get caught during manufacture, and just get them remapped
> and move on.  Many drives have a special state while the
> power-on hours are <100, in which any raw read with more than
> N bits in error, before ECC is applied, automatically and
> silently adds the sector to the manufacturer's remap table.  A very
> similar thing is used at drive manufacture time to create
> the initial table, basically a "smartctl -t long" that has
> tweaked parameters and logging turned off.
>

The famous Weibull distribution.  I believe the Backblaze reports talk
about it.
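
For anyone who has not met it: the Weibull hazard rate with shape k and
scale lambda is h(t) = (k/lambda) * (t/lambda)^(k-1).  A shape below 1 gives
a failure rate that falls with age (the early-life failures discussed
above), a shape of 1 gives a constant rate, and a shape above 1 gives
wear-out.  Below is a minimal sketch of that formula; the function name and
the shape and scale values are illustrative assumptions, not figures taken
from Backblaze or any manufacturer.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate of a Weibull distribution at age t.

    h(t) = (shape / scale) * (t / scale) ** (shape - 1)
    """
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape and scale must all be positive")
    return (shape / scale) * (t / scale) ** (shape - 1)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to show the three regimes.
    for shape in (0.5, 1.0, 2.0):
        young = weibull_hazard(100.0, shape, 1000.0)
        old = weibull_hazard(1000.0, shape, 1000.0)
        trend = "falling" if young > old else "rising" if young < old else "flat"
        print(f"shape={shape}: hazard is {trend} between 100 h and 1000 h")
```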


>
> If the drive is older than this I would probably attribute
> just 2 sectors to a one-time event like an emergency power-off
> retract, a marginal power situation, or shock or vibration during
> a write, and not be too concerned.
>
> If the drive grows additional pending/offline sectors I
> would then start to be concerned.  Without any growth,
> though, these are almost always one-off events with
> any of many possible causes.
>
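
The rule of thumb quoted above (young drive: early-life remap; older
drive with a few sectors and no growth: one-off event; growth: concern)
could be sketched roughly as follows.  This is only an illustration of
that heuristic, not a tested policy.  The SmartSample type, the assess()
function and the 200-hour cutoff are assumptions made for the example;
the three fields are meant to hold the raw values of the Power_On_Hours,
Current_Pending_Sector and Offline_Uncorrectable attributes as printed by
"smartctl -A".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmartSample:
    """Raw SMART values read at one point in time (hypothetical container)."""
    power_on_hours: int
    pending: int                 # Current_Pending_Sector raw value
    offline_uncorrectable: int   # Offline_Uncorrectable raw value


# Assumed cutoff, taken from the "100 to 200 hours" range quoted above.
EARLY_LIFE_HOURS = 200


def assess(previous, current):
    """Classify the change between two samples using the quoted heuristic."""
    bad_before = previous.pending + previous.offline_uncorrectable
    bad_now = current.pending + current.offline_uncorrectable
    if bad_now == 0:
        return "ok"
    if current.power_on_hours < EARLY_LIFE_HOURS:
        return "early-life"   # marginal sectors on a young drive: remap, move on
    if bad_before > 0 and bad_now > bad_before:
        return "growing"      # additional sectors after the first ones: concern
    return "one-off"          # first appearance or no growth on an older drive


if __name__ == "__main__":
    first = SmartSample(power_on_hours=5000, pending=0, offline_uncorrectable=0)
    second = SmartSample(power_on_hours=5100, pending=0, offline_uncorrectable=2)
    third = SmartSample(power_on_hours=5200, pending=1, offline_uncorrectable=4)
    print(assess(first, second))
    print(assess(second, third))
```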

The OP hasn't watched 100,000 drives age.  Backblaze has.  That's why my
advice is to replace them according to the failure indicators reported by
Backblaze or the manufacturer, without reading too much into the meaning.

-Alan