Date:      Thu, 5 Jul 2018 10:43:54 -0700 (PDT)
From:      "Rodney W. Grimes" <freebsd-rwg@pdx.rh.CN85.dnsmgr.net>
To:        lev@freebsd.org
Cc:        George Mitchell <george+freebsd@m5p.com>, FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: Confusing smartd messages
Message-ID:  <201807051743.w65HhsYb048743@pdx.rh.CN85.dnsmgr.net>
In-Reply-To: <51eb8232-49a7-0b3a-2d0f-9882ebfbfa1d@FreeBSD.org>

> On 05.07.2018 3:03, George Mitchell wrote:
> 
> > which sounds like it confirms the log message above.  The disk is
> > part of a raidz pool whose "zpool status" also says everything is
> > okay.  What's the recommended action at this point?     -- George
> 
>  In my experience it is the beginning of disk death, even if the overall
> status is PASSED. It could work for a month or maybe half a year after the
> first Offline_Uncorrectable is detected (it depends on load), but your best
> bet is to replace it ASAP and throw it away.

Whether the appearance of pending or offline sectors indicates
imminent death should be weighed against drive age.  If the drive
is young, say less than 100 to 200 power-on hours, I would attribute
this to marginal sectors present at the drive's birth that did not
get caught during manufacture, just let them get remapped, and
move on.  Many drives have a special mode while power-on hours
are <100 in which any sector returning a raw read error with more
than N bits in error, before ECC is applied, is automatically and
silently added to the manufacturer's remap table.  A very
similar mechanism is used at manufacture time to create
the initial table: basically a "smartctl -t long" with
tweaked parameters and logging turned off.

If the drive is older than this, I would probably attribute
only 2 such sectors to a one-time event such as an emergency
power-off retract, a marginal power situation, or shock or
vibration during a write, and not be too concerned.

If the drive grows additional pending/offline sectors, I
would then start to be concerned.  Without any growth,
though, these are almost always one-off events that can
have any of a number of causes.
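The rule of thumb above can be sketched as a small classifier. Everything here is an assumption layered on the advice: the `SmartSnapshot` type and `assess` function are hypothetical, the 200-hour and 2-sector thresholds are loose readings of the text, and a real monitor would need to store the previous reading between runs to detect growth.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative thresholds taken loosely from the advice above; they are
# assumptions, not values defined by smartctl or any drive vendor.
YOUNG_DRIVE_HOURS = 200    # "less than 100 to 200 hours"
ONE_OFF_SECTOR_COUNT = 2   # "only 2" sectors on an older drive


@dataclass
class SmartSnapshot:
    power_on_hours: int
    pending: int                # Current_Pending_Sector raw value
    offline_uncorrectable: int  # Offline_Uncorrectable raw value


def assess(cur: SmartSnapshot, prev: Optional[SmartSnapshot] = None) -> str:
    """Classify a drive's pending/offline sector state (hypothetical helper)."""
    if cur.pending == 0 and cur.offline_uncorrectable == 0:
        return "clean"
    # Growth since the previous reading is the signal that warrants concern.
    if prev is not None and (cur.pending > prev.pending
                             or cur.offline_uncorrectable > prev.offline_uncorrectable):
        return "growing: start to be concerned"
    # Young drive: most likely marginal sectors missed at manufacture.
    if cur.power_on_hours < YOUNG_DRIVE_HOURS:
        return "young drive: let the sectors remap and move on"
    # Older drive: with no growth observed, or only a very small count,
    # treat it as a probable one-off event.
    worst = max(cur.pending, cur.offline_uncorrectable)
    if prev is not None or worst <= ONE_OFF_SECTOR_COUNT:
        return "likely one-off event: keep monitoring"
    # Older drive, larger count, no earlier reading to compare against.
    return "no history yet: re-check later for growth"
```

Checking growth before age is a judgment call: a drive still under 100 hours that keeps finding sectors could reasonably be treated as being in its break-in remapping phase instead.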


-- 
Rod Grimes                                                 rgrimes@freebsd.org


