Date: Thu, 5 Jul 2018 10:39:45 -0600
From: Alan Somers <asomers@freebsd.org>
To: Stefan Blachmann <sblachmann@gmail.com>
Cc: Wojciech Puchar <wojtek@puchar.net>, FreeBSD Hackers <freebsd-hackers@freebsd.org>, George Mitchell <george+freebsd@m5p.com>, Lev Serebryakov <lev@freebsd.org>
Subject: Re: Confusing smartd messages
Message-ID: <CAOtMX2gG48jzWkPg3kGpSVDC89KY14ta3p-U+O5yExHZJfNL7w@mail.gmail.com>
In-Reply-To: <CACc-My36jbL=WWpxOB24D_YLDMofSHAk9JgrP86LKd4MEct1mg@mail.gmail.com>
References: <dfccd275-954c-11da-1790-e75878f89ad1@m5p.com> <51eb8232-49a7-0b3a-2d0f-9882ebfbfa1d@FreeBSD.org> <alpine.BSF.2.20.1807051642090.17082@puchar.net> <CACc-My36jbL=WWpxOB24D_YLDMofSHAk9JgrP86LKd4MEct1mg@mail.gmail.com>
My advice to the OP is to chill out. SMART is inconsistently implemented by different drive vendors, and its output is very hard to interpret. I would only recommend replacing a drive based on its SMART status for one of two reasons:

1) The drive is under warranty and the vendor agrees to a free replacement based on the SMART output alone. The vendors know the meaning of their own SMART fields better than you do.

2) A large statistical dataset shows that this particular SMART field is correlated with early failure for your model of hard drive (or at least a similar model). Backblaze maintains one such dataset, which they periodically publish on their blog. There are a few other, outdated datasets in the academic literature: one from AOL and several from supercomputer operators. But Backblaze's is the best because a) it's current, b) it's large, and c) they have a very diverse set of hard drives. Still, even Backblaze can sound a little superstitious (they replace an entire chassis once several of its drives have had SMART problems).
https://www.backblaze.com/blog/hard-drive-reliability-q1-2015/

If the drive is not RMAable and you're nervous because you love your data, then you might consider setting up a hotspare. zfsd(8) will activate it the moment one of your current drives fails. You can even configure the hotspare to be spun down most of the time so it won't be affected by the mechanical shocks or regular wear that the live drives endure. (Rough command sketches appear at the end of this message.)

Rewriting suspicious sectors is useless in this day and age. HDDs and SSDs already do it internally and have for years. Even healthy sectors get rewritten every now and then because of the adjacent-track-interference problem. About the only kind of problem that could develop on a track that the HDD/SSD won't fix itself is a checksum error. Those are very rare, and ZFS will fix them immediately.

-Alan "too well versed in hard drive reliability for my own good" Somers

On Thu, Jul 5, 2018 at 10:11 AM, Stefan Blachmann <sblachmann@gmail.com> wrote:
> Another issue is that flash memories also exhibit the charge-drain
> problem. They cannot be read indefinitely without an occasional
> rewrite, as every read drains a minuscule amount of the charge.
>
> I often wished I knew of some OS/driver function/mechanism that could
> rewrite (refresh) media on a mounted, running system and could be run,
> for example, via cron.
>
> That would not only be very useful for fixing pending sectors without
> stopping a running machine, but also for keeping embedded machines'
> flash memories reliably charged over the years.
>
> On 7/5/18, Wojciech Puchar <wojtek@puchar.net> wrote:
> >>> okay. What's the recommended action at this point? -- George
> >>
> >> In my experience it is the beginning of disk death, even if the
> >> overall status is PASSED. It could work for a month or maybe half a
> >> year after the first Offline_Uncorrectable is detected (it depends on
> >> load), but your best bet is to replace it ASAP and throw it away.
> >
> > Well, my disk had this and lived happily for 3 years.
> >
> > It JUST means that some sectors are unreadable, which may mean that at
> > some point a write went wrong because of a hardware problem. But that
> > problem may have been - and possibly was - a power-down while writing,
> > or a power spike.
> >
> > The media itself could be fine. The best action in such a case is to
> > force a rewrite of the whole drive with some data.
> >
> > With gmirror it is as easy as first checking the second drive for
> > errors, then forcing a remirror.
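
If you want to compare your own drives against a dataset like Backblaze's, the raw attribute values come from smartctl(8). A minimal sketch, assuming an ATA disk at /dev/ada0 (the device name is just a placeholder):

  # print the vendor-specific SMART attribute table (raw values and thresholds)
  smartctl -A /dev/ada0
  # or the full report: identity, health assessment, attributes, and error log
  smartctl -a /dev/ada0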
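
As for the hot-spare setup mentioned above, a minimal sketch, assuming a pool named "tank" and a spare disk at da4 (both names are placeholders for your own):

  # add the disk to the pool as a hot spare
  zpool add tank spare da4
  # enable and start zfsd(8) so the spare is attached automatically on failure
  sysrc zfsd_enable="YES"
  service zfsd start
  # optionally spin the spare down until it is actually needed
  camcontrol standby da4

zfsd(8) watches device events and will kick in the spare when a pool member faults; once the bad disk has been replaced, the spare can be returned to idle with "zpool detach".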
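
And for the gmirror remirror that Wojciech describes, roughly (the mirror and provider names are examples; check gmirror status for your own):

  # confirm the mirror is otherwise healthy before touching anything
  gmirror status gm0
  # force a full resynchronization of the suspect component
  gmirror rebuild gm0 ada1

The rebuild rewrites that component end to end from the good disk, which gives the whole-drive rewrite he is talking about.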