Date: Fri, 9 Nov 2012 13:23:37 -0800
From: Kevin Oberman <kob6558@gmail.com>
To: "Lucas B. Cohen" <lbc@bnrlabs.com>
Cc: "H. Ingow" <hingow@gmail.com>, stable@freebsd.org
Subject: Re: smartctl question
Message-ID: <CAN6yY1sPqusGr-oAKkzz88fJQe7p2j6WTbj0Rho=d_7ArPeEoA@mail.gmail.com>
In-Reply-To: <509CED3E.8090103@bnrlabs.com>
References: <20121109111843.GA25461@tunchi> <509CED3E.8090103@bnrlabs.com>
On Fri, Nov 9, 2012 at 3:47 AM, Lucas B. Cohen <lbc@bnrlabs.com> wrote:
> Hi,
>
> On 2012.11.09 12:18, H. Ingow wrote:
>>
>> Hi all,
>>
>> One single disk in a zfs mirror failed permanently, throwing errors like
>> kernel: (ada5:ata10:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 84
>> (ICRC ABRT ) and the like.
>>
>> The pool itself continued working degraded, and smartctl showed a very
>> high 199 UDMA_CRC_Error_Count value, which to my knowledge may indicate
>> a broken cable. In this case a cable replacement indeed solved the
>> problem; the pool resilvered and all is fine.
>>
>> Still, smartctl -a displays a value for 199 UDMA_CRC_Error_Count that I
>> reckon to be way too high (> 3900).
>> So does this value still include errors from the previous broken cable?
>
> I'm pretty sure it does. I don't think SMART attributes can vary in
> value both up and down; they seem to me like counters that can only
> ever be incremented.
>
>> In other words, when, if at all, is the cache that smartmontools reads
>> from flushed, so that the values reflect the status after fixing a
>> hardware problem without swapping the disk?
>

So, in my opinion, no. This is a problem with S.M.A.R.T. All stats are
stored by the drive itself, and the assumption is that all of the errors
are caused by problems in the drive (and usually they are). But when they
come from a cable problem, the drive never sees the problem as "gone", so
the counters never reset.

As long as you remember that you had a cable problem with that drive and
note the raw count of attribute 199 at the time of the fix (over 3900
here), you can discount it, or recognize a problem down the road if it
starts increasing. I'd write that number on a label stuck to the drive as
a lasting reminder that the count is "off by" that much.

By the way, I believe that some stats, such as temperature, do go up and
down, but not counters. As in SNMP, counters are never supposed to be
reset or resettable.

-- 
R. Kevin Oberman, Network Engineer
E-mail: kob6558@gmail.com
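The advice above amounts to recording a baseline for attribute 199 once the cable is replaced and then watching only the difference. A minimal Python sketch of that idea follows. It assumes the attribute table layout that `smartctl -A` prints for ATA drives, with RAW_VALUE as the tenth column; the function names are hypothetical and the sample numbers are made up for illustration, not taken from this thread.

```python
import re
from typing import Optional

CRC_ATTR_ID = "199"  # UDMA_CRC_Error_Count


def crc_error_count(smartctl_output: str) -> Optional[int]:
    """Return the RAW_VALUE of attribute 199 from `smartctl -A` text, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID and have at least ten
        # columns; RAW_VALUE is assumed to be the tenth (index 9).
        if len(fields) >= 10 and fields[0] == CRC_ATTR_ID:
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


def new_crc_errors(smartctl_output: str, baseline: int) -> int:
    """Errors counted since `baseline`, the value noted when the cable was fixed."""
    current = crc_error_count(smartctl_output)
    if current is None:
        raise ValueError("attribute 199 not found in smartctl output")
    return current - baseline


# Illustrative output; the numbers are made up, not taken from this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
194 Temperature_Celsius     0x0022   114   100   000    Old_age   Always       -       33
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       3912
"""

# A baseline of 3912 recorded after the cable swap means no new errors yet.
print(new_crc_errors(SAMPLE, baseline=3912))  # prints 0
```

On a live system the text could come from something like `subprocess.run(["smartctl", "-A", "/dev/ada5"], capture_output=True, text=True).stdout`, with the device name taken from the kernel message quoted above. A positive result means the counter has started climbing again, which would point back at the cable or the drive.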