From: "Rodney W. Grimes"
Message-Id: <201807050105.w6515uRD045567@pdx.rh.CN85.dnsmgr.net>
Subject: Re: Confusing smartd messages
In-Reply-To: <5B3D6975.2060508@grosbein.net>
To: Eugene Grosbein
Date: Wed, 4 Jul 2018 18:05:56 -0700 (PDT)
CC: George Mitchell, FreeBSD Hackers
List-Id: Technical Discussions relating to FreeBSD

> 05.07.2018 7:03, George Mitchell wrote:
> > Every thirty minutes, smartd is telling me:
> >
> > Device: /dev/ada1, 2 Currently unreadable (pending) sectors
> > Device: /dev/ada1, 2 Offline uncorrectable sectors
> >
> > smartctl -a /dev/ada1 seems to be reassuring me that everything is
> > fine (SMART overall-health self-assessment test result: PASSED),
>
> If that said FAILED, you should replace the disk immediately.
> PASSED does not mean the disk has no problems, only that the problems
> are not fatal (yet).
>
> > though it also says:
> >
> > 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always   -   2
> > 198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline  -   2
> >
> > which sounds like it confirms the log message above.  The disk is
> > part of a raidz pool whose "zpool status" also says everything is
> > okay.  What's the recommended action at this point?    -- George
>
> You need to force the disk to rewrite those two bad sectors.
> There is a possibility they are just "soft bad" sectors, in which case
> the problem will simply disappear without new remaps; that would be
> the best possible case.
>
> Or the two sectors could turn out to be really bad, and a remap will
> "fix" (really hide) the problem. In that case you should be ready for
> a possibly increasing number of bad sectors and have a replacement handy.
>
> The first step is running zpool scrub, or even replacing the disk and
> running "dd if=/dev/zero of=/dev/ada1".

It would be really nice if we had a way to tell ZFS to do the
equivalent of:

    dd if=/dev/ada1 of=/dev/ada1 bs=128k conv=noerror,sync

in some type of low-level scrub operation, with proper locking.
That could easily repair most of these pending-sector errors, which
are actually fairly common in the first couple of hundred hours of
drive operation.

It would also have been nice if the ATA standard had provided a way
to get the LBA of pending sectors, so that a very quick rewrite
attempt could be done to fix them.
IIRC this info is available, but in a vendor-specific way.

-- 
Rod Grimes                                                 rgrimes@freebsd.org
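
A minimal sketch of the read-and-rewrite pass mentioned above, for
anyone who wants to try it outside ZFS. It wraps the same dd invocation
in a shell function. The function name rewrite_in_place is made up for
illustration, and conv=notrunc is an addition so that the function can
also be tried on a regular file. It assumes the target is not in use
(for example, the disk has been taken offline from its pool), since
nothing here provides the locking a real low-level scrub would need.
Note that with conv=noerror,sync a block that cannot be read is written
back as zeros, so this is only reasonable where redundancy can restore
the data afterwards.

```sh
#!/bin/sh
# Sketch only: read every 128k block of a target and write it back in place.
# Assumption: the target is NOT in active use; this does no locking.
# rewrite_in_place is a hypothetical helper name, not an existing tool.
rewrite_in_place() {
    rip_target=$1
    # noerror: keep going after a read error instead of stopping
    # sync:    pad short or failed reads with zeros to a full block
    # notrunc: do not truncate the output first (matters for regular
    #          files; added here, not part of the original command)
    dd if="$rip_target" of="$rip_target" bs=128k conv=noerror,sync,notrunc
}
```

Usage would be something like "rewrite_in_place /dev/ada1" as root,
with a zpool scrub afterwards once the disk is back in the pool.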