From: Alan Somers
Date: Thu, 5 Jul 2018 10:39:45 -0600
Subject: Re: Confusing smartd messages
To: Stefan Blachmann
Cc: Wojciech Puchar, FreeBSD Hackers, George Mitchell, Lev Serebryakov

My advice to the OP is to chill out. SMART is implemented inconsistently
by different drive vendors, and its output is very hard to interpret. I
would only recommend replacing a drive based on its SMART status for two
reasons:

1) The drive is under warranty and the vendor agrees to a free replacement
   based on the SMART output alone. The vendors know the meaning of their
   own SMART fields better than you do.

2) A large statistical dataset shows that this particular SMART field is
   correlated with early failure for your model of hard drive (or at least
   a similar model).

Backblaze maintains one such dataset, which they periodically publish on
their blog. There are a few other, outdated datasets in the academic
literature: one from AOL and several from supercomputer operators.
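
A minimal sketch of the kind of check reason 2 describes, assuming a CSV in
the daily-snapshot layout Backblaze uses for its published drive data (one
row per drive per day, with `serial_number`, `model`, `failure`, and
per-attribute `smart_N_raw` columns). The function name, the sample rows,
and the model names are hypothetical; attribute 198 is chosen only because
it is the Offline_Uncorrectable counter discussed in this thread.

```python
import csv
import io
from collections import defaultdict


def failure_rate_by_flag(rows, model, attr="smart_198_raw"):
    """Return {flagged: (failed_drives, total_drives)} for one drive model.

    A drive counts as "flagged" if the attribute was ever nonzero, and as
    "failed" if any of its daily rows has failure == 1.
    """
    flagged = defaultdict(bool)  # serial -> attribute was ever nonzero
    failed = defaultdict(bool)   # serial -> drive ever reported a failure
    for row in rows:
        if row["model"] != model:
            continue
        serial = row["serial_number"]
        raw = row.get(attr) or "0"  # treat a blank field as zero
        flagged[serial] |= int(raw) > 0
        failed[serial] |= row["failure"] == "1"

    counts = {True: [0, 0], False: [0, 0]}
    for serial, is_flagged in flagged.items():
        counts[is_flagged][0] += failed[serial]
        counts[is_flagged][1] += 1
    return {key: tuple(value) for key, value in counts.items()}


# Made-up sample rows, for illustration only.
SAMPLE = """date,serial_number,model,failure,smart_198_raw
2018-07-01,A1,ModelX,0,0
2018-07-02,A1,ModelX,0,8
2018-07-03,A1,ModelX,1,8
2018-07-01,B2,ModelX,0,0
2018-07-02,B2,ModelX,0,0
2018-07-01,C3,ModelX,0,16
2018-07-01,D4,ModelY,1,0
"""

result = failure_rate_by_flag(csv.DictReader(io.StringIO(SAMPLE)), "ModelX")
for is_flagged, (bad, total) in sorted(result.items()):
    print(f"flagged={is_flagged}: {bad}/{total} drives failed")
```

This is deliberately naive: it ignores how long each drive was observed and
whether the attribute became nonzero before or only at the time of failure,
so a real analysis would need far more drives and more care than this.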
Of these, Backblaze's is the best because a) it's current, b) it's large,
and c) they have a very diverse set of hard drives. Still, even Backblaze
can sound a little superstitious (they replace an entire chassis once
several of its drives have had SMART problems).

https://www.backblaze.com/blog/hard-drive-reliability-q1-2015/

If the drive is not RMAable and you're nervous because you love your data,
then you might consider setting up a hot spare. zfsd(8) will activate it
the moment that one of your current drives fails. You can even configure
the hot spare to be spun down most of the time so it won't be affected by
the mechanical shocks or regular wear that the live drives endure.

Rewriting suspicious sectors is useless in this day and age. HDDs and SSDs
already do it internally, and have for years. Even healthy sectors get
rewritten every now and then due to the adjacent track interference
problem. About the only kind of problem that could develop on a track
that the HDD/SSD won't fix itself would be a checksum error. Those are
very rare, and ZFS will fix them immediately.

-Alan "too well versed in hard drive reliability for my own good" Somers

On Thu, Jul 5, 2018 at 10:11 AM, Stefan Blachmann wrote:

> Another problem is that flash memories also exhibit the charge
> drain problem.
> They cannot be read indefinitely without an occasional rewrite, as every
> read drains a minuscule amount of the charge.
>
> I have often wished I knew of some OS/driver function or mechanism that
> can rewrite, or rather refresh, media on a mounted and running system and
> could be, for example, run via cron.
>
> Such a mechanism would not only be very useful for fixing pending sectors
> without stopping a running machine, but also for keeping embedded
> machines' flash memories reliably charged over the years.
>
> On 7/5/18, Wojciech Puchar wrote:
> >>> okay. What's the recommended action at this point? -- George
> >>
> >> In my experience it is the beginning of disk death, even if overall
> >> status is PASSED.
> >> It could work for a month or maybe half a year after the first
> >> Offline_Uncorrectable is detected (it depends on load), but your best
> >> bet is to replace it ASAP and throw it away.
> > well my disk had this and lived happily for 3 years.
> >
> > It JUST means that some sectors are unreadable, possibly because at
> > some point a write went wrong due to a hardware problem. But that
> > problem may have been - and possibly was - a powerdown while writing,
> > or a power spike.
> >
> > the media itself could be fine. the best action in such a case is to
> > force a rewrite of the whole drive with some data.
> >
> > with gmirror it is as easy as first checking the second drive for
> > errors, then forcing a remirror.
> > _______________________________________________
> > freebsd-hackers@freebsd.org mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"