From: Alan Somers
Date: Thu, 5 Jul 2018 11:50:41 -0600
Subject: Re: Confusing smartd messages
To: "Rodney W. Grimes"
Cc: Lev Serebryakov, FreeBSD Hackers, George Mitchell
In-Reply-To: <201807051743.w65HhsYb048743@pdx.rh.CN85.dnsmgr.net>
References: <51eb8232-49a7-0b3a-2d0f-9882ebfbfa1d@FreeBSD.org> <201807051743.w65HhsYb048743@pdx.rh.CN85.dnsmgr.net>

On Thu, Jul 5, 2018 at 11:43 AM, Rodney W. Grimes <freebsd-rwg@pdx.rh.cn85.dnsmgr.net> wrote:

> > On 05.07.2018 3:03, George Mitchell wrote:
> >
> > > which sounds like it confirms the log message above. The disk is
> > > part of a zraid pool whose "zpool status" also says everything is
> > > okay. What's the recommended action at this point? -- George
> >
> > In my experience it is the beginning of disk death, even if overall
> > status is PASSED. It could work for a month or maybe half a year after
> > the first Offline_Uncorrectable is detected (it depends on load), but
> > your best bet is to replace it ASAP and throw it away.
>
> The appearance of pending or offline sector issues indicating
> imminent death should be weighted to drive age. If the drive
> is young, say less than 100 to 200 hours, I would attribute
> this to marginal sectors at birth of drive that did not get
> caught during drive manufacture and just get them remapped
> and move on. Many drives have a special state when the
> hours is <100 in that all raw read errors with more than
> N bits in error, before ECC is applied, automatically and
> silently add these to the manufacturer's remap table. A very
> similar thing is used at drive manufacture time to create
> the initial table, basically a "smartctl -t long" that has
> tweaked parameters and logging turned off.
>

The famous Weibull distribution. I believe the Backblaze reports talk
about it.

> If the drive is older than this I would probably attribute
> only 2 to a one time event like emergency power off retract,
> marginal power situation, or shock or vibration during write
> and not be too concerned.
>
> If the drive grows additional pending/offline sectors I
> would then start to be concerned. Without any growth
> though these are almost always one off events caused
> by any of many methods.
>

The OP hasn't watched 100,000 drives age. Backblaze has. That's why my
advice is to replace them according to the failure indicators reported by
Backblaze or the manufacturer, without reading too much into the meaning.

-Alan
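
For anyone who wants to see what the Weibull reference above means in practice, here is a minimal sketch of the Weibull hazard (instantaneous failure rate) function, h(t) = (k/lambda) * (t/lambda)^(k-1). A shape k below 1 gives a failure rate that falls with age (the "marginal sectors at birth" regime), k equal to 1 gives a constant rate, and k above 1 gives wear-out. The function name and every numeric value below are assumptions chosen for illustration; none of them come from Backblaze data or from this thread.

```python
def weibull_hazard(hours, shape, scale):
    """Instantaneous failure rate h(t) = (k/lam) * (t/lam)**(k-1) for t > 0.

    hours: drive age (e.g. SMART power-on hours), must be positive
    shape: Weibull shape k (<1 infant mortality, 1 constant, >1 wear-out)
    scale: Weibull scale lam, in the same unit as hours
    """
    if hours <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("hours, shape and scale must all be positive")
    return (shape / scale) * (hours / scale) ** (shape - 1)


if __name__ == "__main__":
    # Shape and scale values are made up purely to show the three regimes.
    for shape in (0.5, 1.0, 2.0):
        rates = [weibull_hazard(t, shape, scale=10_000.0)
                 for t in (100, 1_000, 10_000)]
        print(shape, ["%.2e" % r for r in rates])
```

Fitting real shape and scale values needs failure data from a large population of drives, which is the point made above about relying on published statistics rather than on one disk's history.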