From: Arthur Chance
To: Jon Radel, "Brandon J. Wandersee"
Cc: "William A. Mahaffey III", FreeBSD Questions !!!!
Subject: Re: Ominous smartd messages ....
Date: Wed, 3 Aug 2016 16:59:13 +0100

On 03/08/2016 15:09, Jon Radel wrote:
> On 8/3/16 10:00 AM, Brandon J. Wandersee wrote:
>>
>> Jon Radel writes:
>>
>>> I've read reasonable sounding commentary from people running very, very
>>> large collections of hard drives that there is a high enough correlation
>>> between this error and the drive going to heck sooner rather than later
>>> that they take this as a sign to replace. [Can't find reference right now.]
>>
>> While there's no way to know from the error message alone just what will
>> happen to the disk in the coming days, the general reasoning is this:
>> sectors are not physically segregated. They all sit on the same
>> platter. Several bad sectors occurring in a short period might be a sign
>> of physical fault in the platter, and if that fault is real then stress
>> from the platter spinning will likely cause that fault to spread. So
>> some people conclude that the appearance of several bad sectors in a
>> short period should just be a signal to replace the disk immediately.
>>
>
> If I remember the discussion well enough (sad that I can't find it) my
> use of "correlation" was precise. They actually manage enough drives
> (thousands) and kept enough records to allow for statistical analysis
> which indicate that this smartd error correlates very well with failure
> within [I wish I could remember] timeframe.
>
> Do please excuse the utter lack of footnotes. :-(
>

I think everyone is probably thinking of Backblaze. This is their latest
summary of drive statistics

https://www.backblaze.com/blog/hard-drive-failure-rates-q2-2016/

And this is their take on which SMART metrics matter

https://www.backblaze.com/blog/hard-drive-smart-stats/

-- 
Moore's Law of Mad Science: Every eighteen months, the minimum IQ
necessary to destroy the world drops by one point.