From: Daniel Feenberg
To: Bob Proulx
cc: freebsd-questions@freebsd.org
Date: Thu, 26 Mar 2020 16:37:58 -0400 (EDT)
Subject: Re: drive selection for disk arrays

The disturbing frequency of multiple drives going offline in quick succession is, in my view, largely a result of defects being discovered in quick succession, rather than occurring in quick succession. If a defect occurs in a sector that is rarely visited, it can remain hidden for a long time. During a resilver that defect will be noticed and the drive failed out.

I do think that is an overly aggressive action by the resilvering process. That may be the only bad sector; it may be possible to recover all the data from the remaining drives (if the first failing drive can read the appropriate sector); and that sector may not even be in an active file.

This issue makes scrubbing particularly important, especially in this era of very large filesystems that can take days or weeks to restore.
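
The difference between defects occurring and defects being discovered can be illustrated with a toy model. This is only a sketch under stated assumptions: the defect days, the 30-day scrub interval, the resilver day, and the function name `discovery_days` are all hypothetical, and a "full read" here stands for any pass (scrub or resilver) that touches the affected sector. A latent defect is treated as discovered on the first full read on or after the day it occurred.

```python
from bisect import bisect_left


def discovery_days(defect_days, full_read_days):
    """Return, for each latent defect, the day of the first full read
    on or after the day the defect occurred (None if never read)."""
    reads = sorted(full_read_days)
    found = []
    for day in defect_days:
        i = bisect_left(reads, day)  # index of first read with read_day >= day
        found.append(reads[i] if i < len(reads) else None)
    return found


# Hypothetical data: three drives each develop one hidden bad sector,
# spread out over almost two years.
defect_days = [40, 250, 610]
resilver_day = 700  # a different drive fails outright and forces a resilver

# No scrubbing: the resilver is the first time the rarely visited
# sectors are read at all.
no_scrub = discovery_days(defect_days, [resilver_day])

# Scrub every 30 days (an assumed interval), then the same resilver.
scrub_days = list(range(30, resilver_day, 30))
scrubbed = discovery_days(defect_days, scrub_days + [resilver_day])

print(no_scrub)  # [700, 700, 700]
print(scrubbed)  # [60, 270, 630]
```

Without scrubbing, defects that occurred months apart all surface on the same day, during the resilver, which looks like several drives failing in quick succession. With regular scrubs each one is found within one scrub interval of occurring, while the other drives can still supply good copies.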