From: Daniel Feenberg
To: "Kevin P. Neal"
Cc: Bob Proulx, freebsd-questions@freebsd.org
Date: Fri, 27 Mar 2020 09:05:18 -0400 (EDT)
Subject: Re: drive selection for disk arrays
In-Reply-To: <20200327003131.GA41749@neutralgood.org>
References: <20200325081814.GK35528@mithril.foucry.net> <713db821-8f69-b41a-75b7-a412a0824c43@holgerdanske.com> <20200326124648725158537@bob.proulx.com> <20200327003131.GA41749@neutralgood.org>

On Thu, 26 Mar 2020, Kevin P. Neal wrote:

> On Thu, Mar 26, 2020 at 04:37:58PM -0400, Daniel Feenberg wrote:
>>
>> The disturbing frequency of multiple drives going offline in quick
>> succession is, in my view, largely a result of defects being discovered in
>> quick succession, rather than occurring in quick succession. If a defect
>> occurs in a sector that is rarely visited it can remain hidden for a long
>> time. During a resilver that defect will be noticed and the drive failed
>> out. I do think that is an overly aggressive action by the resilvering
>> process, as that may be the only bad sector, it may be possible to recover
>> all the data from the remaining drives (if the first failing drive can read
>> the appropriate sector), and that sector may not even be in an active file.
>
> I thought that got fixed? I thought that a drive wouldn't be failed out
> of a pool due to simply bad sectors. Possibly this is only prevented during
> a resilver.

Can anyone provide an authoritative reference? I would sleep better if I
knew this was fixed.

When I raised the issue a decade ago, I was told "only an IT admin of low
moral character would ask for resilvering to continue in the presence of an
error", or words to that effect. And that was before it took a month to copy
out the good data to an alternate store and another month to copy it back.

Daniel Feenberg

>
> Am I remembering wrong?
>
> --
> Kevin P. Neal                         http://www.pobox.com/~kpn/
>
> "Nonbelievers found it difficult to defend their position in \
>     the presense of a working computer." -- a DEC Jensen paper
>