Date: Fri, 27 Mar 2020 10:45:55 +0100
From: Polytropon
To: Daniel Feenberg
Cc: Bob Proulx, freebsd-questions@freebsd.org
Subject: Re: drive selection for disk arrays
Message-Id: <20200327104555.1d6d7cd9.freebsd@edvax.de>
Organization: EDVAX

On Thu, 26 Mar 2020 16:37:58 -0400 (EDT), Daniel Feenberg wrote:
>
> The disturbing frequency of multiple drives going offline in quick
> succession is, in my view, largely a result of defects being discovered in
> quick succession, rather than occurring in quick succession. If a defect
> occurs in a sector that is rarely visited it can remain hidden for a long
> time. During a resilver that defect will be noticed and the drive failed
> out. I do think that is an overly aggressive action by the resilvering
> process, as that may be the only bad sector, it may be possible to recover
> all the data from the remaining drives (if the first failing drive can
> read the appropriate sector), and that sector may not even be in an active
> file.

I'd like to mention something in this context: When a drive _reports_
bad sectors, at least in the past it was an indication that it already
_has_ lots of them. The drive's firmware remaps bad sectors to spare
sectors, so the OS sees "no error" up to that point. When errors are
reported "upwards" (a "read error" or "write error" visible to the OS),
it's a sign that the disk has run out of spare sectors and the firmware
can no longer silently remap _new_ bad sectors...

Is this still the case with modern drives? How transparently can
ZFS handle drive errors when the drives only report the "top results"
(i.e., when they can no longer cope with bad sectors internally)?

Do SMART tools help here, for example, by reading certain
firmware-provided values that indicate how many sectors _actually_
have been marked as "bad sector", remapped internally, and _not_
reported to the controller / disk I/O subsystem / filesystem yet?
That should be a good indicator of "will fail soon", so the drive
can be replaced before any data loss or other problems appear.
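
To make the SMART question concrete, here is a minimal sketch of
reading those firmware counters. It assumes the attribute table
printed by "smartctl -A" (from smartmontools) for an ATA drive, and it
assumes the drive exposes the commonly seen attribute names
Reallocated_Sector_Ct, Current_Pending_Sector and Offline_Uncorrectable;
vendors differ in which attributes they provide and how the raw value
is to be read. The function names and the sample table are made up for
illustration and are not output from a real drive.

```python
# SMART attributes commonly associated with remapped or unreadable sectors.
# Assumption: the drive reports them under these names; not all drives do.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for rows of a `smartctl -A` table."""
    values = {}
    for line in text.splitlines():
        fields = line.split()
        # An attribute row starts with a numeric ID and has these columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            values[fields[1]] = int(fields[9])
    return values


def sector_warnings(values):
    """Return the watched attributes whose raw counter is above zero."""
    return {name: values[name] for name in WATCHED if values.get(name, 0) > 0}


# Hypothetical sample table, invented for this example.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       21034
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       2
"""

if __name__ == "__main__":
    print(sector_warnings(parse_smart_attributes(SAMPLE)))
```

On a real system the text would come from running smartctl -A against
the disk device (for example /dev/ada0, which is only an example name)
instead of SAMPLE. A non-zero count here is a hint to look closer, not
proof that the drive is about to fail.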

-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...