From: Paul Mather
To: Michelle Sullivan
Cc: freebsd-stable
Date: Wed, 8 May 2019 10:31:48 -0400
Subject: Re: ZFS...
Message-Id: <453BCBAC-A992-4E7D-B2F8-959B5C33510E@gromit.dlib.vt.edu>
In-Reply-To: <14ed4197-7af7-f049-2834-1ae6aa3b2ae3@sorbs.net>

On May 8, 2019, at 9:59 AM, Michelle Sullivan wrote:

> Paul Mather wrote:
>>> due to lack of space.  Interestingly have had another drive die in the
>>> array - and it doesn't just have one or two sectors down it has a *lot*
>>> - which was not noticed by the original machine - I moved the drive to
>>> a byte copier which is where it's reporting 100's of sectors damaged...
>>> could this be compounded by zfs/mfi driver/hba not picking up errors
>>> like it should?
>>
>>
>> Did you have regular pool scrubs enabled?  It would have picked up
>> silent data corruption like this.  It does for me.
> Yes, every month (once a month because, (1) the data doesn't change much
> (new data is added, old is not touched), and (2) because to complete it
> took 2 weeks.)

Do you also run sysutils/smartmontools to monitor S.M.A.R.T. attributes?
Although imperfect, it can sometimes signal trouble brewing with a drive
(e.g., increasing Reallocated_Sector_Ct and Current_Pending_Sector counts)
that can lead to proactive remediation before catastrophe strikes.

Unless you have been gathering periodic drive metrics, you have no way of
knowing whether these hundreds of bad sectors have happened suddenly or
slowly over a period of time.

Cheers,

Paul.
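
P.S. By "gathering periodic drive metrics" I mean something along these lines. This is only a rough sketch, not a tested monitoring setup: it assumes the drive is directly visible to smartctl and prints the usual ATA attribute table for "smartctl -A". The device name /dev/ada0, the log file name, and the helper function names are all placeholders I made up for illustration. Drives behind a RAID HBA usually need an extra smartctl device-type option that I have not shown here.

```python
import csv
import subprocess
import sys
import time

# Counters named in this thread; other attributes could be added.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} from `smartctl -A` table output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attrs[fields[1]] = int(fields[9])
            except ValueError:
                continue  # raw value is not a plain integer; skip it
    return attrs


def increased(previous, current, watched=WATCHED):
    """Return {name: (old, new)} for watched counters that grew."""
    return {
        name: (previous[name], current[name])
        for name in watched
        if name in previous and name in current
        and current[name] > previous[name]
    }


def log_sample(device, logfile):
    """Run smartctl once and append the watched counters to a CSV log."""
    # smartctl uses a bitmask exit status, so a nonzero code is not
    # treated as a failure here.
    out = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    ).stdout
    attrs = parse_smart_attributes(out)
    with open(logfile, "a", newline="") as fh:
        csv.writer(fh).writerow(
            [int(time.time()), device] + [attrs.get(n, "") for n in WATCHED]
        )
    return attrs


if __name__ == "__main__":
    # Placeholder device and log path; adjust for the actual system.
    device = sys.argv[1] if len(sys.argv) > 1 else "/dev/ada0"
    log_sample(device, "smart_history.csv")
```

Run from cron once a day or so, this leaves a timestamped history, so a jump from zero to hundreds of pending sectors can be told apart from a slow climb. The increased() helper compares two samples and could be used to send a warning when either counter grows.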