From: Michelle Sullivan
To: Paul Mather
Cc: freebsd-stable
Subject: Re: ZFS...
Date: Thu, 09 May 2019 01:14:08 +1000
Message-id: <92330c95-7348-c5a2-9c13-f4cbc99bc649@sorbs.net>
In-reply-to: <453BCBAC-A992-4E7D-B2F8-959B5C33510E@gromit.dlib.vt.edu>
List-Id: Production branch of FreeBSD source code

Paul Mather wrote:
> On May 8, 2019, at 9:59 AM, Michelle Sullivan wrote:
>
>> Paul Mather wrote:
>>>> due to lack of space.  Interestingly have had another drive die in
>>>> the array - and it doesn't just have one or two sectors down it has
>>>> a *lot* - which was not noticed by the original machine - I moved
>>>> the drive to a byte copier which is where it's reporting 100's of
>>>> sectors damaged... could this be compounded by zfs/mfi driver/hba
>>>> not picking up errors like it should?
>>>
>>>
>>> Did you have regular pool scrubs enabled?  It would have picked up
>>> silent data corruption like this.  It does for me.
>> Yes, every month (once a month because, (1) the data doesn't change
>> much (new data is added, old is not touched), and (2) because to
>> complete it took 2 weeks.)
>
>
> Do you also run sysutils/smartmontools to monitor S.M.A.R.T.
> attributes?  Although imperfect, it can sometimes signal trouble
> brewing with a drive (e.g., increasing Reallocated_Sector_Ct and
> Current_Pending_Sector counts) that can lead to proactive remediation
> before catastrophe strikes.

Not automatically.

>
> Unless you have been gathering periodic drive metrics, you have no way
> of knowing whether these hundreds of bad sectors have happened
> suddenly or slowly over a period of time.

No, it is something I have thought about but been unable to spend the
time on.

-- 
Michelle Sullivan
http://www.mhix.org/