From: Walter Cramer
To: Paul Mather
cc: Michelle Sullivan, freebsd-stable
Date: Wed, 8 May 2019 12:31:54 -0400 (EDT)
Subject: Re: ZFS...

On Wed, 8 May 2019, Paul Mather wrote:

> On May 8, 2019, at 9:59 AM, Michelle Sullivan wrote:
>
>> Paul Mather wrote:
>>>> due to lack of space.  Interestingly have had another drive die in the
>>>> array - and it doesn't just have one or two sectors down it has a *lot* -
>>>> which was not noticed by the original machine - I moved the drive to a
>>>> byte copier which is where it's reporting 100's of sectors damaged...
>>>> could this be compounded by zfs/mfi driver/hba not picking up errors like
>>>> it should?
>>>
>>>
>>> Did you have regular pool scrubs enabled?  It would have picked up silent
>>> data corruption like this.  It does for me.
>> Yes, every month (once a month because, (1) the data doesn't change much
>> (new data is added, old it not touched), and (2) because to complete it
>> took 2 weeks.)
>
>
> Do you also run sysutils/smartmontools to monitor S.M.A.R.T. attributes?
> Although imperfect, it can sometimes signal trouble brewing with a drive
> (e.g., increasing Reallocated_Sector_Ct and Current_Pending_Sector counts)
> that can lead to proactive remediation before catastrophe strikes.
>
> Unless you have been gathering periodic drive metrics, you have no way of
> knowing whether these hundreds of bad sectors have happened suddenly or
> slowly over a period of time.
>

+1

Use `smartctl` from a cron script to do regular (say, weekly) *long*
self-tests of hard drives, and also log (say, daily) all the SMART
information from each drive.  Then if a drive fails, you can at least
check the logs for whether SMART noticed symptoms, and (if so) for other
drives with symptoms.  Or enhance this with a slightly longer script,
which watches the logs for symptoms, and alerts you.

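A rough sketch of what such a script could look like follows.  It is only
an illustration under some assumptions: the drives are ATA disks whose
`smartctl -A` output uses the usual ten-column attribute table, the two
watched attributes are the ones Paul named, and LOGDIR, the function names
and the ada0/ada1 device names are made up for the example.  It prints
symptoms rather than mailing them, relying on cron mailing a job's output
to its owner.

```shell
#!/bin/sh
# Sketch only: SMART logging and symptom checks, meant to be called from cron.
# LOGDIR and all function names are hypothetical, chosen for this example.

LOGDIR="${LOGDIR:-/var/log/smart}"

# Read `smartctl -A` output on stdin and print "name raw_value" for each
# watched attribute whose raw value (10th column) is above zero.
smart_symptoms() {
    awk '$2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" {
             if ($10 + 0 > 0) print $2, $10
         }'
}

# Append a timestamped dump of all SMART information for one drive to its log.
smart_log() {
    dev="$1"
    mkdir -p "$LOGDIR" || return 1
    {
        date '+=== %Y-%m-%d %H:%M:%S ==='
        smartctl -a "/dev/$dev"
    } >> "$LOGDIR/$dev.log"
}

# Start a long self-test.  The test runs inside the drive, so this returns
# right away; the result shows up in later `smartctl -a` dumps.
smart_longtest() {
    smartctl -t long "/dev/$1"
}

# Print one line per symptom, prefixed with the drive name.
# No output means no watched attribute is above zero.
smart_check() {
    for dev in "$@"; do
        smartctl -A "/dev/$dev" | smart_symptoms | sed "s|^|$dev: |"
    done
}

# Possible use from a wrapper script run by cron (device names assumed):
#   daily:   for d in ada0 ada1; do smart_log "$d"; done; smart_check ada0 ada1
#   weekly:  for d in ada0 ada1; do smart_longtest "$d"; done
```

Checking for "above zero" is deliberately crude; comparing each day's raw
values against the previous day's log entry would show whether the counts
are growing, which is the trend Paul describes.
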
(My experience is that SMART's *long* self-test checks the entire disk for
read errors, without either downside of `zpool scrub` - it does a fast,
sequential read of the HD, including free space.  That makes it a nice
test for failing disk hardware, but not a replacement for `zpool scrub`.)

> Cheers,
>
> Paul.