From: David Christensen
To: freebsd-questions@freebsd.org
Subject: Re: zpool CKSUM 0 --> 1 while resilvering
Date: Mon, 15 Feb 2021 12:56:31 -0800
In-Reply-To: <20210215131139.c3ad9f9c9f907ee5b058fd37@3dresearch.com>

On 2021-02-15 10:11, Janos Dohanics wrote:
> Hello,
>
> I had to replace a hard drive which has failed a smart test. As the
> replacement hard drive was being resilvered, I have peridically checked
> the progress.
>
> Initially, everything looked fine:
>
> # zpool status
>   pool: zroot
>  state: ONLINE
> status: One or more devices is currently being resilvered. The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>   scan: resilver in progress since Sun Feb 14 20:44:20 2021
>         1.77T scanned at 291M/s, 179G issued at 28.7M/s, 1.97T total
>         179G resilvered, 8.88% done, 0 days 18:12:23 to go
> config:
>
>         NAME             STATE     READ WRITE CKSUM
>         zroot            ONLINE       0     0     0
>           mirror-0       ONLINE       0     0     0
>             ada0p3       ONLINE       0     0     0
>             replacing-1  ONLINE       0     0     0
>               ada1p3     ONLINE       0     0     0
>               ada3p3     ONLINE       0     0     0
>             ada2p3       ONLINE       0     0     0
>
> errors: No known data errors
>
> But a little later CKSUM changed from 0 to 1:
>
> # zpool status
>   pool: zroot
>  state: ONLINE
> status: One or more devices is currently being resilvered. The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>   scan: resilver in progress since Sun Feb 14 20:44:20 2021
>         1.84T scanned at 249M/s, 275G issued at 36.5M/s, 1.97T total
>         275G resilvered, 13.65% done, 0 days 13:35:00 to go
> config:
>
>         NAME             STATE     READ WRITE CKSUM
>         zroot            ONLINE       0     0     0
>           mirror-0       ONLINE       0     0     0
>             ada0p3       ONLINE       0     0     0
>             replacing-1  ONLINE       0     0     0
>               ada1p3     ONLINE       0     0     0
>               ada3p3     ONLINE       0     0     1
>             ada2p3       ONLINE       0     0     0
>
> errors: No known data errors
>
> After resilvering has finished:
>
> # zpool status
>   pool: zroot
>  state: ONLINE
> status: One or more devices has experienced an unrecoverable error. An
>         attempt was made to correct the error. Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://illumos.org/msg/ZFS-8000-9P
>   scan: resilvered 1.97T in 0 days 08:25:27 with 0 errors on Mon Feb 15 05:09:47 2021
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         zroot       ONLINE       0     0     0
>           mirror-0  ONLINE       0     0     0
>             ada0p3  ONLINE       0     0     0
>             ada3p3  ONLINE       0     0     1
>             ada2p3  ONLINE       0     0     0
>
> errors: No known data errors
>
> No errors reported by smartctl(8) for /dev/ada3.
>
> Can I consider this a harmless error and should I just run "zpool clear
> ada3p3'?
>
> Please advise.

STFW 'zpool status cksum':

https://docs.oracle.com/cd/E19120-01/open.solaris/817-2271/gbcve/index.html

https://docs.oracle.com/cd/E19120-01/open.solaris/817-2271/gbbzs/index.html

I would:

1. Check for interface, cable, and/or rack errors. These should
   generate error messages in dmesg(8) and /var/log/messages. I
   especially hate red, non-locking SATA cables without any speed
   marking. More than a few people agree that the red dye in the
   insulation will corrode copper. I have replaced all of my SATA
   cables with new black, locking, 6 Gbps SATA cables (made by Cable
   Matters).

2. If and when all of the above is okay, I would run SMART short and
   long tests on all three drives. (I used to believe in manufacturer
   diagnostic tools, but recent experiences with Seagate and Western
   Digital have convinced me otherwise; notably when following up with
   technical support. If anyone knows of a brand of HDD with good
   drives, good diagnostic tools, and good technical support, please
   advise.)

3. If and when all of the above is okay, I would scrub the pool.

4. If and when all of the above is okay and CKSUM remains (#3 might
   clear it?), I would do the 'zpool clear ...'.

David
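
Steps 2 through 4 above could be sketched roughly as the shell helpers below. This is only an illustration, not a tested procedure: the pool and device names (zroot, ada0, ada2, ada3, ada3p3) are taken from the quoted output, while the helper names run, cksum_errors, and smart_test and the DRY_RUN variable are hypothetical. By default the helpers only print the commands they would run.

```shell
#!/bin/sh
# Sketch only. Helper names and DRY_RUN are made up for illustration;
# zroot/ada* come from the quoted `zpool status` output.

# Print a command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Read `zpool status` output on stdin and print "name count" for every
# vdev line whose CKSUM column (5th field) is not 0.
cksum_errors() {
    awk 'NF >= 5 && $2 ~ /^(ONLINE|DEGRADED|FAULTED)$/ && $5 != "0" { print $1, $5 }'
}

# Step 2: start one SMART self-test type (short or long) on each disk.
# Wait for it to finish before starting another type, then read the
# results with `smartctl -l selftest /dev/<disk>`.
smart_test() {
    kind=$1
    shift
    for disk in "$@"; do
        run smartctl -t "$kind" "/dev/$disk"
    done
}

# Step 3: scrub the pool; check progress later with `zpool status`.
scrub_pool() {
    run zpool scrub "$1"
}

# Step 4: reset the error counters on one device of the pool.
clear_device() {
    run zpool clear "$1" "$2"
}
```

For example, `smart_test long ada0 ada2 ada3` followed by `scrub_pool zroot` prints the commands without running them, and `zpool status zroot | cksum_errors` would list ada3p3 if its CKSUM counter is still non-zero after the scrub. Setting DRY_RUN=0 makes the helpers actually execute the commands.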