From nobody Mon Apr 10 23:54:13 2023
X-Original-To: dev-commits-src-all@mlmmj.nyi.freebsd.org
Date: Mon, 10 Apr 2023 16:54:13 -0700
From: Cy Schubert
To: Charlie Li
Cc: Rick Macklem, Martin Matuska, src-committers@freebsd.org,
 dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org
Subject: Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
Message-ID: <20230410165406.51bcd958@cschubert.com>
In-Reply-To:
<4e85eb84-f0cc-2f8c-d3d9-1e016ede042a@freebsd.org>
References: <202304031513.333FD6qw014903@gitrepo.freebsd.org>
 <20230403231444.CF48911F@slippy.cwsent.com>
 <20230403232549.73E331A2@slippy.cwsent.com>
 <20230403235851.84C0467@slippy.cwsent.com>
 <20230404052811.DA2172C1@slippy.cwsent.com>
 <7c75b934-cb0a-b32e-bc19-b1e15e8cf3aa@freebsd.org>
 <20230409154042.0685a273@cschubert.com>
 <707e4671-d746-aa23-e340-6eb8f50f78c6@freebsd.org>
 <20230409205826.7802259d@cschubert.com>
 <4e85eb84-f0cc-2f8c-d3d9-1e016ede042a@freebsd.org>
Organization: KOMQUATS
X-Mailer: Claws Mail 3.19.0 (GTK+ 2.24.33; amd64-portbld-freebsd14.0)
List-Id: Commit messages for all branches of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all
Sender: owner-dev-commits-src-all@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Mon, 10 Apr 2023 01:58:00 -0400
Charlie Li wrote:

> Cy Schubert wrote:
> > Hmm, interesting. I'm experiencing no such panics nor corruption since
> > the commit.
> >
> > Reading a previous email of yours from today block_cloning is not
> > enabled. Is it possible that before the regression was fixed, while it
> > was wreaking havoc in your zpool, that your zpool became irreversibly
> > corrupted resulting in panics, even with the fixed code?
> >
> This is probably now the case.
> > One way, probably the best way, to test would be to revert back to the
> > commit prior to the import. If you still experience panics and
> > corruption, your zpool is damaged.
> >
> Fails to mount with error 45 on a boot environment only a few commits
> before the import.
> > At the moment we don't know if the code is still broken or if it has
> > been fixed but residual damage is still causing creeping rot and panics.
> >
> > I don't know if zpool scrub can fix this -- reading one comment on
> > FreeBSD-current, zpool scrub fails to complete.
> >
> It doesn't. All scrubs on my end complete fully with nothing to repair.
> > I'm not convinced, yet, that the problem code has not been fixed. We
> > don't know if the panics are a result of corruption as a result of the
> > regression.
> >
> > Would it be best if we reverted the seven commits to main? I don't
> > know. I could argue it either way. My problems, on four machines, have
> > been fixed by the subsequent commits. We don't know if there are other
> > regressions or if the current problems are due to corruption caused
> > writes prior to patches addressing the regression. Maybe reverting the
> > seven commits and taking a watch for further fallout approach, whether
> > the panics and problems persist post revert. If the problems persist
> > post revert we know for sure the regression has caused some permanent
> > corruption. This is a radical option. IMO, I'm torn whether a revert
> > would be the best approach or not. It has its merits but
> > significant limitations too.
> >
> Going to try recreating the pool on current tip, making sure that
> block_cloning is disabled.
>

You'll need to do this at pool creation time.

I have a "sandbox" pool, called t, used for /usr/obj, ports wrkdirs,
and other writes I can easily recreate on my laptop. Here are the
results of my tests.

Method:

Initially I copied my /usr/obj from my two build machines (one
amd64.amd64 and one i386.i386) to my "sandbox" zpool.

Next, with block_cloning disabled, I did a cp -R of the /usr/obj test
files, then a diff -qr. The source and target directories were the same.

Next, I cleaned up (rm -rf) the target directory to prepare for the
block_cloning enabled test.

Next, I did zpool checkpoint t. After this, zpool upgrade t. Pool t now
has block_cloning enabled.

I repeated the cp -R test from above followed by a diff -qr. Almost
every file was different. The pool was corrupted.

I restored the pool, removing the corruption, with the following:

slippy# zpool export t
slippy# zpool import --rewind-to-checkpoint t
slippy#

It is recommended that people avoid upgrading their zpools until the
problem is fixed.


-- 
Cheers,
Cy Schubert
FreeBSD UNIX:  Web: https://FreeBSD.org
NTP:           Web: https://nwtime.org

			e^(i*pi)+1=0
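
For anyone wanting to repeat the copy-and-compare step from the method
in this message, here is a minimal sketch of it as a shell function. The
function name copy_and_compare and the example paths are illustrative
assumptions, not something taken from the original test; it only wraps
the cp -R and diff -qr steps and leaves the zpool commands to be run by
hand.

```shell
#!/bin/sh
# Sketch only: copy a tree with cp -R, then compare it with diff -qr.
# copy_and_compare is a hypothetical helper name, not part of any tool.
# Returns 0 if the copy matches the source, 1 if diff reports
# differences, and 2 if the copy could not be made.
copy_and_compare() {
    cc_src=$1
    cc_dst=$2

    # Refuse an existing target: cp -R would otherwise copy the source
    # *into* it and the comparison would be against the wrong directory.
    if [ -e "$cc_dst" ]; then
        echo "target already exists: $cc_dst" >&2
        return 2
    fi

    cp -R "$cc_src" "$cc_dst" || return 2

    # diff -qr prints one line per differing or missing file and exits
    # non-zero if anything differs.
    diff -qr "$cc_src" "$cc_dst"
}
```

Usage might look like "copy_and_compare /usr/obj /t/obj-copy", where
/t/obj-copy is a made-up target on the pool under test. A non-zero
return after a clean copy is the symptom described above.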