Date: Mon, 10 Apr 2023 16:54:13 -0700
From: Cy Schubert <Cy.Schubert@cschubert.com>
To: Charlie Li <vishwin@freebsd.org>
Cc: Rick Macklem <rick.macklem@gmail.com>, Martin Matuska <mm@freebsd.org>, src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org
Subject: Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
Message-ID: <20230410165406.51bcd958@cschubert.com>
In-Reply-To: <4e85eb84-f0cc-2f8c-d3d9-1e016ede042a@freebsd.org>
References: <202304031513.333FD6qw014903@gitrepo.freebsd.org>
 <20230403231444.CF48911F@slippy.cwsent.com>
 <20230403232549.73E331A2@slippy.cwsent.com>
 <CAM5tNy45XwDNGK27i_Z_96H-sLDXXHuaZbSQ=E7507eCiCvgJw@mail.gmail.com>
 <20230403235851.84C0467@slippy.cwsent.com>
 <CAM5tNy6TMoXAKyfWq_psEjK0zy9j%2B=7yzp1vRirAfTdXBxabSQ@mail.gmail.com>
 <CAM5tNy64HTeC8%2BOT_SHg1osnKKAH3_qQJkyWFuOy-LDAFVzu%2BA@mail.gmail.com>
 <20230404052811.DA2172C1@slippy.cwsent.com>
 <7c75b934-cb0a-b32e-bc19-b1e15e8cf3aa@freebsd.org>
 <20230409154042.0685a273@cschubert.com>
 <ba938b23-a6d0-f673-ffc8-b3d9d59e53a4@freebsd.org>
 <E3DD3607-887C-48C4-9031-5204DD84E6A5@cschubert.com>
 <a99a20b9-c348-89f6-db37-604f72002da4@freebsd.org>
 <707e4671-d746-aa23-e340-6eb8f50f78c6@freebsd.org>
 <20230409205826.7802259d@cschubert.com>
 <4e85eb84-f0cc-2f8c-d3d9-1e016ede042a@freebsd.org>

On Mon, 10 Apr 2023 01:58:00 -0400
Charlie Li <vishwin@freebsd.org> wrote:

> Cy Schubert wrote:
> > Hmm, interesting. I'm experiencing no such panics nor corruption since
> > the commit.
> >
> > Reading a previous email of yours from today block_cloning is not
> > enabled. Is it possible that before the regression was fixed, while it
> > was wreaking havoc in your zpool, that your zpool became irreversibly
> > corrupted resulting in panics, even with the fixed code?
> >
> This is probably now the case.
> > One way, probably the best way, to test would be to revert back to the
> > commit prior to the import. If you still experience panics and
> > corruption, your zpool is damaged.
> >
> Fails to mount with error 45 on a boot environment only a few commits
> before the import.
> > At the moment we don't know if the code is still broken or if it has
> > been fixed but residual damage is still causing creeping rot and panics.
> >
> > I don't know if zpool scrub can fix this -- reading one comment on
> > FreeBSD-current, zpool scrub fails to complete.
> >
> It doesn't. All scrubs on my end complete fully with nothing to repair.
> > I'm not convinced, yet, that the problem code has not been fixed. We
> > don't know if the panics are a result of corruption as a result of the
> > regression.
> >
> > Would it be best if we reverted the seven commits to main? I don't
> > know. I could argue it either way. My problems, on four machines, have
> > been fixed by the subsequent commits. We don't know if there are other
> > regressions or if the current problems are due to corruption caused
> > writes prior to patches addressing the regression. Maybe reverting the
> > seven commits and taking a watch for further fallout approach, whether
> > the panics and problems persist post revert. If the problems persist
> > post revert we know for sure the regression has caused some permanent
> > corruption. This is a radical option. IMO, I'm torn whether a revert
> > would be the best approach or not. It has its merits but
> > significant limitations too.
> >
> Going to try recreating the pool on current tip, making sure that
> block_cloning is disabled.
>

You'll need to do this at pool creation time.

I have a "sandbox" pool, called t, used for /usr/obj and ports wrkdirs,
and other writes I can easily recreate on my laptop. Here are the results
of my tests.

Method:

Initially I copied my /usr/obj from my two build machines (one amd64.amd64
and one i386.i386) to my "sandbox" zpool.

Next, with block_cloning disabled, I did a cp -R of the /usr/obj test
files, then a diff -qr. The source and target directories were the same.

Next, I cleaned up (rm -rf) the target directory to prepare for the
block_cloning-enabled test.

Next, I did zpool checkpoint t. After this, zpool upgrade t. Pool t now
has block_cloning enabled.

I repeated the cp -R test from above, followed by a diff -qr. Almost every
file was different. The pool was corrupted.

I restored the pool, removing the corruption, with the following:

slippy# zpool export t
slippy# zpool import --rewind-to-checkpoint t
slippy#

It is recommended that people avoid upgrading their zpools until the
problem is fixed.

-- 
Cheers,
Cy Schubert <Cy.Schubert@cschubert.com>
FreeBSD UNIX:  <cy@FreeBSD.org>   Web:  https://FreeBSD.org
NTP:           <cy@nwtime.org>    Web:  https://nwtime.org

			e^(i*pi)+1=0
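
The copy-and-compare check described in the method above could be scripted
roughly as follows. This is only a sketch, not the exact commands that were
run: copy_and_compare is a hypothetical helper name, the source and target
paths are placeholders, and the pool-level steps appear only as comments,
copied from the message.

```shell
#!/bin/sh
# Sketch of the cp -R / diff -qr check from the message.
# copy_and_compare is a hypothetical helper; both paths are placeholders.
#
# Pool-level steps from the message (pool "t"), run as root and NOT
# executed by this script:
#   zpool checkpoint t                      # before the upgrade
#   zpool upgrade t                         # enables block_cloning
#   zpool export t                          # to roll back afterwards
#   zpool import --rewind-to-checkpoint t

copy_and_compare() {
    src=$1
    dst=$2

    # Refuse to reuse an existing target: cp -R into an existing
    # directory would nest the copy and make the diff meaningless.
    if [ -e "$dst" ]; then
        echo "target exists: $dst" >&2
        return 2
    fi

    # Recursive copy, as in the message.
    cp -R "$src" "$dst" || return 2

    # diff -qr names each differing file and exits non-zero
    # if anything differs.
    if diff -qr "$src" "$dst"; then
        echo "identical"
    else
        echo "DIFFERENT"
        return 1
    fi
}
```

For example, copy_and_compare /usr/obj/src /usr/obj/copy (assumed paths)
prints "identical" when the copy matches the source, and lists the
differing files followed by "DIFFERENT" otherwise.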