Date:      Mon, 10 Apr 2023 16:54:13 -0700
From:      Cy Schubert <Cy.Schubert@cschubert.com>
To:        Charlie Li <vishwin@freebsd.org>
Cc:        Rick Macklem <rick.macklem@gmail.com>, Martin Matuska <mm@freebsd.org>, src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org
Subject:   Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
Message-ID:  <20230410165406.51bcd958@cschubert.com>
In-Reply-To: <4e85eb84-f0cc-2f8c-d3d9-1e016ede042a@freebsd.org>
References:  <202304031513.333FD6qw014903@gitrepo.freebsd.org> <20230403231444.CF48911F@slippy.cwsent.com> <20230403232549.73E331A2@slippy.cwsent.com> <CAM5tNy45XwDNGK27i_Z_96H-sLDXXHuaZbSQ=E7507eCiCvgJw@mail.gmail.com> <20230403235851.84C0467@slippy.cwsent.com> <CAM5tNy6TMoXAKyfWq_psEjK0zy9j%2B=7yzp1vRirAfTdXBxabSQ@mail.gmail.com> <CAM5tNy64HTeC8%2BOT_SHg1osnKKAH3_qQJkyWFuOy-LDAFVzu%2BA@mail.gmail.com> <20230404052811.DA2172C1@slippy.cwsent.com> <7c75b934-cb0a-b32e-bc19-b1e15e8cf3aa@freebsd.org> <20230409154042.0685a273@cschubert.com> <ba938b23-a6d0-f673-ffc8-b3d9d59e53a4@freebsd.org> <E3DD3607-887C-48C4-9031-5204DD84E6A5@cschubert.com> <a99a20b9-c348-89f6-db37-604f72002da4@freebsd.org> <707e4671-d746-aa23-e340-6eb8f50f78c6@freebsd.org> <20230409205826.7802259d@cschubert.com> <4e85eb84-f0cc-2f8c-d3d9-1e016ede042a@freebsd.org>

On Mon, 10 Apr 2023 01:58:00 -0400
Charlie Li <vishwin@freebsd.org> wrote:

> Cy Schubert wrote:
> > Hmm, interesting. I'm experiencing no such panics nor corruption since
> > the commit.
> > 
> > Reading a previous email of yours from today block_cloning is not
> > enabled. Is it possible that before the regression was fixed, while it
> > was wreaking havoc in your zpool, that your zpool became irreversibly
> > corrupted resulting in panics, even with the fixed code?
> >   
> This is probably now the case.
> > One way, probably the best way, to test would be to revert back to the
> > commit prior to the import. If you still experience panics and
> > corruption, your zpool is damaged.
> >   
> Fails to mount with error 45 on a boot environment only a few commits 
> before the import.
> > At the moment we don't know if the code is still broken or if it has
> > been fixed but residual damage is still causing creeping rot and panics.
> > 
> > I don't know if zpool scrub can fix this -- reading one comment on
> > FreeBSD-current, zpool scrub fails to complete.
> >   
> It doesn't. All scrubs on my end complete fully with nothing to repair.
> > I'm not convinced, yet, that the problem code has not been fixed. We
> > don't know if the panics are a result of corruption as a result of the
> > regression.
> > 
> > Would it be best if we reverted the seven commits to main? I don't
> > know. I could argue it either way. My problems, on four machines, have
> > been fixed by the subsequent commits. We don't know if there are other
> > regressions or if the current problems are due to corruption caused by
> > writes prior to patches addressing the regression. Maybe revert the
> > seven commits and take a watch-for-further-fallout approach, to see
> > whether the panics and problems persist post revert. If the problems persist
> > post revert we know for sure the regression has caused some permanent
> > corruption. This is a radical option. IMO, I'm torn whether a revert
> > would be the best approach or not. It has its merits but
> > significant limitations too.
> >   
> Going to try recreating the pool on current tip, making sure that 
> block_cloning is disabled.
> 

You'll need to do this at pool creation time.
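
For reference, a rough sketch of what that might look like. The pool name t matches my sandbox pool; the device name ada1p4 is only a placeholder, and the helper name is made up. It prints the command rather than running it, so nothing gets created by accident. The -o feature@block_cloning=disabled form follows the per-feature option described in zpool-create(8).

```shell
#!/bin/sh
# Sketch only: build and print a zpool create command that leaves
# block_cloning disabled. Pool and device names are placeholders.
create_cmd() {
    pool=$1
    shift
    # "$*" is the remaining vdev specification, passed through unchanged.
    echo "zpool create -o feature@block_cloning=disabled $pool $*"
}

# Print the command for pool "t" on a placeholder device.
create_cmd t ada1p4
```

Review the printed command and substitute your own vdevs before running it as root.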

I have a "sandbox" pool, called t, used for /usr/obj, ports wrkdirs, and other writes I can easily recreate on my laptop. Here are the results of my tests.

Method:

Initially I copied /usr/obj from my two build machines (one amd64.amd64 and one i386.i386) to my "sandbox" zpool.

Next, with block_cloning disabled, I did a cp -R of the /usr/obj test files, then a diff -qr. The source and target directories were identical.

Next, I cleaned up (rm -rf) the target directory to prepare for the 
block_cloning-enabled test.

Next, I did zpool checkpoint t, followed by zpool upgrade t. Pool t now had block_cloning enabled.

I repeated the cp -R test from above followed by a diff -qr. Almost 
every file was different. The pool was corrupted.
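
For anyone wanting to repeat the check, the copy-and-compare step can be sketched roughly as below. The function name and the throwaway demo tree are illustrative assumptions; the actual test was simply cp -R followed by diff -qr on the /usr/obj copies.

```shell
#!/bin/sh
# Sketch of the copy-and-compare check.
# Usage: copy_and_compare SRC DST   (DST must not exist yet)
# Prints "identical" if diff -qr finds no differences, "DIFFERENT" otherwise.
copy_and_compare() {
    src=$1
    dst=$2
    cp -R "$src" "$dst" || return 2
    if diff -qr "$src" "$dst" >/dev/null; then
        echo identical
    else
        echo DIFFERENT
        return 1
    fi
}

# Demo on a small throwaway tree standing in for /usr/obj.
tmp=$(mktemp -d "${TMPDIR:-/tmp}/cpcmp.XXXXXX")
mkdir -p "$tmp/src/sub"
echo hello > "$tmp/src/sub/file"
copy_and_compare "$tmp/src" "$tmp/dst"
rm -rf "$tmp"
```

To exercise the pool itself, point both arguments at directories on the pool under test.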

I restored the pool with the following, which removed the corruption:


slippy# zpool export t
slippy# zpool import --rewind-to-checkpoint t
slippy#

It is recommended that people avoid upgrading their zpools until the 
problem is fixed.
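
To see where a given pool stands before touching it, something along these lines may help. This is a sketch that assumes the usual feature states from zpool-features(7) (disabled, enabled, active); the helper name and message wording are made up for illustration.

```shell
#!/bin/sh
# Sketch: report whether block_cloning is on for a pool, given the value
# of its feature@block_cloning property (disabled, enabled or active).
cloning_advice() {
    case $1 in
        disabled) echo "block_cloning disabled: do not run zpool upgrade yet" ;;
        enabled|active) echo "block_cloning $1: feature is already on for this pool" ;;
        *) echo "unknown state: $1"; return 1 ;;
    esac
}

# On a live system the value would come from something like:
#   zpool get -H -o value feature@block_cloning t
# Here a literal value stands in for that output.
cloning_advice disabled
```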


-- 
Cheers,
Cy Schubert <Cy.Schubert@cschubert.com>
FreeBSD UNIX:  <cy@FreeBSD.org>   Web:  https://FreeBSD.org
NTP:           <cy@nwtime.org>    Web:  https://nwtime.org

			e^(i*pi)+1=0


