Date:      Sat, 15 Apr 2023 22:30:13 +0200
From:      Mateusz Guzik <mjguzik@gmail.com>
To:        FreeBSD User <freebsd@walstatt-de.de>
Cc:        Cy Schubert <Cy.Schubert@cschubert.com>, Mark Millard <marklmi@yahoo.com>,  Charlie Li <vishwin@freebsd.org>, Pawel Jakub Dawidek <pjd@freebsd.org>, dev-commits-src-main@freebsd.org,  Current FreeBSD <freebsd-current@freebsd.org>
Subject:   Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
Message-ID:  <CAGudoHHnimmsgXTBjrcY=FiYnQCoh7m8zhBM4BPYHoFy%2BihUxQ@mail.gmail.com>
In-Reply-To: <20230415175218.777d0a97@thor.intern.walstatt.dynvpn.de>
References:  <20230413071032.18BFF31F@slippy.cwsent.com> <D0D9BD06-C321-454C-A038-C55C63E0DD6B@dawidek.net> <20230413063321.60344b1f@cschubert.com> <CAGudoHG3rCx93gyJTmzTBnSe4fQ9=m4mBESWbKVWtAGRxen_4w@mail.gmail.com> <20230413135635.6B62F354@slippy.cwsent.com> <c41f9ed6-e557-9255-5a46-1a22d4b32d66@dawidek.net> <319a267e-3f76-3647-954a-02178c260cea@dawidek.net> <b60807e9-f393-6e6d-3336-042652ddd03c@freebsd.org> <441db213-2abb-b37e-e5b3-481ed3e00f96@dawidek.net> <5ce72375-90db-6d30-9f3b-a741c320b1bf@freebsd.org> <99382FF7-765C-455F-A082-C47DB4D5E2C1@yahoo.com> <32cad878-726c-4562-0971-20d5049c28ad@freebsd.org> <ABC9F3DB-289E-455E-AF43-B3C13525CB2C@yahoo.com> <20230415115452.08911bb7@thor.intern.walstatt.dynvpn.de> <20230415143625.99388387@slippy.cwsent.com> <20230415175218.777d0a97@thor.intern.walstatt.dynvpn.de>

On 4/15/23, FreeBSD User <freebsd@walstatt-de.de> wrote:
> On Sat, 15 Apr 2023 07:36:25 -0700
> Cy Schubert <Cy.Schubert@cschubert.com> wrote:
>
>> In message <20230415115452.08911bb7@thor.intern.walstatt.dynvpn.de>,
>> FreeBSD User writes:
>> > On Thu, 13 Apr 2023 22:18:04 -0700
>> > Mark Millard <marklmi@yahoo.com> wrote:
>> >
>> > > On Apr 13, 2023, at 21:44, Charlie Li <vishwin@freebsd.org> wrote:
>> > >
>> > > > Mark Millard wrote:
>> > > >> FYI: in my original report for a context that has never had
>> > > >> block_cloning enabled, I reported BOTH missing files and
>> > > >> file content corruption in the poudriere-devel bulk build
>> > > >> testing. This predates:
>> > > >> https://people.freebsd.org/~pjd/patches/brt_revert.patch
>> > > >> but had the changes from:
>> > > >> https://github.com/openzfs/zfs/pull/14739/files
>> > > >> The files were missing from packages installed to be used
>> > > >> during a port's build. No other types of examples of missing
>> > > >> files happened. (But only 11 ports failed.)
>> > > > I also don't have block_cloning enabled. "Missing files" prior to
>> > > > brt_revert may actually be present, but as the corruption also
>> > > > messes with the file(1) signature, some tools like ldconfig
>> > > > report them as missing.
>> > >
>> > > For reference, the specific messages that were not explicit
>> > > null-byte complaints were (some shown with a little context):
>> > >
>> > >
>> > > ===>   py39-lxml-4.9.2 depends on shared library: libxml2.so - not found
>> > > ===>   Installing existing package /packages/All/libxml2-2.10.3_1.pkg
>> > >
>> > > [CA72_ZFS] Installing libxml2-2.10.3_1...
>> > > [CA72_ZFS] Extracting libxml2-2.10.3_1: .......... done
>> > > ===>   py39-lxml-4.9.2 depends on shared library: libxml2.so - found (/usr/local/lib/libxml2.so)
>> > > . . .
>> > > [CA72_ZFS] Extracting libxslt-1.1.37: .......... done
>> > > ===>   py39-lxml-4.9.2 depends on shared library: libxslt.so - found (/usr/local/lib/libxslt.so)
>> > > ===>   Returning to build of py39-lxml-4.9.2
>> > > . . .
>> > > ===>  Configuring for py39-lxml-4.9.2
>> > > Building lxml version 4.9.2.
>> > > Building with Cython 0.29.33.
>> > > Error: Please make sure the libxml2 and libxslt development packages are installed.
>> > >
>> > >
>> > > [CA72_ZFS] Extracting libunistring-1.1: .......... done
>> > > ===>   libidn2-2.3.4 depends on shared library: libunistring.so - not found
>> > >
>> > >
>> > >
>> > > [CA72_ZFS] Extracting gmp-6.2.1: .......... done
>> > > ===>   mpfr-4.2.0,1 depends on shared library: libgmp.so - not found
>> > >
>> > >
>> > >
>> > > ===>   nettle-3.8.1 depends on shared library: libgmp.so - not found
>> > > ===>   Installing existing package /packages/All/gmp-6.2.1.pkg
>> > > [CA72_ZFS] Installing gmp-6.2.1...
>> > > the most recent version of gmp-6.2.1 is already installed
>> > > ===>   nettle-3.8.1 depends on shared library: libgmp.so - not found
>> > >
>> > > *** Error code 1
>> > >
>> > >
>> > > autom4te: error: need GNU m4 1.4 or later: /usr/local/bin/gm4
>> > >
>> > >
>> > > checking for GNU M4 that supports accurate traces... configure: error: no acceptable m4 could be found in $PATH.
>> > > GNU M4 1.4.6 or later is required; 1.4.16 or newer is recommended.
>> > > GNU M4 1.4.15 uses a buggy replacement strstr on some systems.
>> > > Glibc 2.9 - 2.12 and GNU M4 1.4.11 - 1.4.15 have another strstr bug.
>> > >
>> > >
>> > > ld: error: /usr/local/lib/libblkid.a: unknown file type
>> > >
>> > >
>> > > ===
>> > > Mark Millard
>> > > marklmi at yahoo.com
>> > >
>> > >
>> >
>> > Hello,
>> >
>> > what is the current status of fixing or mitigating this disastrous bug,
>> > especially for those with the new option enabled on ZFS pools? Any advice?
>> >
>> > As a precaution (or call it panic) I shut down several servers to prevent
>> > irreversible damage to databases and data storage. On one host with
>> > /usr/ports residing on ZFS, we always get errors on the same files created
>> > while staging (using portmaster, which leaves the system with software not
>> > installed, i.e. www/apache24 in our case). Deleting the work folder doesn't
>> > seem to change anything, even after starting a scrub of the entire pool
>> > (RAIDZ1 pool). The cause is unknown, as is why it is always the same files
>> > that get corrupted. Same with devel/ruby-gems.
>> >
>> > Poudriere has been shut down for the time being to avoid further issues.
>> >
>> >
>> > Is there any advice on how to proceed, apart from preserving the boxes by
>> > shutting them down?
>> >
>> > Thank you ;-)
>> > oh
>> >
>> >
>> >
>> > --
>> > O. Hartmann
>>
>> With an up-to-date tree + pjd@'s "Fix data corruption when cloning embedded
>> blocks. #14739" patch I didn't have any issues, except for email messages
>> with corruption in my sent directory, nowhere else. I'm still investigating
>> the email messages issue. IMO one is generally safe to run poudriere on the
>> latest ZFS with the additional patch.
>>
>> My tests of the additional patch concluded that it resolved my last
>> problems, except for the sent email problem I'm still investigating. I'm
>> sure there's a simple explanation for it, i.e. the email thread was
>> corrupted by the EXDEV regression which cannot be fixed by anything, even
>> reverting to the previous ZFS -- the data in those files will remain
>> damaged regardless.
>>
>> I cannot speak to the others who have had poudriere and other issues. I
>> never had any problems with poudriere on top of the new ZFS.
>>
>> WRT reverting block_cloning pools to without, your only option is to back
>> up your pool and recreate it without block_cloning. Then restore your data.
>>
>>
>
> All right, I interpret the answer to mean that I need the most recent source
> tree (and an OS built and installed from it) AND a patch that isn't
> officially committed?
>
> On one box I'm running:
>
> FreeBSD 14.0-CURRENT #8 main-n262175-5ee1c90e50ce: Sat Apr 15 07:57:16 CEST 2023 amd64
>
> The box is crashing while trying to update ports with the well-known issue:
>
> Panic String: VERIFY(!zil_replaying(zilog, tx)) failed
>
> At the moment all alarm bells are ringing and I have lost track of what has
> been patched and already committed, and what is still available only as a
> patch, either under evaluation or posted here unofficially.
>
> Regarding the EXDEV issue: for poudriere or ports trees on ZFS, what do I
> have to do to ensure that those datasets are clean? The OS should detect
> file corruption, but in my case the box is crashing :-(
>
> I ran a scrub several times, but this seems to be the action of a helpless
> and desperate man ... ;-/
>
> Greetings
>

Using block cloning is still not safe, but somewhere in this thread
pjd had a patch to keep it operational for already cloned files without
adding new ones.

Anyhow, as vishwin@ indicated, there was data corruption
*unrelated* to block cloning which also came with the import. I
narrowed it down: https://github.com/openzfs/zfs/issues/14753

That said, I'm now testing a kernel which does not do block cloning and
does not have the other problematic commit; we will see if things
work.

-- 
Mateusz Guzik <mjguzik gmail.com>


