Reply-to: Cy Schubert
From: Cy Schubert
To: Cy Schubert
cc: Mark Millard, Mateusz Guzik, vishwin@freebsd.org,
    dev-commits-src-main@freebsd.org, Current FreeBSD, pawel@dawidek.net,
    pjd@freebsd.org
Subject: Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
Date: Thu, 13 Apr 2023 00:10:32 -0700
Message-Id: <20230413071032.18BFF31F@slippy.cwsent.com>
In-reply-to: <20230413070426.8A54F25A@slippy.cwsent.com>

In message <20230413070426.8A54F25A@slippy.cwsent.com>, Cy Schubert writes:
> In message <20230413064252.1E5C1318@slippy.cwsent.com>, Cy Schubert writes:
> > In message , Mark Millard
> > writes:
> > > [This just puts my prior reply's material into Cy's
> > > adjusted resend of the original. The To/Cc should
> > > be complete this time.]
> > >
> > > On Apr 12, 2023, at 22:52, Cy Schubert wrote:
> > >
> > > > In message , Mark Millard
> > > > writes:
> > > >> From: Charlie Li wrote on
> > > >> Date: Wed, 12 Apr 2023 20:11:16 UTC :
> > > >>
> > > >>> Charlie Li wrote:
> > > >>>> Mateusz Guzik wrote:
> > > >>>>> can you please test poudriere with
> > > >>>>> https://github.com/openzfs/zfs/pull/14739/files
> > > >>>>>
> > > >>>> After applying, on the md(4)-backed pool regardless of block_cloning,
> > > >>>> the cy@ `cp -R` test reports no differing (ie corrupted) files. Will
> > > >>>> report back on poudriere results (no block_cloning).
> > > >>>>
> > > >>> As for poudriere, build failures are still rolling in. These are (and
> > > >>> have been) entirely random on every run. Some examples from this run:
> > > >>>
> > > >>> lang/php81:
> > > >>> - post-install: @${INSTALL_DATA} ${WRKSRC}/php.ini-development
> > > >>> ${WRKSRC}/php.ini-production ${WRKDIR}/php.conf ${STAGEDIR}/${PREFIX}/etc
> > > >>> - consumers fail to build due to corrupted php.conf packaged
> > > >>>
> > > >>> devel/ninja:
> > > >>> - phase: stage
> > > >>> - install -s -m 555
> > > >>> /wrkdirs/usr/ports/devel/ninja/work/ninja-1.11.1/ninja
> > > >>> /wrkdirs/usr/ports/devel/ninja/work/stage/usr/local/bin
> > > >>> - consumers fail to build due to corrupted bin/ninja packaged
> > > >>>
> > > >>> devel/netsurf-buildsystem:
> > > >>> - phase: stage
> > > >>> - mkdir -p
> > > >>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles
> > > >>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/testtools
> > > >>> for M in Makefile.top Makefile.tools Makefile.subdir Makefile.pkgconfig
> > > >>> Makefile.clang Makefile.gcc Makefile.norcroft Makefile.open64; do \
> > > >>> cp makefiles/$M
> > > >>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles/;
> > > >>> \
> > > >>> done
> > > >>> - graphics/libnsgif fails to build due to NUL characters in
> > > >>> Makefile.{clang,subdir}, causing nothing to link
> > > >>
> > > >> Summary: I have problems building ports into packages
> > > >> via poudriere-devel use despite being fully updated/patched
> > > >> (as of when I started the experiment), never having enabled
> > > >> block_cloning (still using openzfs-2.1-freebsd).
> > > >>
> > > >> In other words, I can confirm other reports that have
> > > >> been made.
> > > >>
> > > >> The details follow.
> > > >>
> > > >>
> > > >> [Written as I was working on setting up for the experiments
> > > >> and then executing those experiments, adjusting as I went
> > > >> along.]
> > > >>
> > > >> I've run my own tests in a context that has never had the
> > > >> zpool upgrade and that jumps from before the openzfs import to
> > > >> after the existing commits for trying to fix openzfs on
> > > >> FreeBSD. I report on the sequence of activities getting to
> > > >> the point of testing as well.
> > > >>
> > > >> By personal policy I keep my (non-temporary) pools compatible
> > > >> with what the most recent ??.?-RELEASE supports, using
> > > >> openzfs-2.1-freebsd for now. The pools involved below have
> > > >> never had a zpool upgrade from where they started. (I've no
> > > >> pools that have ever had a zpool upgrade.)
> > > >>
> > > >> (Temporary pools are rare for me, such as this investigation.
> > > >> But I'm not testing block_cloning or anything new this time.)
> > > >>
> > > >> I'll note that I use zfs for bectl, not for redundancy. So
> > > >> my evidence is more limited in that respect.
> > > >>
> > > >> The activities were done on a HoneyComb (16 Cortex-A72 cores).
> > > >> The system has and supports ECC RAM, 64 GiBytes of RAM are
> > > >> present.
> > > >>
> > > >> I started by duplicating my normal zfs environment to an
> > > >> external USB3 NVMe drive and adjusting the host name and such
> > > >> to produce the below. (Non-debug, although I do not strip
> > > >> symbols.) :
> > > >>
> > > >> # uname -apKU
> > > >> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #90 main-n261544-cee09bda03c8-dirty: Wed Mar 15 20:25:49 PDT 2023 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400082 1400082
> > > >>
> > > >> I then did: git fetch, stash push ., merge --ff-only, stash apply . :
> > > >> my normal procedure. I then also applied the patch from:
> > > >>
> > > >> https://github.com/openzfs/zfs/pull/14739/files
> > > >>
> > > >> Then I did: buildworld buildkernel, install them, and rebooted.
> > > >>
> > > >> The result was:
> > > >>
> > > >> # uname -apKU
> > > >> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #91 main-n262122-2ef2c26f3f13-dirty: Wed Apr 12 19:23:35 PDT 2023 root@CA72_4c8G_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400086 1400086
> > > >>
> > > >> The later poudriere-devel based build of packages from ports is
> > > >> based on:
> > > >>
> > > >> # ~/fbsd-based-on-what-commit.sh -C /usr/ports
> > > >> 4e94ac9eb97f (HEAD -> main, freebsd/main, freebsd/HEAD) devel/freebsd-gcc12: Bump to 12.2.0.
> > > >> Author: John Baldwin
> > > >> Commit: John Baldwin
> > > >> CommitDate: 2023-03-25 00:06:40 +0000
> > > >> branch: main
> > > >> merge-base: 4e94ac9eb97fab16510b74ebcaa9316613182a72
> > > >> merge-base: CommitDate: 2023-03-25 00:06:40 +0000
> > > >> n613214 (--first-parent --count for merge-base)
> > > >>
> > > >> poudriere attempted to build 476 packages, starting
> > > >> with pkg (in order to build the 56 that I explicitly
> > > >> indicate that I want). It is my normal set of ports.
> > > >> The form of building is biased to allowing a high
> > > >> load average compared to the number of hardware
> > > >> threads (same as cores here): each builder is allowed
> > > >> to use the full count of hardware threads. The build
> > > >> normally use on the build machine involved.
> > > >>
> > > >> And it produced some random errors during the attempted
> > > >> builds. A type of example that is easy to interpret
> > > >> without further exploration is:
> > > >>
> > > >> pkg_resources.extern.packaging.requirements.InvalidRequirement: Parse error at "'\x00\x00\x00\x00\x00\x00\x00\x00'": Expected W:(0-9A-Za-z) > 0
> > > >> da0p8 ONLINE 0 0 0
> > > >>
> > > >> errors: No known data errors
> > > >>
> > > >>
> > > >> ===
> > > >> Mark Millard
> > > >> marklmi at yahoo.com
> > > >
> > > >
> > > > Let's try this again. Claws-mail didn't include the list address in the
> > > > header. Trying to reply, again, using exmh instead.
> > > >
> > > >
> > > > Did your pools suffer the EXDEV problem? The EXDEV also corrupted files.
> > >
> > > As I reported, this was a jump from before the import
> > > to as things are tonight (here). So: NO, unless the
> > > existing code as of tonight still has the EXDEV problem!
> > >
> > > Prior to this experiment I'd not progressed any media
> > > beyond: main-n261544-cee09bda03c8-dirty Wed Mar 15 20:25:49.
> > >
> > > > I think, without sufficient investigation we risk jumping to
> > > > conclusions. I've taken an extremely cautious approach, rolling back
> > > > snapshots (as much as possible, i.e. poudriere datasets) when EXDEV
> > > > corruption was encountered.
> > >
> > > Again: nothing between main-n261544-cee09bda03c8-dirty and
> > > main-n262122-2ef2c26f3f13-dirty was involved at any stage.
> > >
> > > >
> > > > I did not roll back any snapshots in my MH mail directory. Rolling back
> > > > snapshots of my MH maildir would result in loss of email. I have to
> > > > live with that corruption. Corrupted files in my outgoing sent email
> > > > directory remain:
> > > >
> > > > slippy$ ugrep -cPa '\x00' ~/.Mail/note | grep -c :1
> > > > 53
> > > > slippy$
> > > >
> > > > There are 53 corrupted files in my note log of 9913 emails. Those files
> > > > will never be fixed. They were corrupted by the EXDEV bug. Any new ZFS
> > > > or ZFS patches cannot retroactively remove the corruption from those
> > > > files.
> > > >
> > > > But my poudriere files, because the snapshots were rolled back, were
> > > > "repaired" by the rolled back snapshots.
> > > >
> > > > I'm not convinced that there is presently active corruption since
> > > > the problem has been fixed. I am convinced that whatever corruption
> > > > was written at the time will remain forever or until those files
> > > > are deleted or replaced -- just like my email files written to disk at
> > > > the time.
> > >
> > > My test results and procedure just do not fit your conclusion
> > > that things are okay now if block_cloning is completely avoided.
> >
> > Admitting I'm wrong: sending copies of my last reply to you back to myself,
> > again and again, three times, I've managed to reproduce the corruption you
> > are talking about.
>
> This email itself was also corrupted. Below is what was sent. Good thing
> multiple copies are saved by exmh.
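The `ugrep -cPa '\x00' ~/.Mail/note | grep -c :1` pipeline quoted above derives a file count from per-file match counts. A minimal Python sketch of a check that does not depend on that output format: it walks a directory and lists every regular file containing at least one NUL byte. `files_with_nul` is a hypothetical helper, not part of ugrep, exmh, or ZFS, and it assumes NUL bytes never occur legitimately in the scanned files, which is reasonable for plain-text mail but not for arbitrary binaries.

```python
import os


def files_with_nul(root, chunk_size=1 << 16):
    """Return the sorted paths of regular files under root that contain a NUL byte.

    Hypothetical helper for illustration only. Each file is read in binary
    mode, chunk by chunk, and the scan of a file stops at its first NUL.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip FIFOs, sockets and dangling symlinks
            try:
                with open(path, "rb") as f:
                    # iter() with a b"" sentinel stops at end of file.
                    for chunk in iter(lambda: f.read(chunk_size), b""):
                        if b"\x00" in chunk:
                            hits.append(path)
                            break
            except OSError:
                continue  # unreadable file: skip it rather than abort the scan
    return sorted(hits)
```

For example, `len(files_with_nul(os.path.expanduser("~/.Mail/note")))` would count the affected messages in the MH `note` folder, assuming that path layout. A file with several NUL runs is counted once.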
>
> Admitting I'm wrong: sending copies of my last reply to you back to myself,
> again and again, three times, I've managed to reproduce the corruption you
> are talking about.

This email itself was also corrupted. Below is what was sent. Good thing
multiple copies are saved by exmh.

Admitting I'm wrong: sending copies of my last reply to you back to myself,
again and again, three times, I've managed to reproduce the corruption you
are talking about.

From my previous email to you.

header. Trying to reply:::::::::, again, using exmh instead.
                       ^^^^^^^^^

Here it is, nine additional bytes of garbage. I've replaced the garbage
with colons because nulls mess up a lot of things, including cut&paste.

In another instance about 500 bytes were removed.

I can reproduce the corruption at will now.

The EXDEV patch is applied. Block_cloning is disabled. Somehow nulls and
other garbage are inserted in the middle of emails after the ZFS upgrade.

-- 
Cheers,
Cy Schubert
FreeBSD UNIX:
Web: https://FreeBSD.org
NTP:
Web: https://nwtime.org

	e^(i*pi)+1=0
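The comparison described above, between the copy exmh saved and what actually reached disk, can be automated. A minimal sketch, assuming both versions are available as byte strings: Python's `difflib.SequenceMatcher` reports each inserted, deleted, or replaced byte range. `byte_edits` is a hypothetical helper name. `autojunk=False` matters here because, with single bytes as elements, the junk heuristic would otherwise ignore frequent characters such as spaces once the second input reaches 200 bytes. SequenceMatcher can take quadratic time in the worst case, so this suits email-sized files rather than large binaries.

```python
import difflib


def byte_edits(expected: bytes, actual: bytes):
    """List the non-equal regions between two byte strings.

    Hypothetical helper for illustration. Each entry is a difflib opcode
    tuple (tag, exp_start, exp_end, act_start, act_end), where tag is
    'insert', 'delete' or 'replace'.
    """
    # autojunk=False: with bytes as elements, frequent values (spaces, 'e', ...)
    # would otherwise be treated as junk once `actual` has 200 or more bytes.
    sm = difflib.SequenceMatcher(None, expected, actual, autojunk=False)
    return [op for op in sm.get_opcodes() if op[0] != "equal"]
```

If both inputs are just the single line shown above, and assuming the nine garbage bytes were NULs, `byte_edits` returns `[('insert', 23, 23, 23, 32)]`: nine bytes inserted at offset 23. A removal such as the roughly 500-byte one mentioned above would show up as a `delete` opcode instead.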