Date:      Thu, 13 Apr 2023 10:42:57 +0100
From:      Danilo Egea Gondolfo <danilo@FreeBSD.org>
To:        freebsd-current@freebsd.org
Subject:   Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
Message-ID:  <fa02bf62-f378-3c9c-1ed9-b22d6c1e219a@FreeBSD.org>
In-Reply-To: <C8E4A43B-9FC8-456E-ADB3-13E7F40B2B04@yahoo.com>
References:  <C8E4A43B-9FC8-456E-ADB3-13E7F40B2B04.ref@yahoo.com> <C8E4A43B-9FC8-456E-ADB3-13E7F40B2B04@yahoo.com>


On 13/04/2023 06:28, Mark Millard wrote:
> From: Charlie Li <vishwin_at_freebsd.org> wrote on
> Date: Wed, 12 Apr 2023 20:11:16 UTC :
>
>> Charlie Li wrote:
>>> Mateusz Guzik wrote:
>>>> can you please test poudriere with
>>>> https://github.com/openzfs/zfs/pull/14739/files
>>>>
>>> After applying, on the md(4)-backed pool regardless of block_cloning,
>>> the cy@ `cp -R` test reports no differing (ie corrupted) files. Will
>>> report back on poudriere results (no block_cloning).
>>>
>> As for poudriere, build failures are still rolling in. These are (and
>> have been) entirely random on every run. Some examples from this run:
>>
>> lang/php81:
>> - post-install: @${INSTALL_DATA} ${WRKSRC}/php.ini-development
>> ${WRKSRC}/php.ini-production ${WRKDIR}/php.conf ${STAGEDIR}/${PREFIX}/etc
>> - consumers fail to build due to corrupted php.conf packaged
>>
>> devel/ninja:
>> - phase: stage
>> - install -s -m 555
>> /wrkdirs/usr/ports/devel/ninja/work/ninja-1.11.1/ninja
>> /wrkdirs/usr/ports/devel/ninja/work/stage/usr/local/bin
>> - consumers fail to build due to corrupted bin/ninja packaged
>>
>> devel/netsurf-buildsystem:
>> - phase: stage
>> - mkdir -p
>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles
>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/testtools
>> for M in Makefile.top Makefile.tools Makefile.subdir Makefile.pkgconfig
>> Makefile.clang Makefile.gcc Makefile.norcroft Makefile.open64; do \
>> cp makefiles/$M
>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles/;
>> \
>> done
>> - graphics/libnsgif fails to build due to NUL characters in
>> Makefile.{clang,subdir}, causing nothing to link
> Summary: I have problems building ports into packages
> via poudriere-devel use despite being fully updated/patched
> (as of when I started the experiment), never having enabled
> block_cloning ( still using openzfs-2.1-freebsd ).
>
> In other words, I can confirm other reports that have
> been made.
>
> The details follow.
>
>
> [Written as I was working on setting up for the experiments
> and then executing those experiments, adjusting as I went
> along.]
>
> I've run my own tests in a context that has never had the
> zpool upgrade and that jump from before the openzfs import to
> after the existing commits for trying to fix openzfs on
> FreeBSD. I report on the sequence of activities getting to
> the point of testing as well.
>
> By personal policy I keep my (non-temporary) pools compatible
> with what the most recent ??.?-RELEASE supports, using
> openzfs-2.1-freebsd for now. The pools involved below have
> never had a zpool upgrade from where they started. (I've no
> pools that have ever had a zpool upgrade.)
>
> (Temporary pools are rare for me, such as this investigation.
> But I'm not testing block_cloning or anything new this time.)
>
> I'll note that I use zfs for bectl, not for redundancy. So
> my evidence is more limited in that respect.
>
> The activities were done on a HoneyComb (16 Cortex-A72 cores).
> The system has and supports ECC RAM, 64 GiBytes of RAM are
> present.
>
> I started by duplicating my normal zfs environment to an
> external USB3 NVMe drive and adjusting the host name and such
> to produce the below. (Non-debug, although I do not strip
> symbols.) :
>
> # uname -apKU
> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #90 main-n261544-cee09bda03c8-dirty: Wed Mar 15 20:25:49 PDT 2023     root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400082 1400082
>
> I then did: git fetch, stash push ., merge --ff-only, stash apply . :
> my normal procedure. I then also applied the patch from:
>
> https://github.com/openzfs/zfs/pull/14739/files
>
> Then I did: buildworld buildkernel, install them, and rebooted.
>
> The result was:
>
> # uname -apKU
> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #91 main-n262122-2ef2c26f3f13-dirty: Wed Apr 12 19:23:35 PDT 2023     root@CA72_4c8G_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400086 1400086
>
> The later poudriere-devel based build of packages from ports is
> based on:
>
> # ~/fbsd-based-on-what-commit.sh -C /usr/ports
> 4e94ac9eb97f (HEAD -> main, freebsd/main, freebsd/HEAD) devel/freebsd-gcc12: Bump to 12.2.0.
> Author:     John Baldwin <jhb@FreeBSD.org>
> Commit:     John Baldwin <jhb@FreeBSD.org>
> CommitDate: 2023-03-25 00:06:40 +0000
> branch: main
> merge-base: 4e94ac9eb97fab16510b74ebcaa9316613182a72
> merge-base: CommitDate: 2023-03-25 00:06:40 +0000
> n613214 (--first-parent --count for merge-base)
>
> poudriere attempted to build 476 packages, starting
> with pkg (in order to build the 56 that I explicitly
> indicate that I want). It is my normal set of ports.
> The form of building is biased to allowing a high
> load average compared to the number of hardware
> threads (same as cores here): each builder is allowed
> to use the full count of hardware threads. The build
> used USE_TMPFS="data" instead of the USE_TMPFS=all I
> normally use on the build machine involved.
>
> And it produced some random errors during the attempted
> builds. A type of example that is easy to interpret
> without further exploration is:
>
> pkg_resources.extern.packaging.requirements.InvalidRequirement: Parse error at "'\x00\x00\x00\x00\x00\x00\x00\x00'": Expected W:(0-9A-Za-z)
>
> A fair number of the errors have this form: a builder installs
> a previously built package for its own use, but later cannot
> find some file from that package's installation.
>
> Another error reported was:
>
> ld: error: /usr/local/lib/libblkid.a: unknown file type
>
> For reference:
>
> [main-CA72-bulk_a-default] [2023-04-12_20h45m32s] [committing:] Queued: 476 Built: 252 Failed: 11  Skipped: 213 Ignored: 0   Fetched: 0   Tobuild: 0    Time: 00:37:52
>
> I started another build that tried to build 224 packages:
> the 11 that failed and the 213 that were skipped.
>
> Just 1 package built that failed before:
>
> [00:04:58] [09] [00:04:15] Finished databases/sqlite3@default | sqlite3-3.41.0_1,1: Success
>
> It seems to be the only one where the original failure was not
> an example of complaining about the missing/corrupted content
> of a package install used for building. So it is an example
> of randomly varying behavior.
>
> That, in turn, allowed:
>
> [00:04:58] [01] [00:00:00] Building security/nss | nss-3.89
>
> to build but everything else failed or was skipped.
>
> The sqlite3 vs. other failure difference suggests that writes
> have random problems but later reads reliably see the problem
> that resulted (before the content is deleted).
>
>
> After the above:
>
> # zpool status
>    pool: zroot
>   state: ONLINE
> config:
>
>          NAME        STATE     READ WRITE CKSUM
>          zroot       ONLINE       0     0     0
>            da0p8     ONLINE       0     0     0
>
> errors: No known data errors
>
> # zpool scrub zroot
> # zpool status
>    pool: zroot
>   state: ONLINE
>    scan: scrub repaired 0B in 00:16:25 with 0 errors on Wed Apr 12 22:15:39 2023
> config:
>
>          NAME        STATE     READ WRITE CKSUM
>          zroot       ONLINE       0     0     0
>            da0p8     ONLINE       0     0     0
>
> errors: No known data errors
>
>
> ===
> Mark Millard
> marklmi at yahoo.com
>
>
Hi,

I'm seeing an odd issue here and I wonder whether it is related.

When building one of my ports I occasionally, but not always, end up
with a file full of zeros.

The build creates copies of crispy-setup and, every once in a while,
one of them is a blob of zeros (see the output further down).

I'm running the recent ZFS update but I never upgraded my pool:

FreeBSD capeta 14.0-CURRENT FreeBSD 14.0-CURRENT #4 main-n262091-eed92455e600: Tue Apr 11 16:06:42 IST 2023 danilo@capeta:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG amd64

cp crispy-setup crispy-doom-setup
--- crispy-heretic-setup ---
cp crispy-setup crispy-heretic-setup
--- crispy-hexen-setup ---
cp crispy-setup crispy-hexen-setup
--- crispy-strife-setup ---
cp crispy-setup crispy-strife-setup

$ ls -l work/stage/usr/local/bin/crispy-*-setup
-r-xr-xr-x  1 danilo  wheel  923488 Apr 13 10:10 work/stage/usr/local/bin/crispy-doom-setup
-r-xr-xr-x  1 danilo  wheel  923488 Apr 13 10:10 work/stage/usr/local/bin/crispy-heretic-setup
-r-xr-xr-x  1 danilo  wheel  923488 Apr 13 10:10 work/stage/usr/local/bin/crispy-hexen-setup
-r-xr-xr-x  1 danilo  wheel  923488 Apr 13 10:10 work/stage/usr/local/bin/crispy-strife-setup


$ file work/stage/usr/local/bin/crispy-*-setup
work/stage/usr/local/bin/crispy-doom-setup:    ELF 64-bit LSB executable...
work/stage/usr/local/bin/crispy-heretic-setup: ELF 64-bit LSB executable...
work/stage/usr/local/bin/crispy-hexen-setup:   data
work/stage/usr/local/bin/crispy-strife-setup:  ELF 64-bit LSB executable...
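
Since all four files are plain copies of crispy-setup, they should be
byte-identical, so the bad one can also be found without relying on
file(1). A minimal Python sketch, grouping the files by SHA-256 digest
and reporting whatever falls outside the largest group. The helper
names (sha256_of, group_by_digest, odd_ones_out) are made up for this
illustration and are not part of any port or tool:

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_digest(paths):
    """Map each distinct digest to the list of files that have it."""
    groups = defaultdict(list)
    for p in paths:
        groups[sha256_of(p)].append(Path(p))
    return dict(groups)


def odd_ones_out(paths):
    """Return files whose content differs from the largest group.

    Assumption: the majority of the copies are intact. On a tie the
    first group encountered is treated as the reference.
    """
    groups = group_by_digest(paths)
    if len(groups) <= 1:
        return []
    majority = max(groups.values(), key=len)
    return sorted(p for g in groups.values() if g is not majority for p in g)
```

Calling odd_ones_out(Path("work/stage/usr/local/bin").glob("crispy-*-setup"))
on the staged files above would be expected to single out the zeroed
copy, assuming the other three really are identical.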


$ hexdump work/stage/usr/local/bin/crispy-hexen-setup
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
00e1760
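
The final offset, 0xe1760, is 923488 bytes, which matches the size
shown by ls: the whole file is NULs. To look for other files like this
in a stage directory, something along these lines could be used. It is
only a sketch, the function names are made up for the example, and it
detects only files that are entirely zero, so it would miss a file
that is only partly NUL-filled:

```python
from pathlib import Path


def is_all_zero(path, chunk_size=1 << 20):
    """Return True if the file is non-empty and every byte is NUL."""
    seen_data = False
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            seen_data = True
            # bytes.count(0) counts NUL bytes; any other byte ends the scan.
            if chunk.count(0) != len(chunk):
                return False
    return seen_data


def find_zero_files(root):
    """Yield regular files under root that consist only of NUL bytes."""
    for p in sorted(Path(root).rglob("*")):
        if p.is_file() and not p.is_symlink() and is_all_zero(p):
            yield p
```

For example, list(find_zero_files("work/stage")) would list the
suspect files under the stage directory. Empty files are deliberately
not reported, since they are usually legitimate.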



