Date:      Sun, 09 Apr 2023 15:35:19 -0700
From:      Cy Schubert <Cy.Schubert@cschubert.com>
To:        Mateusz Guzik <mjguzik@gmail.com>
Cc:        FreeBSD User <freebsd@walstatt-de.de>, Charlie Li <vishwin@freebsd.org>, Cy Schubert <Cy.Schubert@cschubert.com>, Rick Macklem <rick.macklem@gmail.com>, Martin Matuska <mm@freebsd.org>, src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org
Subject:   Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
Message-ID:  <20230409223519.68D741F3@slippy.cwsent.com>
In-Reply-To: <CAGudoHFd3_Sc-6ZcrNhv_56BVuW5fwN4uUDNvJw44VpvxoOQvA@mail.gmail.com>
References:  <202304031513.333FD6qw014903@gitrepo.freebsd.org>  <20230403231444.CF48911F@slippy.cwsent.com> <20230403232549.73E331A2@slippy.cwsent.com> <CAM5tNy45XwDNGK27i_Z_96H-sLDXXHuaZbSQ=E7507eCiCvgJw@mail.gmail.com> <20230403235851.84C0467@slippy.cwsent.com> <CAM5tNy6TMoXAKyfWq_psEjK0zy9j%2B=7yzp1vRirAfTdXBxabSQ@mail.gmail.com> <CAM5tNy64HTeC8%2BOT_SHg1osnKKAH3_qQJkyWFuOy-LDAFVzu%2BA@mail.gmail.com> <20230404052811.DA2172C1@slippy.cwsent.com> <7c75b934-cb0a-b32e-bc19-b1e15e8cf3aa@freebsd.org> <CAGudoHHd47N71xQ5yM60XDcmq8S4oOFWsWxKgxEORo4TOh5sPw@mail.gmail.com> <c0bf2e5b-5e9d-4198-782c-eeadb90f3cfb@freebsd.org> <20230409202650.49130b92@thor.intern.walstatt.dynvpn.de> <CAGudoHHUJRy6mSAc-0tt4boECd7uriJ=%2BbX8BUUV=vXVhU=%2BAw@mail.gmail.com> <CAGudoHFd3_Sc-6ZcrNhv_56BVuW5fwN4uUDNvJw44VpvxoOQvA@mail.gmail.com>

In message <CAGudoHFd3_Sc-6ZcrNhv_56BVuW5fwN4uUDNvJw44VpvxoOQvA@mail.gmail.com>, Mateusz Guzik writes:
> On 4/9/23, Mateusz Guzik <mjguzik@gmail.com> wrote:
> > On 4/9/23, FreeBSD User <freebsd@walstatt-de.de> wrote:
> >> On Sun, 9 Apr 2023 13:23:05 -0400
> >> Charlie Li <vishwin@freebsd.org> wrote:
> >>
> >>> Mateusz Guzik wrote:
> >>> > On 4/9/23, Charlie Li wrote:
> >>> >> I've also started noticing random artefacts and malformed files
> >>> >> whilst building packages with poudriere, causing all sorts of
> >>> >> "exec format error"s, missing .so files due to corruption, data
> >>> >> file corruption causing unintended failure modes, etc. All without
> >>> >> block_cloning; enabling such causes a panic of its own when
> >>> >> starting multiple builder jails at once.
> >>> >>
> >>> >
> >>> > what's the panic?
> >>> >
> >>> manually typed out:
> >>>
> >>> panic: VERIFY(!zil_replaying(zilog, tx)) failed
> >>>
> >>> cpuid = 7
> >>> time = 1681060472
> >>> KDB: stack backtrace:
> >>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe02a05b28a0
> >>> vpanic() at vpanic+0x152/frame 0xfffffe02a05b28f0
> >>> spl_panic() at spl_panic+0x3a/frame 0xfffffe02a05b2950
> >>> zfs_log_clone_range() at zfs_log_clone_range+0x1db/frame 0xfffffe02a05b29e0
> >>> zfs_clone_range() at zfs_clone_range+0xae2/frame 0xfffffe02a05b2bc0
> >>> zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0xff/frame 0xfffffe02a05b2c40
> >>> vn_copy_file_range() at vn_copy_file_range+0x115/frame 0xfffffe02a05b2ce0
> >>> kern_copy_file_range() at kern_copy_file_range+0x34e/frame 0xfffffe02a05b2db0
> >>> sys_copy_file_range() at sys_copy_file_range+0x78/frame 0xfffffe02a05b2e00
> >>> amd64_syscall() at amd64_syscall+0x148/frame 0xfffffe02a05b2f30
> >>> fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe02a05b2f30
> >>> --- syscall (569, FreeBSD ELF64, copy_file_range), rip = 0x908d2a, rsp = 0x820c28e68, rbp = 0x820c292b0 ---
> >>> KDB: enter: panic
> >>> [ thread pid 1856 tid 102129 ]
> >>> Stopped at      kdb_enter+0x32: movq    $0,0x12760f3(%rip)
> >>> db>
> >>>
> >>
> >> I have the same issue (crash on access of several, but random datasets).
> >>
> >> It started with /usr/ports build failures when performing updates or
> >> rebuilding ports. The poudriere host doesn't work anymore: as soon as
> >> it starts building ports, the hosts (several of them, same OS
> >> revision, new ZFS option enabled) crash. The same happens when
> >> building binaries for a pkg OS distribution.
> >>
> >> That host also reports a ZFS RAIDZ pool as corrupted, out of the blue!
> >> Some files from a poudriere build and a /usr/ports build seem to have
> >> issues with some temporarily created files in the work directory.
> >>
> >> On another host /usr/ports resides on ZFS and it also crashes when
> >> building/updating ports. On the same host /home also resides on ZFS,
> >> but even when downloading large amounts of email the host seems to be
> >> stable. I have not found out yet what kind of file access triggers the
> >> crash.
> >>
> >
> > I reproduced the VERIFY(!zil_replaying(zilog, tx)) panic. As the
> > backtrace shows it triggers when using copy_file_range, I temporarily
> > patched the kernel to never do block cloning. So far the only package
> > which failed to build was sqlite and it was for a legitimate reason
> > (compiler errored out due to a problem in the code).
> >
>
> ... and got an illegitimate failure:
> strip: file format not recognized
>
> the port builds after retrying
>
> iow there is more breakage.
>
> i don't know if the merge can be easily reverted now, will have to see
> about that

git revert is the easy part. What about people who've already done zpool
upgrade and who, following the revert, are left with read-only zpools?

Personally, I typically avoid enabling new zpool features for the first few 
weeks, even months, just in case. But not everyone does this.

People who've done zpool upgrade already will need to back up their zpools 
and restore them following any upgrade to a FreeBSD with reverted zfs 
commits.

And, considering the above, we may be long past the point of no return. 

For me, personally, it won't matter either way. For others? I don't know.

Simply disabling block_cloning regardless of the zpool setting might
be a less disruptive solution.

What is the ZFS project issue number?



-- 
Cheers,
Cy Schubert <Cy.Schubert@cschubert.com>
FreeBSD UNIX:  <cy@FreeBSD.org>   Web:  https://FreeBSD.org
NTP:           <cy@nwtime.org>    Web:  https://nwtime.org

			e^(i*pi)+1=0




