Date:      Sun, 9 Apr 2023 23:58:04 +0200
From:      Mateusz Guzik <mjguzik@gmail.com>
To:        FreeBSD User <freebsd@walstatt-de.de>
Cc:        Charlie Li <vishwin@freebsd.org>, Cy Schubert <Cy.Schubert@cschubert.com>,  Rick Macklem <rick.macklem@gmail.com>, Martin Matuska <mm@freebsd.org>, src-committers@freebsd.org,  dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org
Subject:   Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
Message-ID:  <CAGudoHHUJRy6mSAc-0tt4boECd7uriJ=%2BbX8BUUV=vXVhU=%2BAw@mail.gmail.com>
In-Reply-To: <20230409202650.49130b92@thor.intern.walstatt.dynvpn.de>
References:  <202304031513.333FD6qw014903@gitrepo.freebsd.org> <20230403231444.CF48911F@slippy.cwsent.com> <20230403232549.73E331A2@slippy.cwsent.com> <CAM5tNy45XwDNGK27i_Z_96H-sLDXXHuaZbSQ=E7507eCiCvgJw@mail.gmail.com> <20230403235851.84C0467@slippy.cwsent.com> <CAM5tNy6TMoXAKyfWq_psEjK0zy9j%2B=7yzp1vRirAfTdXBxabSQ@mail.gmail.com> <CAM5tNy64HTeC8%2BOT_SHg1osnKKAH3_qQJkyWFuOy-LDAFVzu%2BA@mail.gmail.com> <20230404052811.DA2172C1@slippy.cwsent.com> <7c75b934-cb0a-b32e-bc19-b1e15e8cf3aa@freebsd.org> <CAGudoHHd47N71xQ5yM60XDcmq8S4oOFWsWxKgxEORo4TOh5sPw@mail.gmail.com> <c0bf2e5b-5e9d-4198-782c-eeadb90f3cfb@freebsd.org> <20230409202650.49130b92@thor.intern.walstatt.dynvpn.de>

On 4/9/23, FreeBSD User <freebsd@walstatt-de.de> wrote:
> On Sun, 9 Apr 2023 13:23:05 -0400
> Charlie Li <vishwin@freebsd.org> wrote:
>
>> Mateusz Guzik wrote:
>> > On 4/9/23, Charlie Li wrote:
>> >> I've also started noticing random artefacts and malformed files whilst
>> >> building packages with poudriere, causing all sorts of "exec format
>> >> error"s, missing .so files due to corruption, data file corruption
>> >> causing unintended failure modes, etc. All without block_cloning;
>> >> enabling such causes a panic of its own when starting multiple builder
>> >> jails at once.
>> >>
>> >
>> > what's the panic?
>> >
>> manually typed out:
>>
>> panic: VERIFY(!zil_replaying(zilog, tx)) failed
>>
>> cpuid = 7
>> time = 1681060472
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe02a05b28a0
>> vpanic() at vpanic+0x152/frame 0xfffffe02a05b28f0
>> spl_panic() at spl_panic+0x3a/frame 0xfffffe02a05b2950
>> zfs_log_clone_range() at zfs_log_clone_range+0x1db/frame 0xfffffe02a05b29e0
>> zfs_clone_range() at zfs_clone_range+0xae2/frame 0xfffffe02a05b2bc0
>> zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0xff/frame 0xfffffe02a05b2c40
>> vn_copy_file_range() at vn_copy_file_range+0x115/frame 0xfffffe02a05b2ce0
>> kern_copy_file_range() at kern_copy_file_range+0x34e/frame 0xfffffe02a05b2db0
>> sys_copy_file_range() at sys_copy_file_range+0x78/frame 0xfffffe02a05b2e00
>> amd64_syscall() at amd64_syscall+0x148/frame 0xfffffe02a05b2f30
>> fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe02a05b2f30
>> --- syscall (569, FreeBSD ELF64, copy_file_range), rip = 0x908d2a, rsp = 0x820c28e68, rbp = 0x820c292b0 ---
>> KDB: enter: panic
>> [ thread pid 1856 tid 102129 ]
>> Stopped at      kdb_enter+0x32: movq    $0,0x12760f3(%rip)
>> db>
>>
>
> I have the same issue (crashes on access to several, but random, datasets).
>
> It started with /usr/ports build failures when performing updates or
> rebuilding ports. The poudriere host doesn't work anymore: as soon as it
> starts building ports, the hosts (several of them, same OS revision, new
> ZFS option enabled) crash. The same happens when building binaries for a
> pkg OS distribution.
>
> That host also reports a ZFS RAIDZ pool as corrupted, out of the blue! Some
> files from a poudriere build and a /usr/ports build seem to have issues
> with some temporarily created files in the work directory.
>
> On another host /usr/ports resides on ZFS, and it also crashes when
> building/updating ports. On the same host /home also resides on ZFS, but
> even when downloading large amounts of email the host seems to be stable.
> I have not found out yet what kind of file access triggers the crash.
>

I reproduced the VERIFY(!zil_replaying(zilog, tx)) panic. Since the
backtrace shows that it triggers via copy_file_range, I temporarily
patched the kernel to never do block cloning. So far the only package
that has failed to build is sqlite, and that was for a legitimate reason
(the compiler errored out due to a problem in the code).
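
For readers unfamiliar with the syscall at the bottom of the backtrace,
a minimal userland sketch of a copy_file_range(2) loop follows. It is
an illustration only and not code from this thread: copy_whole_file()
is a hypothetical helper name and the 1 GiB chunk size is an arbitrary
choice. Nothing here is claimed to reproduce the panic; whether the
kernel services the call by cloning blocks depends on the filesystem
and its settings.

```c
/* Needed for the copy_file_range() prototype on Linux/glibc. */
#define _GNU_SOURCE
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): copy src to dst using
 * copy_file_range(2). Returns the number of bytes copied, or -1 on error.
 */
ssize_t
copy_whole_file(const char *src, const char *dst)
{
    const size_t chunk = (size_t)1 << 30;   /* arbitrary per-call limit */
    ssize_t n, total = 0;
    int in, out;

    in = open(src, O_RDONLY);
    if (in == -1)
        return (-1);
    out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out == -1) {
        close(in);
        return (-1);
    }

    /*
     * NULL offset pointers make the kernel use and advance the
     * descriptors' own file offsets. A short copy is not an error,
     * so keep calling until 0 (end of input) or -1 (failure).
     */
    while ((n = copy_file_range(in, NULL, out, NULL, chunk, 0)) > 0)
        total += n;

    close(in);
    close(out);
    return (n == -1 ? -1 : total);
}
```

Calling copy_whole_file("a.bin", "b.bin") with both paths on the same
dataset sends the copy through the copy_file_range syscall, the entry
point seen in the backtrace.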

-- 
Mateusz Guzik <mjguzik gmail.com>


