Date: Tue, 14 Nov 2023 13:59:56 +0100 (CET) From: Ronald Klop <ronald-lists@klop.ws> To: Konstantin Belousov <kostikbel@gmail.com> Cc: Alexander Motin <mav@freebsd.org>, current@freebsd.org Subject: Re: crash zfs_clone_range() Message-ID: <1900239445.5968.1699966796547@localhost> In-Reply-To: <ZVEdyHFJyTg0cqCo@kib.kiev.ua> References: <349700057.3452.1699611152405@localhost> <c9c8ab33-efce-5ed0-1f3f-311fa3cf1338@FreeBSD.org> <ZVEdyHFJyTg0cqCo@kib.kiev.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
------=_Part_5967_382656657.1699966796420 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Response below Van: Konstantin Belousov <kostikbel@gmail.com> Datum: zondag, 12 november 2023 19:47 Aan: Alexander Motin <mav@freebsd.org> CC: Ronald Klop <ronald-lists@klop.ws>, current@freebsd.org Onderwerp: Re: crash zfs_clone_range() > > On Sun, Nov 12, 2023 at 11:51:40AM -0500, Alexander Motin wrote: > > Hi Ronald, > > > > As I can see, the clone request to ZFS came through nullfs, and it crashed > > immediately on enter. I've never been a VFS layer expert, but to me it may > > be a nullfs problem, not zfs. Is there chance you was (un-)mounting > > something when this happened? > It is not nullfs issue, I believe, but the lack of the busy reference on the > upper mount. I think https://reviews.freebsd.org/D42554 should cover it. > > > > > On 10.11.2023 05:12, Ronald Klop wrote: > > > Hi, > > > > > > Had this crash today on RPI4/15-CURRENT. > > > > > > FreeBSD rpi4 15.0-CURRENT FreeBSD 15.0-CURRENT #19 > > > main-b0203aaa46-dirty: Sat Nov 4 11:48:33 CET 2023 ronald@rpi4:/home/ronald/dev/freebsd/obj/home/ronald/dev/freebsd/src/arm64.aarch64/sys/GENERIC-NODEBUG > > > arm64 > > > > > > $ sysctl -a | grep bclon > > > vfs.zfs.bclone_enabled: 1 > > > > > > I started a jail with poudriere to build a package. The jail uses null > > > mounts over ZFS. > > > > > > [root]# cu -s 115200 -l /dev/cuaU0 > > > Connected > > > > > > db> bt > > > Tracing pid 95213 tid 100438 td 0xffff0000e1e97900 > > > db_trace_self() at db_trace_self > > > db_stack_trace() at db_stack_trace+0x120 > > > db_command() at db_command+0x2e4 > > > db_command_loop() at db_command_loop+0x58 > > > db_trap() at db_trap+0x100 > > > kdb_trap() at kdb_trap+0x334 > > > handle_el1h_sync() at handle_el1h_sync+0x18 > > > --- exception, esr 0xf2000000 > > > kdb_enter() at kdb_enter+0x48 > > > vpanic() at vpanic+0x1dc > > > panic() at panic+0x48 > > > data_abort() at data_abort+0x2fc > > > handle_el1h_sync() at handle_el1h_sync+0x18 > > > --- exception, esr 0x96000004 > > > rms_rlock() at rms_rlock+0x1c > > > zfs_clone_range() at zfs_clone_range+0x68 > > > zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c > > > null_bypass() at null_bypass+0x118 > > > vn_copy_file_range() at vn_copy_file_range+0x18c > > > kern_copy_file_range() at kern_copy_file_range+0x36c > > > sys_copy_file_range() at sys_copy_file_range+0x8c > > > do_el0_sync() at do_el0_sync+0x634 > > > handle_el0_sync() at handle_el0_sync+0x48 > > > --- exception, esr 0x56000000 > > > > > > > > > Oh.. While typing this I rebooted the machine and it happened again. I > > > didn't start anything in particular although the machine runs some > > > jails. > > > > > > x0: 0x00000000000000e0 > > > x1: 0xffffa00090317a48 > > > x2: 0xffffa000f79d4f00 > > > x3: 0xffffa000c61a44a8 > > > x4: 0xffff0000deefe460 ($d.2 + 0xdd776560) > > > x5: 0xffffa001250e4c00 > > > x6: 0xffff0000e54025b5 ($d.5 + 0xc) > > > x7: 0x000000000000030a > > > x8: 0xffff0000e1559000 ($d.2 + 0xdfdd1100) > > > x9: 0x0000000000000001 > > > x10: 0x0000000000000000 > > > x11: 0x0000000000000001 > > > x12: 0x0000000000000002 > > > x13: 0x0000000000000000 > > > x14: 0x0000000000000001 > > > x15: 0x0000000000000000 > > > x16: 0xffff0000016dce88 (__stop_set_modmetadata_set + 0x1310) > > > x17: 0xffff0000004e0d44 (rms_rlock + 0x0) > > > x18: 0xffff0000deefe280 ($d.2 + 0xdd776380) > > > x19: 0x0000000000000000 > > > x20: 0xffff0000deefe460 ($d.2 + 0xdd776560) > > > x21: 0x7fffffffffffffff > > > x22: 0xffffa00090317a48 > > > x23: 0xffffa000f79d4f00 > > > x24: 0xffffa001067ef910 > > > x25: 0x00000000000000e0 > > > x26: 0xffffa000158a8000 > > > x27: 0x0000000000000000 > > > x28: 0xffffa000158a8000 > > > x29: 0xffff0000deefe280 ($d.2 + 0xdd776380) > > > sp: 0xffff0000deefe280 > > > lr: 0xffff000001623564 (zfs_clone_range + 0x6c) > > > elr: 0xffff0000004e0d60 (rms_rlock + 0x1c) > > > spsr: 0x00000000a0000045 > > > far: 0x0000000000000108 > > > esr: 0x0000000096000004 > > > panic: data abort in critical section or under mutex > > > cpuid = 1 > > > time = 1699610885 > > > KDB: stack backtrace: > > > db_trace_self() at db_trace_self > > > db_trace_self_wrapper() at db_trace_self_wrapper+0x38 > > > vpanic() at vpanic+0x1a0 > > > panic() at panic+0x48 > > > data_abort() at data_abort+0x2fc > > > handle_el1h_sync() at handle_el1h_sync+0x18 > > > --- exception, esr 0x96000004 > > > rms_rlock() at rms_rlock+0x1c > > > zfs_clone_range() at zfs_clone_range+0x68 > > > zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c > > > null_bypass() at null_bypass+0x118 > > > vn_copy_file_range() at vn_copy_file_range+0x18c > > > kern_copy_file_range() at kern_copy_file_range+0x36c > > > sys_copy_file_range() at sys_copy_file_range+0x8c > > > do_el0_sync() at do_el0_sync+0x634 > > > handle_el0_sync() at handle_el0_sync+0x48 > > > --- exception, esr 0x56000000 > > > KDB: enter: panic > > > [ thread pid 3792 tid 100394 ] > > > Stopped at kdb_enter+0x48: str xzr, [x19, #768] > > > db> > > > > > > I'll keep the debugger open for a while. Can I type something for > > > additional info? > > > > > > Regards, > > > Ronald. > > > > -- > > Alexander Motin > > > Hi, Build a new kernel today. FreeBSD rpi4 15.0-CURRENT FreeBSD 15.0-CURRENT #20 main-051d69d6f8: Tue Nov 14 12:16:28 CET 2023 ronald@rpi4:/home/ronald/dev/freebsd/obj/home/ronald/dev/freebsd/src/arm64.aarch64/sys/GENERIC-NODEBUG arm64 x0: 0x00000000000008e0 x1: 0xffffa0006ce1fb38 x2: 0xffffa0006837a400 x3: 0xffffa0012c503a48 x4: 0xffff0000eb0ef430 (next_index + 0x815d790) x5: 0xffffa00152636300 x6: 0xffff0000e2e025b5 ($d.5 + 0xc) x7: 0x000000000000030a x8: 0xffff0000eb3212c0 (next_index + 0x838f620) x9: 0x0000000000000001 x10: 0x0000000000000000 x11: 0x0000000000000001 x12: 0x0000000000000002 x13: 0x0000000000000000 x14: 0x0000000000000001 x15: 0x0000000000000000 x16: 0xffff0000016e5b58 (__stop_set_modmetadata_set + 0x1328) x17: 0xffff0000004e0c28 (rms_rlock + 0x0) x18: 0xffff0000eb0ef250 (next_index + 0x815d5b0) x19: 0x0000000000000800 x20: 0xffff0000eb0ef430 (next_index + 0x815d790) x21: 0x7fffffffffffffff x22: 0xffffa0006ce1fb38 x23: 0xffffa0006837a400 x24: 0xffffa001ee486000 x25: 0x00000000000008e0 x26: 0xffffa000135ca000 x27: 0x0000000000000800 x28: 0xffffa000135ca000 x29: 0xffff0000eb0ef250 (next_index + 0x815d5b0) sp: 0xffff0000eb0ef250 lr: 0xffff00000162bee8 (zfs_clone_range + 0x6c) elr: 0xffff0000004e0c44 (rms_rlock + 0x1c) spsr: 0x00000000a0000045 far: 0x0000000000000908 esr: 0x0000000096000004 panic: data abort in critical section or under mutex cpuid = 2 time = 1699966486 KDB: stack backtrace: db_trace_self() at db_trace_self db_trace_self_wrapper() at db_trace_self_wrapper+0x38 vpanic() at vpanic+0x1a0 panic() at panic+0x48 data_abort() at data_abort+0x2fc handle_el1h_sync() at handle_el1h_sync+0x18 --- exception, esr 0x96000004 rms_rlock() at rms_rlock+0x1c zfs_clone_range() at zfs_clone_range+0x68 zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c null_bypass() at null_bypass+0x118 vn_copy_file_range() at vn_copy_file_range+0x1c0 kern_copy_file_range() at kern_copy_file_range+0x36c sys_copy_file_range() at sys_copy_file_range+0x8c do_el0_sync() at do_el0_sync+0x634 handle_el0_sync() at handle_el0_sync+0x48 --- exception, esr 0x56000000 KDB: enter: panic [ thread pid 3620 tid 100911 ] Stopped at kdb_enter+0x48: str xzr, [x19, #768] db> This happens as soon as I start poudriere in a jenkins-agent jail. AFAIK this includes the two recent vn_copy_file_range changes of Konstantin. Next I will install a GENERIC kernel instead of GENERIC-NODEBUG. Regards, Ronald. ------=_Part_5967_382656657.1699966796420 Content-Type: text/html; charset=us-ascii Content-Transfer-Encoding: 7bit <html><head></head><body>Response below<br> <p><strong>Van:</strong> Konstantin Belousov <kostikbel@gmail.com><br> <strong>Datum:</strong> zondag, 12 november 2023 19:47<br> <strong>Aan:</strong> Alexander Motin <mav@freebsd.org><br> <strong>CC:</strong> Ronald Klop <ronald-lists@klop.ws>, current@freebsd.org<br> <strong>Onderwerp:</strong> Re: crash zfs_clone_range()</p> <blockquote style="padding-right: 0px; padding-left: 5px; margin-left: 5px; border-left: #000000 2px solid; margin-right: 0px"> <div class="MessageRFC822Viewer" id="P"> <div class="TextPlainViewer" id="P.P">On Sun, Nov 12, 2023 at 11:51:40AM -0500, Alexander Motin wrote:<br> > Hi Ronald,<br> ><br> > As I can see, the clone request to ZFS came through nullfs, and it crashed<br> > immediately on enter. I've never been a VFS layer expert, but to me it may<br> > be a nullfs problem, not zfs. Is there chance you was (un-)mounting<br> > something when this happened?<br> It is not nullfs issue, I believe, but the lack of the busy reference on the<br> upper mount. I think <a href="https://reviews.freebsd.org/D42554">https://reviews.freebsd.org/D42554</a> should cover it.<br> <br> ><br> > On 10.11.2023 05:12, Ronald Klop wrote:<br> > > Hi,<br> > ><br> > > Had this crash today on RPI4/15-CURRENT.<br> > ><br> > > FreeBSD rpi4 15.0-CURRENT FreeBSD 15.0-CURRENT #19<br> > > main-b0203aaa46-dirty: Sat Nov 4 11:48:33 CET 2023 ronald@rpi4:/home/ronald/dev/freebsd/obj/home/ronald/dev/freebsd/src/arm64.aarch64/sys/GENERIC-NODEBUG<br> > > arm64<br> > ><br> > > $ sysctl -a | grep bclon<br> > > vfs.zfs.bclone_enabled: 1<br> > ><br> > > I started a jail with poudriere to build a package. The jail uses null<br> > > mounts over ZFS.<br> > ><br> > > [root]# cu -s 115200 -l /dev/cuaU0<br> > > Connected<br> > ><br> > > db> bt<br> > > Tracing pid 95213 tid 100438 td 0xffff0000e1e97900<br> > > db_trace_self() at db_trace_self<br> > > db_stack_trace() at db_stack_trace+0x120<br> > > db_command() at db_command+0x2e4<br> > > db_command_loop() at db_command_loop+0x58<br> > > db_trap() at db_trap+0x100<br> > > kdb_trap() at kdb_trap+0x334<br> > > handle_el1h_sync() at handle_el1h_sync+0x18<br> > > --- exception, esr 0xf2000000<br> > > kdb_enter() at kdb_enter+0x48<br> > > vpanic() at vpanic+0x1dc<br> > > panic() at panic+0x48<br> > > data_abort() at data_abort+0x2fc<br> > > handle_el1h_sync() at handle_el1h_sync+0x18<br> > > --- exception, esr 0x96000004<br> > > rms_rlock() at rms_rlock+0x1c<br> > > zfs_clone_range() at zfs_clone_range+0x68<br> > > zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c<br> > > null_bypass() at null_bypass+0x118<br> > > vn_copy_file_range() at vn_copy_file_range+0x18c<br> > > kern_copy_file_range() at kern_copy_file_range+0x36c<br> > > sys_copy_file_range() at sys_copy_file_range+0x8c<br> > > do_el0_sync() at do_el0_sync+0x634<br> > > handle_el0_sync() at handle_el0_sync+0x48<br> > > --- exception, esr 0x56000000<br> > ><br> > ><br> > > Oh.. While typing this I rebooted the machine and it happened again. I<br> > > didn't start anything in particular although the machine runs some<br> > > jails.<br> > ><br> > > x0: 0x00000000000000e0<br> > > x1: 0xffffa00090317a48<br> > > x2: 0xffffa000f79d4f00<br> > > x3: 0xffffa000c61a44a8<br> > > x4: 0xffff0000deefe460 ($d.2 + 0xdd776560)<br> > > x5: 0xffffa001250e4c00<br> > > x6: 0xffff0000e54025b5 ($d.5 + 0xc)<br> > > x7: 0x000000000000030a<br> > > x8: 0xffff0000e1559000 ($d.2 + 0xdfdd1100)<br> > > x9: 0x0000000000000001<br> > > x10: 0x0000000000000000<br> > > x11: 0x0000000000000001<br> > > x12: 0x0000000000000002<br> > > x13: 0x0000000000000000<br> > > x14: 0x0000000000000001<br> > > x15: 0x0000000000000000<br> > > x16: 0xffff0000016dce88 (__stop_set_modmetadata_set + 0x1310)<br> > > x17: 0xffff0000004e0d44 (rms_rlock + 0x0)<br> > > x18: 0xffff0000deefe280 ($d.2 + 0xdd776380)<br> > > x19: 0x0000000000000000<br> > > x20: 0xffff0000deefe460 ($d.2 + 0xdd776560)<br> > > x21: 0x7fffffffffffffff<br> > > x22: 0xffffa00090317a48<br> > > x23: 0xffffa000f79d4f00<br> > > x24: 0xffffa001067ef910<br> > > x25: 0x00000000000000e0<br> > > x26: 0xffffa000158a8000<br> > > x27: 0x0000000000000000<br> > > x28: 0xffffa000158a8000<br> > > x29: 0xffff0000deefe280 ($d.2 + 0xdd776380)<br> > > sp: 0xffff0000deefe280<br> > > lr: 0xffff000001623564 (zfs_clone_range + 0x6c)<br> > > elr: 0xffff0000004e0d60 (rms_rlock + 0x1c)<br> > > spsr: 0x00000000a0000045<br> > > far: 0x0000000000000108<br> > > esr: 0x0000000096000004<br> > > panic: data abort in critical section or under mutex<br> > > cpuid = 1<br> > > time = 1699610885<br> > > KDB: stack backtrace:<br> > > db_trace_self() at db_trace_self<br> > > db_trace_self_wrapper() at db_trace_self_wrapper+0x38<br> > > vpanic() at vpanic+0x1a0<br> > > panic() at panic+0x48<br> > > data_abort() at data_abort+0x2fc<br> > > handle_el1h_sync() at handle_el1h_sync+0x18<br> > > --- exception, esr 0x96000004<br> > > rms_rlock() at rms_rlock+0x1c<br> > > zfs_clone_range() at zfs_clone_range+0x68<br> > > zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c<br> > > null_bypass() at null_bypass+0x118<br> > > vn_copy_file_range() at vn_copy_file_range+0x18c<br> > > kern_copy_file_range() at kern_copy_file_range+0x36c<br> > > sys_copy_file_range() at sys_copy_file_range+0x8c<br> > > do_el0_sync() at do_el0_sync+0x634<br> > > handle_el0_sync() at handle_el0_sync+0x48<br> > > --- exception, esr 0x56000000<br> > > KDB: enter: panic<br> > > [ thread pid 3792 tid 100394 ]<br> > > Stopped at kdb_enter+0x48: str xzr, [x19, #768]<br> > > db><br> > ><br> > > I'll keep the debugger open for a while. Can I type something for<br> > > additional info?<br> > ><br> > > Regards,<br> > > Ronald.<br> ><br> > --<br> > Alexander Motin</div> <hr></div> </blockquote> <br> <br> Hi,<br> <br> Build a new kernel today.<br> FreeBSD rpi4 15.0-CURRENT FreeBSD 15.0-CURRENT #20 main-051d69d6f8: Tue Nov 14 12:16:28 CET 2023 ronald@rpi4:/home/ronald/dev/freebsd/obj/home/ronald/dev/freebsd/src/arm64.aarch64/sys/GENERIC-NODEBUG arm64<br> <br> x0: 0x00000000000008e0 <br> x1: 0xffffa0006ce1fb38 <br> x2: 0xffffa0006837a400 <br> x3: 0xffffa0012c503a48 <br> x4: 0xffff0000eb0ef430 (next_index + 0x815d790) <br> x5: 0xffffa00152636300 <br> x6: 0xffff0000e2e025b5 ($d.5 + 0xc) <br> x7: 0x000000000000030a <br> x8: 0xffff0000eb3212c0 (next_index + 0x838f620) <br> x9: 0x0000000000000001 <br> x10: 0x0000000000000000 <br> x11: 0x0000000000000001 <br> x12: 0x0000000000000002 <br> x13: 0x0000000000000000 <br> x14: 0x0000000000000001 <br> x15: 0x0000000000000000<br> x16: 0xffff0000016e5b58 (__stop_set_modmetadata_set + 0x1328)<br> x17: 0xffff0000004e0c28 (rms_rlock + 0x0)<br> x18: 0xffff0000eb0ef250 (next_index + 0x815d5b0)<br> x19: 0x0000000000000800<br> x20: 0xffff0000eb0ef430 (next_index + 0x815d790)<br> x21: 0x7fffffffffffffff<br> x22: 0xffffa0006ce1fb38<br> x23: 0xffffa0006837a400<br> x24: 0xffffa001ee486000<br> x25: 0x00000000000008e0<br> x26: 0xffffa000135ca000<br> x27: 0x0000000000000800<br> x28: 0xffffa000135ca000<br> x29: 0xffff0000eb0ef250 (next_index + 0x815d5b0)<br> sp: 0xffff0000eb0ef250<br> lr: 0xffff00000162bee8 (zfs_clone_range + 0x6c)<br> elr: 0xffff0000004e0c44 (rms_rlock + 0x1c)<br> spsr: 0x00000000a0000045<br> far: 0x0000000000000908<br> esr: 0x0000000096000004<br> panic: data abort in critical section or under mutex<br> cpuid = 2<br> time = 1699966486<br> KDB: stack backtrace:<br> db_trace_self() at db_trace_self<br> db_trace_self_wrapper() at db_trace_self_wrapper+0x38<br> vpanic() at vpanic+0x1a0<br> panic() at panic+0x48<br> data_abort() at data_abort+0x2fc<br> handle_el1h_sync() at handle_el1h_sync+0x18<br> --- exception, esr 0x96000004<br> rms_rlock() at rms_rlock+0x1c<br> zfs_clone_range() at zfs_clone_range+0x68<br> zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c<br> null_bypass() at null_bypass+0x118<br> vn_copy_file_range() at vn_copy_file_range+0x1c0<br> kern_copy_file_range() at kern_copy_file_range+0x36c<br> sys_copy_file_range() at sys_copy_file_range+0x8c<br> do_el0_sync() at do_el0_sync+0x634<br> handle_el0_sync() at handle_el0_sync+0x48<br> --- exception, esr 0x56000000<br> KDB: enter: panic<br> [ thread pid 3620 tid 100911 ]<br> Stopped at kdb_enter+0x48: str xzr, [x19, #768]<br> db><br> <br> This happens as soon as I start poudriere in a jenkins-agent jail.<br> <br> AFAIK this includes the two recent vn_copy_file_range changes of Konstantin.<br> <br> Next I will install a GENERIC kernel instead of GENERIC-NODEBUG.<br> <br> Regards,<br> Ronald.<br> </body></html> ------=_Part_5967_382656657.1699966796420--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1900239445.5968.1699966796547>