Date: Mon, 7 Mar 2022 06:37:46 -0800
From: Mark Millard <marklmi@yahoo.com>
To: Ronald Klop <ronald-lists@klop.ws>, Mark Johnston <markj@FreeBSD.org>
Cc: bob prohaska <fbsd@www.zefox.net>, Free BSD <freebsd-arm@freebsd.org>, freebsd-current <freebsd-current@freebsd.org>
Subject: Re: panic: data abort in critical section or under mutex (was: Re: panic: Unknown kernel exception 0 esr_el1 2000000 (on 14-CURRENT/aarch64 Feb 28))
Message-ID: <10724FB9-8E75-4DB7-A0F4-CFF55D21272B@yahoo.com>
In-Reply-To: <132978150.92.1646660769467@mailrelay>
References: <C2F96211-0180-45DA-872F-52358D9ED35B.ref@yahoo.com> <C2F96211-0180-45DA-872F-52358D9ED35B@yahoo.com> <1800459695.1.1646649539521@mailrelay> <132978150.92.1646660769467@mailrelay>

On 2022-Mar-7, at 05:46, Ronald Klop <ronald-lists@klop.ws> wrote:

> Dear Mark Johnston,
>
> I did some binary search in the kernels and came to the conclusion that
> https://cgit.freebsd.org/src/commit/?id=1517b8d5a7f58897200497811de1b18809c07d3e
> still works and
> https://cgit.freebsd.org/src/commit/?id=407c34e735b5d17e2be574808a09e6d729b0a45a
> panics.
>
> I suspect your commit in
> https://cgit.freebsd.org/src/commit/?id=c84bb8cd771ce4bed58152e47a32dda470bef23a.
>
> Last panic:
>
> panic: vm_fault failed: ffff00000046e708 error 1
> cpuid = 1
> time = 1646660058
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> vpanic() at vpanic+0x174
> panic() at panic+0x44
> data_abort() at data_abort+0x2e8
> handle_el1h_sync() at handle_el1h_sync+0x10
> --- exception, esr 0x96000004
> _rm_rlock_debug() at _rm_rlock_debug+0x8c
> osd_get() at osd_get+0x5c
> zio_execute() at zio_execute+0xf8
> taskqueue_run_locked() at taskqueue_run_locked+0x178
> taskqueue_thread_loop() at taskqueue_thread_loop+0xc8
> fork_exit() at fork_exit+0x74
> fork_trampoline() at fork_trampoline+0x14
> KDB: enter: panic
> [ thread pid 0 tid 100129 ]
> Stopped at kdb_enter+0x44: undefined f902011f
> db>

Was this a WITNESS/DEBUG kernel? Non-WITNESS? Non-debug?
Which aarch64 variant? Bob's was Cortex-A53 (RPi3).

> A more recent kernel (912df91) still panics. See below.
>
> Do you have time to look into this? What information can I provide to help?
>
> Regards,
> Ronald.
>
> From: Ronald Klop <ronald-lists@klop.ws>
> Date: Monday, 7 March 2022 11:38
> To: Mark Millard <marklmi@yahoo.com>
> Cc: bob prohaska <fbsd@www.zefox.net>, freebsd-current <freebsd-current@freebsd.org>, freebsd-arm@freebsd.org
> Subject: Re: panic: data abort in critical section or under mutex (was: Re: panic: Unknown kernel exception 0 esr_el1 2000000 (on 14-CURRENT/aarch64 Feb 28))
>
> Yes, I spoke too soon too.
> Often it crashes as soon as I start a parallel poudriere build. But this
> time it went very far. As soon as nightly backups kicked in it was game
> over again.
> I had read Bob's mail on the arm@ ML. But I wanted to leave the conclusion
> that it is about the same problem to the developers. (I have seen enough
> wrong guesses about causes in my work.)
>
> I will need to go further into the binary search of working kernels.
>
> This was: FreeBSD 14.0-CURRENT #0 912df91: Wed Mar 2 00:36:35 UTC 2022
> Fatal data abort:
> x0: ffff000000f1efd8 x0: ffff000000f1efd8 (mac_policy_rm + 0) (mac_policy_rm + 0)
>
> x1: 2 x1: 2
>
> x2: ffff00000087dcf2 x2: ffff00000087dcf2 (cam_status_table + 2f28a)
> (cam_status_table + 2f28a) x3: ffff00000087dcf2
> x3: ffff00000087dcf2 (cam_status_table + 2f28a) (cam_status_table + 2f28a)
>
> x4: 102 x4: 102
>
> x5: 7 x5: 1
>
> x6: 0 x6: ff
>
> x7: 0 x7: ffffa00011fc2800
> x8: 1
>
> x8: 1 x9: ffff000000f37c10
> x9: ffff0000419d9090 (pcpu0 + 90) (g_ctx + 40278fe4)
>
> x10: ffffa0017be2a600 x10: ffffa000010fa600
> x11: 394aed08d0003a48
>
> x12: 350001a8b946a108 x11: 0
>
> x12: ffff000000f37c10 x13: badecce4 (pcpu0 + 90)
>
> x13: ffffa0001fbde6b0 x14: 0
>
> x14: 4965ae49 x15: 1
>
> x15: 1000193 x16: ffff0000016a4238
> x16: ffff000100482d38 (__stop_set_modmetadata_set + d00) (__stop_set_modmetadata_set + 448)
>
> x17: ffff00000044a998 x17: ffff00000058ff30 (free + 0) (if_inc_counter + 0)
>
> x18: ffff0000b49a23c0 x18: ffff000103f11b80 (g_ctx + b3242314)
> (next_index + 3a228c0) x19: 102
>
> x19: 102 x20: ffff0000b49a2428
> x20: ffff000103f11be8 (g_ctx + b324237c) (next_index + 3a22928)
>
> x21: ffff00000087dcf2 x21: ffff00000087dcf2 (cam_status_table + 2f28a) (cam_status_table + 2f28a)
>
> x22: ffff000000f1efd8 x22: ffff000000f1efd8 (mac_policy_rm + 0) (mac_policy_rm + 0)
>
> x23: ffff00000086f107 x23: 0 (cam_status_table + 2069f)
>
> x24: ffffa0001fbde6c8 x24: ffffa0008cba0d00
> x25: 0
>
> x25: ffff00000088aa11 x26: 4 (do_execve.fexecv_proc_title + 76b7)
>
> x27: 0 x26: ffffa0017be2a600
> x28: ffff00010209fcf0
> x27: ffffa00025626a80 (next_index + 1bb0a30)
>
> x28: ffff000103f11ce0 x29: ffff0000b49a23e0 (next_index + 3a22a20) (g_ctx + b3242334)
>
> x29: ffff000103f11ba0 sp: ffff0000b49a23c0
> (next_index + 3a228e0) lr: ffff00000046ef98
> sp: ffff000103f11b80
> (_rm_runlock_debug + 60) lr: ffff00000046ef98
> elr: ffff00000046dc0c (_rm_runlock_debug + 60) (_rm_assert + a4)
>
> elr: ffff00000046dc0cspsr: 45
> (_rm_assert + a4) far: 10
>
> esr: 96000004
> spsr: 45
>
> panic: data abort in critical section or under mutex
> cpuid = 1
> time = 1646609483
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> vpanic() at vpanic+0x174
> panic() at panic+0x44
> data_abort() at data_abort+0x2d4
> handle_el1h_sync() at handle_el1h_sync+0x10
> --- exception, esr 0x96000004
> _rm_assert() at _rm_assert+0xa4
> _rm_runlock_debug() at _rm_runlock_debug+0x5c
> mac_inpcb_check_deliver() at mac_inpcb_check_deliver+0x74
> tcp_input_with_port() at tcp_input_with_port+0xab4
> tcp_input() at tcp_input+0xc
> ip_input() at ip_input+0x2e8
> netisr_dispatch_src() at netisr_dispatch_src+0xe4
> ether_demux() at ether_demux+0x178
> ether_nh_input() at ether_nh_input+0x3e8
> netisr_dispatch_src() at netisr_dispatch_src+0xe4
> ether_input() at ether_input+0x80
> if_input() at if_input+0xc
> gen_intr() at gen_intr+0x444
> ithread_loop() at ithread_loop+0x2a0
> fork_exit() at fork_exit+0x74
> fork_trampoline() at fork_trampoline+0x14
> KDB: enter: panic
> [ thread pid 12 tid 100063 ]
> Stopped at kdb_enter+0x44: undefined f902011f
> db>
>
> NB: db> reboot/reset/halt does not work on my RPI4. Luckily I have a
> wifi-connected power switch on it.
>
> Regards,
> Ronald.
>
> From: Mark Millard <marklmi@yahoo.com>
> Date: Monday, 7 March 2022 02:01
> To: Ronald Klop <ronald-lists@klop.ws>
> Cc: freebsd-current <freebsd-current@freebsd.org>, bob prohaska <fbsd@www.zefox.net>
> Subject: Re: panic: data abort in critical section or under mutex (was: Re: panic: Unknown kernel exception 0 esr_el1 2000000 (on 14-CURRENT/aarch64 Feb 28))
>
> From: Ronald Klop <ronald-lists_at_klop.ws> wrote on
> Date: Sun, 6 Mar 2022 23:22:42 +0100 (CET) :
>
> > Did some binary search with kernels from artifact.ci.freebsd.org.
> >
> > I suspect "rmlock: Micro-optimize read locking" as cause.
> >
> > https://cgit.freebsd.org/src/commit/?id=c84bb8cd771ce4bed58152e47a32dda470bef23a
> >
> >
> > And "rmlock: Add required compiler barriers to _rm_runlock()" as solution.
> >
> > https://cgit.freebsd.org/src/commit/?id=89ae8eb74e87ac19aa2d7abe4ba16bcccd32bb9f
> >
> >
> > So I probably just had a bad day.
>
> Well, there is a report of a buildkernel crash after that pair:
>
> https://lists.freebsd.org/archives/freebsd-arm/2022-March/001078.html
>
> that references additional information at:
>
> http://www.zefox.net/~fbsd/rpi3/crashes/20220304/readme
>
> and reported:
>
> QUOTE
> The console connection dropped before the crash (unrelated) I didn't
> get the preamble, all I have is the backtrace and buildkernel log.
> Here's the backtrace:
> db> bt
> Tracing pid 14795 tid 100098 td 0xffffa00017815600
> db_trace_self() at db_trace_self
> db_stack_trace() at db_stack_trace+0x11c
> db_command() at db_command+0x368
> db_command_loop() at db_command_loop+0x54
> db_trap() at db_trap+0xf8
> kdb_trap() at kdb_trap+0x1cc
> handle_el1h_sync() at handle_el1h_sync+0x10
> --- exception, esr 0xf2000000
> kdb_enter() at kdb_enter+0x44
> vpanic() at vpanic+0x1b0
> panic() at panic+0x44
> data_abort() at data_abort+0x2e8
> handle_el1h_sync() at handle_el1h_sync+0x10
> --- exception, esr 0x96000004
> _rm_rlock_debug() at _rm_rlock_debug+0x8c
> sysctl_root_handler_locked() at sysctl_root_handler_locked+0x140
> sysctl_root() at sysctl_root+0x1ac
> userland_sysctl() at userland_sysctl+0x140
> sys___sysctl() at sys___sysctl+0x68
> do_el0_sync() at do_el0_sync+0x520
> handle_el0_sync() at handle_el0_sync+0x40
> --- exception, esr 0x56000000
> END QUOTE

As I understand it, this was a WITNESS and debug kernel.
Also, this was an RPi3, so Cortex-A53, which has
in-order-execution cores. (Unlike Cortex-A72's, for
example.)

> The above material does reference _rm_rlock_debug . Might be
> related?
>
> The readme reports:
>
> main-n253603-0b25cbc79d3: Thu Mar 3 22:48:31 PST 2022
>
> for the system doing the buildkernel. This is after
> 89ae8eb74e8 .
>
> (It also mentions another panic earlier in the week,
> apparently not reported to the lists at the time.)
>

So far as I have noticed, all reports of the crashes in
_rm_rlock_debug are on aarch64 hardware. So maybe the problem
is tied to the weak memory model, but in a way that still
matters to a Cortex-A53's in-order cores? (Just a thought.)
But then the contrasting(?) status of powerpc64 might be
of note. (And I'll stop guessing here.)

I do not know if any non-WITNESS/non-debug kernel builds
have failed.

===
Mark Millard
marklmi at yahoo.com
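
For reference, the esr values that appear in the backtraces in this thread can be split into their fields. The sketch below is a hypothetical helper, not FreeBSD code: the name decode_esr and both lookup tables are introduced here for illustration. It assumes the Armv8-A ESR_EL1 layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0) and lists only the exception classes seen in this thread plus the translation-fault status codes.

```python
# Hypothetical helper, not FreeBSD code. Assumes the Armv8-A ESR_EL1 layout:
# EC = bits [31:26], IL = bit 25, ISS = bits [24:0].

# Only the exception classes that appear in this thread.
EC_NAMES = {
    0x00: "unknown reason",
    0x15: "SVC instruction executed in AArch64 state",
    0x25: "data abort taken without a change in exception level",
    0x3C: "BRK instruction executed in AArch64 state",
}

# Data fault status codes, translation faults only.
DFSC_NAMES = {
    0x04: "translation fault, level 0",
    0x05: "translation fault, level 1",
    0x06: "translation fault, level 2",
    0x07: "translation fault, level 3",
}


def decode_esr(esr):
    """Split a 32-bit ESR_EL1 value into its EC, IL and ISS fields."""
    ec = (esr >> 26) & 0x3F
    il = (esr >> 25) & 0x1
    iss = esr & 0x1FFFFFF
    fields = {
        "ec": ec,
        "il": il,
        "iss": iss,
        "ec_name": EC_NAMES.get(ec, "not in this sketch's table"),
    }
    if ec in (0x24, 0x25):  # data aborts: from a lower EL, from the same EL
        dfsc = iss & 0x3F
        fields["dfsc"] = dfsc
        fields["dfsc_name"] = DFSC_NAMES.get(dfsc, "not in this sketch's table")
        fields["write"] = bool(iss & (1 << 6))  # WnR bit: set for a write access
    return fields


if __name__ == "__main__":
    # The four values quoted in this thread (the last one from the subject line).
    for value in (0x96000004, 0xF2000000, 0x56000000, 0x02000000):
        print(hex(value), decode_esr(value))
```

Under those assumptions, 0x96000004 is EC 0x25 with DFSC 0x04 and WnR clear, that is, a read taken from kernel context that hit a level 0 translation fault. That would be consistent with the small far value (10) in the quoted register dump, though it says nothing about why the pointer was bad.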