Date: Mon, 7 Mar 2022 11:38:59 +0100 (CET)
From: Ronald Klop <ronald-lists@klop.ws>
To: Mark Millard <marklmi@yahoo.com>
Cc: bob prohaska <fbsd@www.zefox.net>, freebsd-current <freebsd-current@freebsd.org>, freebsd-arm@freebsd.org
Subject: Re: panic: data abort in critical section or under mutex (was: Re: panic: Unknown kernel exception 0 esr_el1 2000000 (on 14-CURRENT/aarch64 Feb 28))
Message-ID: <1800459695.1.1646649539521@mailrelay>
In-Reply-To: <C2F96211-0180-45DA-872F-52358D9ED35B@yahoo.com>
References: <C2F96211-0180-45DA-872F-52358D9ED35B.ref@yahoo.com> <C2F96211-0180-45DA-872F-52358D9ED35B@yahoo.com>
Yes, I spoke too soon too. Often it crashes as soon as I start a parallel poudriere build, but this time it got quite far. As soon as the nightly backups kicked in it was game over again.

I had read Bob's mail on the arm@ ML, but I wanted to leave the conclusion that it is the same problem to the developers. (I have seen enough wrong guessing of causes in my work. 😉)

I will need to go further into the binary search of working kernels.

This was: FreeBSD 14.0-CURRENT #0 912df91: Wed Mar 2 00:36:35 UTC 2022

Fatal data abort:
  x0: ffff000000f1efd8 x0: ffff000000f1efd8 (mac_policy_rm + 0) (mac_policy_rm + 0)
  x1: 2 x1: 2
  x2: ffff00000087dcf2 x2: ffff00000087dcf2 (cam_status_table + 2f28a)
  (cam_status_table + 2f28a) x3: ffff00000087dcf2
  x3: ffff00000087dcf2 (cam_status_table + 2f28a) (cam_status_table + 2f28a)
  x4: 102 x4: 102
  x5: 7 x5: 1
  x6: 0 x6: ff
  x7: 0 x7: ffffa00011fc2800
  x8: 1
  x8: 1 x9: ffff000000f37c10
  x9: ffff0000419d9090 (pcpu0 + 90) (g_ctx + 40278fe4)
  x10: ffffa0017be2a600 x10: ffffa000010fa600
  x11: 394aed08d0003a48
  x12: 350001a8b946a108 x11: 0
  x12: ffff000000f37c10 x13: badecce4 (pcpu0 + 90)
  x13: ffffa0001fbde6b0 x14: 0
  x14: 4965ae49 x15: 1
  x15: 1000193 x16: ffff0000016a4238
  x16: ffff000100482d38 (__stop_set_modmetadata_set + d00) (__stop_set_modmetadata_set + 448)
  x17: ffff00000044a998 x17: ffff00000058ff30 (free + 0) (if_inc_counter + 0)
  x18: ffff0000b49a23c0 x18: ffff000103f11b80 (g_ctx + b3242314)
  (next_index + 3a228c0) x19: 102
  x19: 102 x20: ffff0000b49a2428
  x20: ffff000103f11be8 (g_ctx + b324237c) (next_index + 3a22928)
  x21: ffff00000087dcf2 x21: ffff00000087dcf2 (cam_status_table + 2f28a) (cam_status_table + 2f28a)
  x22: ffff000000f1efd8 x22: ffff000000f1efd8 (mac_policy_rm + 0) (mac_policy_rm + 0)
  x23: ffff00000086f107 x23: 0 (cam_status_table + 2069f)
  x24: ffffa0001fbde6c8 x24: ffffa0008cba0d00
  x25: 0 x25: ffff00000088aa11
  x26: 4 (do_execve.fexecv_proc_title + 76b7)
  x27: 0 x26: ffffa0017be2a600
  x28: ffff00010209fcf0 x27: ffffa00025626a80 (next_index + 1bb0a30)
  x28: ffff000103f11ce0
  x29: ffff0000b49a23e0 (next_index + 3a22a20) (g_ctx + b3242334)
  x29: ffff000103f11ba0
  sp: ffff0000b49a23c0 (next_index + 3a228e0)
  lr: ffff00000046ef98 sp: ffff000103f11b80 (_rm_runlock_debug + 60)
  lr: ffff00000046ef98
  elr: ffff00000046dc0c (_rm_runlock_debug + 60) (_rm_assert + a4)
  elr: ffff00000046dc0cspsr: 45 (_rm_assert + a4)
  far: 10
  esr: 96000004
  spsr: 45
panic: data abort in critical section or under mutex
cpuid = 1
time = 1646609483
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x174
panic() at panic+0x44
data_abort() at data_abort+0x2d4
handle_el1h_sync() at handle_el1h_sync+0x10
--- exception, esr 0x96000004
_rm_assert() at _rm_assert+0xa4
_rm_runlock_debug() at _rm_runlock_debug+0x5c
mac_inpcb_check_deliver() at mac_inpcb_check_deliver+0x74
tcp_input_with_port() at tcp_input_with_port+0xab4
tcp_input() at tcp_input+0xc
ip_input() at ip_input+0x2e8
netisr_dispatch_src() at netisr_dispatch_src+0xe4
ether_demux() at ether_demux+0x178
ether_nh_input() at ether_nh_input+0x3e8
netisr_dispatch_src() at netisr_dispatch_src+0xe4
ether_input() at ether_input+0x80
if_input() at if_input+0xc
gen_intr() at gen_intr+0x444
ithread_loop() at ithread_loop+0x2a0
fork_exit() at fork_exit+0x74
fork_trampoline() at fork_trampoline+0x14
KDB: enter: panic
[ thread pid 12 tid 100063 ]
Stopped at kdb_enter+0x44: undefined f902011f
db>

NB: reboot/reset/halt at the db> prompt does not work on my RPI4. Luckily I have a Wi-Fi-connected power switch on it.
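As a reading aid for the `esr: 96000004` and `far: 10` values in the dump above, here is a minimal sketch of how an AArch64 ESR_EL1 value splits into its fields. The field layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, and for data aborts WnR in ISS bit 6 and DFSC in ISS bits 5:0) follows the Arm architecture. The function name `decode_esr` and the small lookup tables are assumptions made for illustration; this is not FreeBSD code, and it covers only a handful of encodings.

```python
# Illustrative ESR_EL1 field decoder (hypothetical helper, not FreeBSD code).
# Only a few exception classes and fault status codes are listed.

EXCEPTION_CLASSES = {
    0x15: "SVC instruction (AArch64)",
    0x20: "instruction abort from a lower EL",
    0x21: "instruction abort from the same EL",
    0x24: "data abort from a lower EL",
    0x25: "data abort from the same EL",
    0x3C: "BRK instruction (AArch64)",
}

# Upper four bits of the 6-bit fault status code; the low two bits give the
# translation table level for these fault kinds.
FAULT_KINDS = {
    0b0000: "address size fault",
    0b0001: "translation fault",
    0b0010: "access flag fault",
    0b0011: "permission fault",
}


def decode_esr(esr):
    """Split a 32-bit ESR value into EC, IL and ISS, plus abort details."""
    ec = (esr >> 26) & 0x3F      # exception class, bits 31:26
    il = (esr >> 25) & 0x1       # instruction length bit, bit 25
    iss = esr & 0x1FFFFFF        # instruction specific syndrome, bits 24:0
    result = {
        "ec": ec,
        "ec_name": EXCEPTION_CLASSES.get(ec, "not in this sketch"),
        "il": il,
        "iss": iss,
    }
    if ec in (0x24, 0x25):       # data aborts carry WnR and DFSC in the ISS
        dfsc = iss & 0x3F
        kind = FAULT_KINDS.get(dfsc >> 2)
        result["write"] = bool((iss >> 6) & 0x1)
        result["fault"] = (
            f"{kind}, level {dfsc & 0x3}" if kind else "not in this sketch"
        )
    return result


if __name__ == "__main__":
    for value in (0x96000004, 0xF2000000, 0x56000000):
        print(hex(value), decode_esr(value))
```

Under these assumptions, 0x96000004 decodes as a data abort taken from the same exception level, a read, with a level 0 translation fault. Together with `far: 10`, that pattern is what one would expect from a read through a NULL or near-NULL pointer at offset 0x10, although the dump alone does not prove that.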
Regards,
Ronald.

From: Mark Millard <marklmi@yahoo.com>
Date: Monday, 7 March 2022 02:01
To: Ronald Klop <ronald-lists@klop.ws>
CC: freebsd-current <freebsd-current@freebsd.org>, bob prohaska <fbsd@www.zefox.net>
Subject: Re: panic: data abort in critical section or under mutex (was: Re: panic: Unknown kernel exception 0 esr_el1 2000000 (on 14-CURRENT/aarch64 Feb 28))
>
> From: Ronald Klop <ronald-lists_at_klop.ws> wrote on
> Date: Sun, 6 Mar 2022 23:22:42 +0100 (CET) :
>
> > Did some binary search with kernels from artifact.ci.freebsd.org.
> >
> > I suspect "rmlock: Micro-optimize read locking" as cause.
> >
> > https://cgit.freebsd.org/src/commit/?id=c84bb8cd771ce4bed58152e47a32dda470bef23a
> >
> > And "rmlock: Add required compiler barriers to _rm_runlock()" as solution.
> >
> > https://cgit.freebsd.org/src/commit/?id=89ae8eb74e87ac19aa2d7abe4ba16bcccd32bb9f
> >
> > So I probably just had a bad day.
>
> Well, there is a report of a buildkernel crash after that pair:
>
> https://lists.freebsd.org/archives/freebsd-arm/2022-March/001078.html
>
> that references additional information at:
>
> http://www.zefox.net/~fbsd/rpi3/crashes/20220304/readme
>
> and reported:
>
> QUOTE
> The console connection dropped before the crash (unrelated) I didn't
> get the preamble, all I have is the backtrace and buildkernel log.
> Here's the backtrace:
> db> bt
> Tracing pid 14795 tid 100098 td 0xffffa00017815600
> db_trace_self() at db_trace_self
> db_stack_trace() at db_stack_trace+0x11c
> db_command() at db_command+0x368
> db_command_loop() at db_command_loop+0x54
> db_trap() at db_trap+0xf8
> kdb_trap() at kdb_trap+0x1cc
> handle_el1h_sync() at handle_el1h_sync+0x10
> --- exception, esr 0xf2000000
> kdb_enter() at kdb_enter+0x44
> vpanic() at vpanic+0x1b0
> panic() at panic+0x44
> data_abort() at data_abort+0x2e8
> handle_el1h_sync() at handle_el1h_sync+0x10
> --- exception, esr 0x96000004
> _rm_rlock_debug() at _rm_rlock_debug+0x8c
> sysctl_root_handler_locked() at sysctl_root_handler_locked+0x140
> sysctl_root() at sysctl_root+0x1ac
> userland_sysctl() at userland_sysctl+0x140
> sys___sysctl() at sys___sysctl+0x68
> do_el0_sync() at do_el0_sync+0x520
> handle_el0_sync() at handle_el0_sync+0x40
> --- exception, esr 0x56000000
> END QUOTE
>
> The above material does reference _rm_rlock_debug . Might be
> related?
>
> The readme reports:
>
> main-n253603-0b25cbc79d3: Thu Mar 3 22:48:31 PST 2022
>
> for the system doing the buildkernel. This is after
> 89ae8eb74e8 .
>
> (It also mentions another panic earlier in the week,
> apparently not reported to the lists at the time.)
>
> ===
> Mark Millard
> marklmi at yahoo.com
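The binary search over kernel builds mentioned in this thread can be sketched roughly as follows. This is only an illustration under stated assumptions: `revisions` is a hypothetical ordered list of kernel builds whose first entry is known good and whose last entry is known bad, and `crashes` is a hypothetical callback that boots a build, runs the workload once, and reports whether it panicked. Because the panic is intermittent, the sketch retries each candidate `trials` times before calling it good; a small number of clean runs can still mislabel a bad kernel, which matches the "spoke too soon" experience described above.

```python
def first_bad(revisions, crashes, trials=3):
    """Return the index of the first revision that crashes.

    Assumes revisions[0] is good and revisions[-1] is bad, and that
    once a revision is bad all later ones are bad as well.
    """
    lo, hi = 0, len(revisions) - 1   # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # A single crash is enough to mark the build bad; it only counts
        # as good after `trials` runs without a crash.
        if any(crashes(revisions[mid]) for _ in range(trials)):
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Toy stand-in: builds 0..9, where everything from build 6 on crashes.
    builds = list(range(10))
    print(first_bad(builds, lambda build: build >= 6))
```

Raising `trials` lowers the chance of a false "good" verdict at the cost of more test time per build.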