From: Mark Millard
Date: Mon, 7 Mar 2022 06:37:46 -0800
To: Ronald Klop, Mark Johnston
Cc: bob prohaska, Free BSD, freebsd-current
Subject: Re: panic: data abort in critical section or under mutex (was: Re: panic: Unknown kernel exception 0 esr_el1 2000000 (on 14-CURRENT/aarch64 Feb 28))
Message-Id: <10724FB9-8E75-4DB7-A0F4-CFF55D21272B@yahoo.com>
In-Reply-To: <132978150.92.1646660769467@mailrelay>
References: <1800459695.1.1646649539521@mailrelay> <132978150.92.1646660769467@mailrelay>
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

On 2022-Mar-7, at 05:46, Ronald Klop wrote:

> Dear Mark Johnston,
>
> I did some binary search in the kernels and came to the conclusion that
> https://cgit.freebsd.org/src/commit/?id=1517b8d5a7f58897200497811de1b18809c07d3e
> still works and
> https://cgit.freebsd.org/src/commit/?id=407c34e735b5d17e2be574808a09e6d729b0a45a
> panics.
>
> I suspect your commit in
> https://cgit.freebsd.org/src/commit/?id=c84bb8cd771ce4bed58152e47a32dda470bef23a.
>
> Last panic:
>
> panic: vm_fault failed: ffff00000046e708 error 1
> cpuid = 1
> time = 1646660058
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> vpanic() at vpanic+0x174
> panic() at panic+0x44
> data_abort() at data_abort+0x2e8
> handle_el1h_sync() at handle_el1h_sync+0x10
> --- exception, esr 0x96000004
> _rm_rlock_debug() at _rm_rlock_debug+0x8c
> osd_get() at osd_get+0x5c
> zio_execute() at zio_execute+0xf8
> taskqueue_run_locked() at taskqueue_run_locked+0x178
> taskqueue_thread_loop() at taskqueue_thread_loop+0xc8
> fork_exit() at fork_exit+0x74
> fork_trampoline() at fork_trampoline+0x14
> KDB: enter: panic
> [ thread pid 0 tid 100129 ]
> Stopped at kdb_enter+0x44: undefined f902011f
> db>

Was this a WITNESS/DEBUG kernel? Non-WITNESS? Non-debug?

Which aarch64 variant? Bob's was Cortex-A53 (RPi3).

> A more recent kernel (912df91) still panics. See below.
>
> Do you have time to look into this? What information can I provide
> to help?
>
> Regards,
> Ronald.
>
> From: Ronald Klop
> Date: Monday, 7 March 2022 11:38
> To: Mark Millard
> CC: bob prohaska, freebsd-current, freebsd-arm@freebsd.org
> Subject: Re: panic: data abort in critical section or under mutex (was: Re: panic: Unknown kernel exception 0 esr_el1 2000000 (on 14-CURRENT/aarch64 Feb 28))
>
> Yes, I spoke too soon as well. Often it crashes as soon as I start a
> parallel poudriere build. But this time it went very far. As soon as
> nightly backups kicked in it was game over again.
> I had read Bob's mail on the arm@ ML. But I wanted to leave the
> conclusion that it is the same problem to the developers. (I have
> seen enough wrong guessing of causes in my work.)
>
> I will need to go further into the binary search of working kernels.
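
Because the panic is intermittent (one run got very far before failing), a binary search over kernels can be misled by a kernel that simply has not crashed yet. The following is a minimal sketch of a bisection that retests each candidate several times before calling it good. The names `revisions`, `crashes` and `trials` are hypothetical and for illustration only; nothing here comes from FreeBSD tooling, and it assumes every kernel after the first bad one is also bad.

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str],
              crashes: Callable[[str], bool],
              trials: int = 3) -> str:
    """Return the earliest revision judged bad.

    Assumes revisions are ordered oldest to newest, the first one is
    known good, the last one is known bad, and every revision after
    the first bad one is also bad. A revision counts as good only if
    it survives `trials` independent test runs.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if any(crashes(revisions[mid]) for _ in range(trials)):
            hi = mid  # crashed at least once: bad
        else:
            lo = mid  # survived every trial: treat as good
    return revisions[hi]
```

A larger `trials` value lowers the chance of marking a bad kernel as good, at the cost of more test runs per step; a single crash is always enough to mark a kernel bad.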
>
> This was: FreeBSD 14.0-CURRENT #0 912df91: Wed Mar 2 00:36:35 UTC 2022
> Fatal data abort:
> x0: ffff000000f1efd8 x0: ffff000000f1efd8 (mac_policy_rm + 0) (mac_policy_rm + 0)
> x1: 2 x1: 2
> x2: ffff00000087dcf2 x2: ffff00000087dcf2 (cam_status_table + 2f28a)
> (cam_status_table + 2f28a) x3: ffff00000087dcf2
> x3: ffff00000087dcf2 (cam_status_table + 2f28a) (cam_status_table + 2f28a)
> x4: 102 x4: 102
> x5: 7 x5: 1
> x6: 0 x6: ff
> x7: 0 x7: ffffa00011fc2800
> x8: 1
> x8: 1 x9: ffff000000f37c10
> x9: ffff0000419d9090 (pcpu0 + 90) (g_ctx + 40278fe4)
> x10: ffffa0017be2a600 x10: ffffa000010fa600
> x11: 394aed08d0003a48
> x12: 350001a8b946a108 x11: 0
> x12: ffff000000f37c10 x13: badecce4 (pcpu0 + 90)
> x13: ffffa0001fbde6b0 x14: 0
> x14: 4965ae49 x15: 1
> x15: 1000193 x16: ffff0000016a4238
> x16: ffff000100482d38 (__stop_set_modmetadata_set + d00) (__stop_set_modmetadata_set + 448)
> x17: ffff00000044a998 x17: ffff00000058ff30 (free + 0) (if_inc_counter + 0)
> x18: ffff0000b49a23c0 x18: ffff000103f11b80 (g_ctx + b3242314)
> (next_index + 3a228c0) x19: 102
> x19: 102 x20: ffff0000b49a2428
> x20: ffff000103f11be8 (g_ctx + b324237c) (next_index + 3a22928)
> x21: ffff00000087dcf2 x21: ffff00000087dcf2 (cam_status_table + 2f28a) (cam_status_table + 2f28a)
> x22: ffff000000f1efd8 x22: ffff000000f1efd8 (mac_policy_rm + 0) (mac_policy_rm + 0)
> x23: ffff00000086f107 x23: 0 (cam_status_table + 2069f)
> x24: ffffa0001fbde6c8 x24: ffffa0008cba0d00
> x25: 0
> x25: ffff00000088aa11 x26: 4 (do_execve.fexecv_proc_title + 76b7)
> x27: 0 x26: ffffa0017be2a600
> x28: ffff00010209fcf0
> x27: ffffa00025626a80 (next_index + 1bb0a30)
> x28: ffff000103f11ce0 x29: ffff0000b49a23e0 (next_index + 3a22a20) (g_ctx + b3242334)
> x29: ffff000103f11ba0 sp: ffff0000b49a23c0
> (next_index + 3a228e0) lr: ffff00000046ef98
> sp: ffff000103f11b80
> (_rm_runlock_debug + 60) lr: ffff00000046ef98
> elr: ffff00000046dc0c (_rm_runlock_debug + 60) (_rm_assert + a4)
> elr: ffff00000046dc0cspsr: 45
> (_rm_assert + a4) far: 10
> esr: 96000004
> spsr: 45
>
> panic: data abort in critical section or under mutex
> cpuid = 1
> time = 1646609483
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> vpanic() at vpanic+0x174
> panic() at panic+0x44
> data_abort() at data_abort+0x2d4
> handle_el1h_sync() at handle_el1h_sync+0x10
> --- exception, esr 0x96000004
> _rm_assert() at _rm_assert+0xa4
> _rm_runlock_debug() at _rm_runlock_debug+0x5c
> mac_inpcb_check_deliver() at mac_inpcb_check_deliver+0x74
> tcp_input_with_port() at tcp_input_with_port+0xab4
> tcp_input() at tcp_input+0xc
> ip_input() at ip_input+0x2e8
> netisr_dispatch_src() at netisr_dispatch_src+0xe4
> ether_demux() at ether_demux+0x178
> ether_nh_input() at ether_nh_input+0x3e8
> netisr_dispatch_src() at netisr_dispatch_src+0xe4
> ether_input() at ether_input+0x80
> if_input() at if_input+0xc
> gen_intr() at gen_intr+0x444
> ithread_loop() at ithread_loop+0x2a0
> fork_exit() at fork_exit+0x74
> fork_trampoline() at fork_trampoline+0x14
> KDB: enter: panic
> [ thread pid 12 tid 100063 ]
> Stopped at kdb_enter+0x44: undefined f902011f
> db>
>
>
> NB: db> reboot/reset/halt does not work on my RPI4. Luckily I have a
> wifi connected power switch on it.
>
> Regards,
> Ronald.
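
For reference, the esr values in these backtraces can be decoded by hand. The following is a minimal sketch, assuming the ESR_ELx field layout from the Arm architecture manual (EC in bits 31:26, IL in bit 25, ISS in bits 24:0). The function and table names are illustrative, not FreeBSD's, and the tables list only the codes relevant to this thread.

```python
# Assumed ESR_ELx layout: EC = bits [31:26], IL = bit 25, ISS = bits [24:0].
EC_NAMES = {
    0x00: "unknown reason",
    0x15: "SVC from AArch64",
    0x25: "data abort, same exception level",
    0x3C: "BRK instruction (AArch64)",
}

# Low six ISS bits (DFSC) for data aborts; only a few codes are listed.
DFSC_NAMES = {
    0x04: "translation fault, level 0",
    0x05: "translation fault, level 1",
    0x06: "translation fault, level 2",
    0x07: "translation fault, level 3",
}


def decode_esr(esr: int) -> dict:
    """Split an ESR value into its EC, IL and ISS fields."""
    ec = (esr >> 26) & 0x3F
    il = (esr >> 25) & 0x1
    iss = esr & 0x1FFFFFF
    out = {
        "ec": ec,
        "il": il,
        "iss": iss,
        "class": EC_NAMES.get(ec, "not in this table"),
    }
    if ec == 0x25:
        # For data aborts the fault status code is the low six ISS bits.
        out["fault"] = DFSC_NAMES.get(iss & 0x3F, "not in this table")
    return out
```

Applied to the values in this thread: 0x96000004 gives EC 0x25 with DFSC 0x04, 0xf2000000 gives EC 0x3c, and 0x56000000 gives EC 0x15.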
>
> From: Mark Millard
> Date: Monday, 7 March 2022 02:01
> To: Ronald Klop
> CC: freebsd-current, bob prohaska
> Subject: Re: panic: data abort in critical section or under mutex (was: Re: panic: Unknown kernel exception 0 esr_el1 2000000 (on 14-CURRENT/aarch64 Feb 28))
>
> From: Ronald Klop wrote on
> Date: Sun, 6 Mar 2022 23:22:42 +0100 (CET) :
>
> > Did some binary search with kernels from artifact.ci.freebsd.org.
> >
> > I suspect "rmlock: Micro-optimize read locking" as cause.
> >
> > https://cgit.freebsd.org/src/commit/?id=c84bb8cd771ce4bed58152e47a32dda470bef23a
> >
> >
> > And "rmlock: Add required compiler barriers to _rm_runlock()" as solution.
> >
> > https://cgit.freebsd.org/src/commit/?id=89ae8eb74e87ac19aa2d7abe4ba16bcccd32bb9f
> >
> >
> > So I probably just had a bad day.
>
> Well, there is a report of a buildkernel crash after that pair:
>
> https://lists.freebsd.org/archives/freebsd-arm/2022-March/001078.html
>
> that references additional information at:
>
> http://www.zefox.net/~fbsd/rpi3/crashes/20220304/readme
>
> and reported:
>
> QUOTE
> The console connection dropped before the crash (unrelated) I didn't
> get the preamble, all I have is the backtrace and buildkernel log.
> Here's the backtrace:
> db> bt
> Tracing pid 14795 tid 100098 td 0xffffa00017815600
> db_trace_self() at db_trace_self
> db_stack_trace() at db_stack_trace+0x11c
> db_command() at db_command+0x368
> db_command_loop() at db_command_loop+0x54
> db_trap() at db_trap+0xf8
> kdb_trap() at kdb_trap+0x1cc
> handle_el1h_sync() at handle_el1h_sync+0x10
> --- exception, esr 0xf2000000
> kdb_enter() at kdb_enter+0x44
> vpanic() at vpanic+0x1b0
> panic() at panic+0x44
> data_abort() at data_abort+0x2e8
> handle_el1h_sync() at handle_el1h_sync+0x10
> --- exception, esr 0x96000004
> _rm_rlock_debug() at _rm_rlock_debug+0x8c
> sysctl_root_handler_locked() at sysctl_root_handler_locked+0x140
> sysctl_root() at sysctl_root+0x1ac
> userland_sysctl() at userland_sysctl+0x140
> sys___sysctl() at sys___sysctl+0x68
> do_el0_sync() at do_el0_sync+0x520
> handle_el0_sync() at handle_el0_sync+0x40
> --- exception, esr 0x56000000
> END QUOTE

This was a WITNESS and debug kernel, as I understand it. Also, this
was an RPi3, so Cortex-A53, which has in-order-execution cores
(unlike Cortex-A72's, for example).

> The above material does reference _rm_rlock_debug . Might be
> related?
>
> The readme reports:
>
> main-n253603-0b25cbc79d3: Thu Mar 3 22:48:31 PST 2022
>
> for the system doing the buildkernel. This is after
> 89ae8eb74e8 .
>
> (It also mentions another panic earlier in the week,
> apparently not reported to the lists at the time.)
>

So far as I have noticed, all reports of the crashes in
_rm_rlock_debug are on aarch64 hardware. So maybe the problem
is tied to the weak memory model, but for something that
matters even to a Cortex-A53's in-order cores? (Just a thought.)

But then the contrasting(?) status of powerpc64 might be
of note. (And I'll stop guessing here.)

I do not know if any non-WITNESS/non-debug kernel builds
have failed.

===
Mark Millard
marklmi at yahoo.com