Date: Sat, 22 Nov 2025 23:00:59 +0100
From: Michael Tuexen <tuexen@FreeBSD.org>
To: John Baldwin <jhb@freebsd.org>
Cc: Andrew Turner <andrew@FreeBSD.org>, "src-committers@freebsd.org" <src-committers@FreeBSD.org>, "dev-commits-src-all@freebsd.org" <dev-commits-src-all@FreeBSD.org>, "dev-commits-src-main@freebsd.org" <dev-commits-src-main@FreeBSD.org>
Subject: Re: git: a695ac2ce8bc - main - arm64: Move intr_pic_init_secondary earlier
Message-ID: <2C058EE8-72FF-40AB-AC7A-5E5C1A0EEC01@FreeBSD.org>
In-Reply-To: <a7373f1c-1369-46c8-ab29-176bf3551942@FreeBSD.org>
References: <691cb4c1.220bb.22f9ecf6@gitrepo.freebsd.org> <a7373f1c-1369-46c8-ab29-176bf3551942@FreeBSD.org>
> On 22. Nov 2025, at 17:47, John Baldwin <jhb@freebsd.org> wrote:
>
> On 11/18/25 13:02, Andrew Turner wrote:
>> The branch main has been updated by andrew:
>>
>> URL: https://cgit.FreeBSD.org/src/commit/?id=a695ac2ce8bc8e8b989359002659063f2e056dcf
>>
>> commit a695ac2ce8bc8e8b989359002659063f2e056dcf
>> Author: Andrew Turner <andrew@FreeBSD.org>
>> AuthorDate: 2025-11-18 18:00:32 +0000
>> Commit: Andrew Turner <andrew@FreeBSD.org>
>> CommitDate: 2025-11-18 18:00:32 +0000
>>
>> arm64: Move intr_pic_init_secondary earlier
>>
>> This may have been called after intr_irq_shuffle. For most interrupt
>> controllers this appears to be safe, however for the GICv5 we need to
>> read a per-CPU ID register before we can assign interrupts to a given
>> CPU.
>>
>> Fix the race by moving intr_pic_init_secondary earlier in the boot,
>> after devices have been enumerated and before the interrupts are moved
>> to their assigned CPUs.
>>
>> Sponsored by: Arm Ltd
>> Differential Revision: https://reviews.freebsd.org/D53685
>
> This reliably panics on boot on an Ampere Altra system I have access to.

I think this also affects FreeBSD under VMware Fusion or VirtualBox on Arm-based Macs.
Booting in safe mode always worked. Using QEMU did not trigger any problem.

Best regards
Michael

> Unfortunately the panic isn't very helpful as multiple CPUs panic at once
> cluttering the console and there appear to be secondary panics in the
> console code that obscure whatever the original panic is. A few sample
> crashes below:
>
> pci24: <PCI bus> numa-domain 0 on pcib24
> cpu0: <ACPI CPU> on acpi0
> armv8crypto0: <AES-CBC,AES-XTS,AES-GCM>
> Fa t xal d x0: 0xffff0000:<E5><FF> 0 x0FFF Fpaatnailc :d ata abormt:t
> x_ Asser0ion p->tp_row < t->t_winsize.tp_row failed at /usr/src/sys/teke dttken
> .cr103
> puid = -65536
> time = 1
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> KDB: enter: panic
> panic: kdb_backend_permitted: missing cred for 0xffff0000455b21a0
> cpuid = -65536
> time = 1
> ...
>
> pcib24: <PCI-PCI bridge> at device 7.0 numa-domain 0 on pci20
> pci24: <PCI bus> numa-domain 0 on pcib24
> cpu0: <ACPI CPU> on acpi0
> armv8crypto0: <AES-CBC,AES-XTS,AES-GCM>
> Fatal data abo rxt0:: 0xFfaftf lFaFF x0: 0x0000000096000004
> x1: 0xffff0000454c3640 (crypto_dev + 0x43a95f80)
> x2: 0x0000000096000004
> x3: 0x0000000096000504
> x4: 0xffff0000454c3590 (crypto_dev + 0x43a95ed0)
> x5: 0xffff00000088881c (handle_el1h_sync + 0x1c)
> x6: 0x0000000000000000
> x7: 0xffff00000088881c (handle_el1h_sync + 0x1c)
> x8: 0x00000000f0c1a000
> x9: 0x0000000000000620
> x10: 0x000000000x0:pa00
> x11: 0x000 000000000500
> x12: 0x0000000096000004
> x13: 0xffff0000454c36e0 (crypto_dev + 0x43a96020)
> x14: 0xffff0000454c3610 (crypto_dev + 0x43a95f50)
> x15: 0xffff00000088881c (handle_el1h_sync + 0x1c)
> x16: 0xffff0000008b59e4 (data_abort + 0x158)
> x17: 0x00000000804000c9
> x18: 0xffff00004553a000 (crypto_dev + 0x43b0c940)
> x19: 0xffff0000454c3640 (crypto_dev + 0x43a95f80)
> x20: 0x0000000096000004
> x21: 0x0000000096000504
> x22: 0x0000000096000004
> x23: 0x0000000000000620
> x24: 0x00000000f0c1a000
> x25: 0x0000000000000000
> x26: 0xffff000000000000
> x27: 0xffff000000a318d6 (notify.prefix + 0x3e2a5)
> x28: 0xffff000000a02aa1 (notify.prefix + 0xf470)
> x29: 0xffff000000b77488 (abort_handlers + 0x0)
> sp: 0xffff0000454c3570
> lr: 0xffff00000088881c (handle_el1h_sync + 0x1c)
> elr: 0xffff0000008b59e4 (data_abort + 0x158)
> spsr: 0x00000000804000c9
> far: 0x0000000096000504
> esr: 0x0000000096000004
> panic: data abort with spinlock held (spinlock count 356126888 != 0)
> cpuid = 0
> time = 1
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x38
> vpanic() at vpanic+0x1d0
> panic() at panic+0x48
> data_abort() at data_abort+0x3a0
> handle_el1h_sync() at handle_el1h_sync+0x18
> --- exception, esr 0x96000004
> data_abort() at data_abort+0x158
> (null)() at -0x4
> WARNING: D-cacheline size mismatch 64 != 1024
> WARNING: I-cacheline size mismatch 64 != 16384
> WARNING: D-cacheline size mismatch 64 != 8192
> WARNING: D-cacheline size mismatch 64 != 8
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: I-cacheline size mismatch 64 != 2048
> WARNING: D-cacheline size mismatch 64 != 2048
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 1024
> WARNING: I-cacheline size mismatch 64 != 16384
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 8192
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: I-cacheline size mismatch 64 != 128
> WARNING: D-cacheline size mismatch 64 != 2048
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 512
> WARNING: I-cacheline size mismatch 64 != 1024
> WARNING: D-cacheline size mismatch 64 != 2048
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 2048
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 8
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: I-cacheline size mismatch 64 != 2048
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 1024
> WARNING: I-cacheline size mismatch 64 != 16384
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 8192
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 2048
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 1024
> WARNING: I-cacheline size mismatch 64 != 16384
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 8192
> WARNING: I-cacheline size mismatch 64 != 4
> Fatal data abort:
> x0: 0x0000000096000504
> x1: 0xffff0000015e7ef6 ($d + 0x46)
> x2: 0x00000000000000df
> x3: 0x0000000000000074
> x4: 0x0000000000000000
> x5: 0x020f352e0d060319
> x6: 0x0000000000000004
> x7: 0x656e6f7a5f716b73
> x8: 0x0101010101010101
> x9: 0x0000000000000003
> x10: 0xfffeffff6b5e79f2
> x11: 0x0000000000000001
> x12: 0x0000000000000000
> x13: 0x0000000000000017
> x14: 0x0000080080000000
> x15: 0xffff000000b73548 (mvfr1_fields + 0x0)
> x16: 0xffff0000018edd30 (__stop_set_modmetadata_set + 0xf00)
> x17: 0xffff000000831d3c (uma_zcreate + 0x0)
> x18: 0xffff0000011bc900 (pcpu0 + 0x0)
> x19: 0xffff000116200200
> x20: 0xffff000000e5b9c8 (initstack + 0x39c8)
> x21: 0xffff0000015e7ef6 ($d + 0x46)
> x22: 0xffff0000404cd200 (crypto_dev + 0x3ea9fb40)
> x23: 0x0000000000000000
> x24: 0xffff00004548b000 (crypto_dev + 0x43a5d940)
> x25: 0xffff0000018b7128 (system_taskq_init_sys_init + 0x0)
> x26: 0xffff0000010bd478 (mp_ncpus + 0x0)
> x27: 0x0000000003800000
> x28: 0xffff00000103b000 (g_bio_run_down + 0x30)
> x29: 0xffff000000e5b8b0 (initstack + 0x38b0)
> sp: 0xffff000000e5b880
> lr: 0xffff0000008313e0 (zone_ctor + 0xd8)
> elr: 0xffff0000008b38f8 (strcmp + 0x98)
> spsr: 0x0000000000400009
> far: 0x0000000096000504
> esr: 0x0000000096000004
> panic: vm_fault failed: 0xffff0000008b38f8 error 1
> cpuid = 0
> time = 1
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x38
> vpanic() at vpanic+0x1d0
> panic() at panic+0x48
> data_abort() at data_abort+0x28c
> handle_el1h_sync() at handle_el1h_sync+0x18
> --- exception, esr 0x96000004
> strcmp() at strcmp+0x98
> item_ctor() at item_ctor+0x218
> zone_alloc_item() at zone_alloc_item+0x140
> uma_zcreate() at uma_zcreate+0xa4
> system_taskq_init() at system_taskq_init+0x10c
> mi_startup() at mi_startup+0x1f4
> virtdone() at virtdone+0x74
> KDB: enter: panic
> KDBh rcdntnrc0i^Mtpa1ick0spi
> Siot ece_ opanic ankc:tmtt_ckanic:cpamppa :p sc p tal cp ix: i a 0:nixpfpfnm0npan2c5a0nippp: (ntc: npteap p nia: mnxap1:i0:0m00anpa00p0p_pon
> p p2p 0papfn:0n000p0nab: mppppanip_nic:_mtx.loc p nx1: 3p)_
> o k_s an: 0ec0ra0dpapmpic
> ppan::pxpac0 00x_lo0k_nic
> pax5n pxp0npa0 00x_lo050^Mpinx6p 0xpmffc:0mt0_lb128spnpa (nvnr anapa mp4p0n
> mt:alxp0pic: m3a_0p04p
> pap8_ capani0mp0n1c:amt0apppantnr ad0_lp p ni0np
> anp9:mpac:p0000: nix0pap
> cp0:pcnpcp pan0c: m00pan
> an1np0nip0np pa00pp0na0:cp p2: nxtp00tpan00: 0tap
> xnic pxpp000c00mt0p0o0fpppxc4c m0xi000c: 0p0p0nca
> sapa: 0x00 ap00i0:0map00^Mpix: anx_lpfficp0ni3:cmd0pp antc:rmnan: :impu_pa 0nic:
> :a1ac:pa0m0:p0nip: mtx_0a
> icni:a anfnp0:p00p1bpmp0_lpippamic:pmta_l)ck s19pi0x00npc: m0n0c:ppan
> nxpa:nicf fpx0000k_np3nicpipa(icapapsnpr:_can_leck+ac:0)
> tx2:p picpf:00p0pan1c00mtx_lock_spinv rnvnpnpoptxpace4apepa ptxplp3ni^M:pnp2n c:fana0ic:0mtx_0ppaip:nppapreanipsc: 0apanp
> xc:capxc: mpip0n0n:am0xpppipanicpictpan_parnpbpo trpapptpbpa_scapap ninicp mp24pacic:fn0a00c1 mlxcpanic:pmcx_loci_bp+ 0plo)^Minpp5p nxcf ft00lopa6can0pppanp : o_expcpenfep pvapaocnii:apap c:net9anp
> px26pa0apap000cp0panpanic
> nxn7: 0ni0:0p0nic:i0n m0
> panic atxfp00ic: m5a_3ock_sppn: itauacad+c:pap38)^Mnpx29::amtpap000appnnap0nicpipanic:tnixcp ni0:a70pa^Mipani p0xfffcp0n0c0 op7nin
> pplp:n0p:9max_9ocka5pan1
> pelp: ox49c2dm9xbcapan:1^Mpspxrp nipa0apan0c0 a0xcla
> f:i: x4pandc:abca5anic
> mapini inknicpakaraplpa ctxtinnpac2 mtxpana1icxnan: 0t0
> ock_d pax
> panic= 1t
> panic= 1t
> pDn:nipankcb mtxp:cenp
> : nppb_nraca_spapataaic:npapbnipaci_mexp^Mpicppp_prac _nilf mraploc(_ atn:precnicncc_ mtf_lrpp_icipxni
> pmtp_ppanic(ppnncppptp_lini +0xn:p
> iapanpcn ct)_lpapppmppnicn mpanix_pppppappan1p_apnp(p c:nppppppx_nil1 mtxnl+cxpsp^Mnpcnicp papln:el papyncc: acx_ppcpanacdlp_eic:_spnpanip8pap-cpanceapanic estx_loaka0pc0n
> pipapac: )tananipppp_anacppn92nic: 0t1p
> picic:ppriic:(mtxtpockpspvnrcn apa_apa
> ia:ipecp1n
> papppppppppppppaanic:mmtx_occk_spin: recursed o nnnreecursivemmutex trmlck @ /usrssrc/sys/krrn/sbbr_temmina.c::605
>
> ccpuid = 05
> ti^Me i 1
> KDB: DB:ck bakkbrace:
> e:
> db_tract_sel_() atb_trdb_tracf_self
> e: db_trace_self_wrapper() at db_trace_self_wrapper+0x38
> vpanic() at vpanic+0x1d0
> panic() at panic+0x48
> vpanic() at vpanic+0x1d0
> panic() at panic+0x48
> __mtx_lock_spin_flags() at __mtx_lock_spin_flags+0x188
> termcn_cnputc() at termcn_cnputc+0x2c
> cnputc() at cnputc+0xa0
> kvprintf() at kvprintf+0xa4
> _vprintf() at _vprintf+0x78
> printf() at printf+0x58
> vpanic() at vpanic+0x26c
> panic() at panic+0x48
> __mtx_lock_spin_flags() at __mtx_lock_spin_flags+0x188
> termcn_cnput<FF>NOTICE: DRAM FW version 211207
> ...
> > > pci24: <PCI bus> numa-domain 0 on pcib24 > cpu0: <ACPI CPU> on acpi0 > armv8crypto0: <AES-CBC,AES-XTS,AES-GCM> > <C9>p a nxip cx:0aF n:a it0caxl:f F<DF> panic: stack overflow detected; back > trace may be corrupted > cpuid = 0 > time = 1 > KDB: stack backtrace: > db_trace_self() at db_trace_self > db_trace_self_wrapper() at db_trace_self_wrapper+0x38 > vpanic() at vpanic+0x1d0 > panic() at panic+0x48 > __stack_chk_fail() at __stack_chk_fail+0x14 > msgbuf_addst (x0: 0x00000s0b<E0> x0dd0xr00004000000000ul > x1: 0x0x > KDB: enter: panic > KDB: KeKK KKKKlKKKnKKK KKKKKpKKaKKKKKcKKpKKaKKKKKKKKKKKKKKKpKKKKKKppKKKKpKKKpKK > KKKKKKKKKKKKKKKpKKKKpKKKKKKKpKKKKKKpKKpKpKKpKKKpKpKpppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppanic: mtx_lock_spin: eecrrsed on non-recursive mutxx trmlkk @ (null)x//yys/kern/uubrtterminl..c:60 > > > cpuid =-65 > tim > = 1 > KDB: sB:ck aacktaacer > e: > db_trace_aelf() adb_trdb_trace^Mself > e: ds_trace_self_wrapper() at db_trace_self_wrapper+0x38 > vpanic() at vpanic+0x1d0 > panic() at panic+0x48 > __mtx_lock_spin_flags() at __mtx_lock_spin_flags+0x188 > termcn_cnputc() at termcn_cnputc+0x2c > cnputc() at cnputc+0xa0 > kvprintf() at kvprintf+0xa4 > _vprintf() at _vprintf+0x78 > printf() at printf+0x58 > vpanic() at vpanic+0x26c > panic() at panic+0x48 > __mtx_lock_spin_flags() at __mtx_lock_spin_flags+0x188 > termcn_cnputc() at termcn_cnputc+0x2c > cnputc() at cnputc+0xa0 > kvprintf() at kvprintf+0xa4 > _vprintf() at _vprintf+0x78 > printf() at printf+0x58 > vpanic() at vpanic+0x26c > panic() at panic+0x48 > __mtx_lock_spin_flags() at __mtx_lock_spin_flags+0x188 > termcn_cnputc() at termcn_cnputc+0x2c > cnputc() at cnputc+0xa0 > kvprintf() at kvprintf+0xa4 > _vprintf() at 
_vprintf+0x78 > printf() at printf+0x58 > vpanic() at vpanic+0x26c > panic() at panic+0x48 > __mtx_lock_spin_flags() at __mtx_lock_spin_flags+0x188 > termcn_cnputc() at termcn_cnputc+0x2c > cnputc() at cnputc+0xa0 > kvprintf() at kvprintf+0xa4 > _vprintf() at _vprintf+0x78 > printf() at printf+0x58 > vpanic() at vpanic+0x26c > panic() at panic+0x48 > __mtx_lock_spin_flags() at __mtx_lock_spin_flags+0x188 > termcn_cnputc() at termcn_cnputc+0x2c > cnputc() at cnputc+0xaNOTICE: DRAM FW version 211207 > > I do see gic0 attached in dmesg before each of the crashes. > > Hmm, this tries to use spin locks in the gic driver before curthread is > set and that's probably not going to work. > > Indeed, the fix below lets my box boot again: > > diff --git a/sys/arm64/arm64/mp_machdep.c b/sys/arm64/arm64/mp_machdep.c > index ba673ce9d6ee..5fd5197b6818 100644 > --- a/sys/arm64/arm64/mp_machdep.c > +++ b/sys/arm64/arm64/mp_machdep.c > @@ -270,6 +270,10 @@ init_secondary(uint64_t cpu) > install_cpu_errata(); > enable_cpu_feat(CPU_FEAT_AFTER_DEV); > + /* Initialize curthread */ > + KASSERT(PCPU_GET(idlethread) != NULL, ("no idle thread")); > + pcpup->pc_curthread = pcpup->pc_idlethread; > + > intr_pic_init_secondary(); > /* Signal we are done */ > @@ -279,9 +283,6 @@ init_secondary(uint64_t cpu) > while (!atomic_load_int(&aps_ready)) > __asm __volatile("wfe"); > - /* Initialize curthread */ > - KASSERT(PCPU_GET(idlethread) != NULL, ("no idle thread")); > - pcpup->pc_curthread = pcpup->pc_idlethread; > schedinit_ap(); > /* Initialize curpmap to match TTBR0's current setting. */ > > -- > John Baldwin > >help
