Date: Sat, 24 Jun 2023 14:27:08 -0700 From: Mark Millard <marklmi@yahoo.com> To: Current FreeBSD <freebsd-current@freebsd.org>, freebsd-arm <freebsd-arm@freebsd.org> Subject: Re: aarch64 main-n263493-4e8d558c9d1c-dirty (so: 2023-Jun-10) Kyuafile run: "Fatal data abort" crash during vnet_register_sysinit Message-ID: <71F8A824-24DE-46F0-B41C-94AACD4E1D6A@yahoo.com> In-Reply-To: <2CACE963-7846-475D-B139-D11B551E4A3F@yahoo.com> References: <3FD359F8-CFCC-400F-B6DE-B635B747DE7F@yahoo.com> <FAF014A1-88B5-4CAE-8A5C-2C2065528003@yahoo.com> <8E9937A8-1563-49C2-A1B1-150864C09AA0@yahoo.com> <2CACE963-7846-475D-B139-D11B551E4A3F@yahoo.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On Jun 24, 2023, at 13:48, Mark Millard <marklmi@yahoo.com> wrote: > On Jun 24, 2023, at 12:16, Mark Millard <marklmi@yahoo.com> wrote: >=20 >> On Jun 24, 2023, at 10:49, Mark Millard <marklmi@yahoo.com> wrote: >>=20 >>> On Jun 24, 2023, at 10:00, Mark Millard <marklmi@yahoo.com> wrote: >>>=20 >>>> The running system build is a non-debug build (but >>>> with symbols not stripped). >>>>=20 >>>> The HoneyComb's console log shows: >>>>=20 >>>> . . . >>>> GEOM_STRIPE: Device stripe.IMfBZr destroyed. >>>> GEOM_NOP: Device md0.nop created. >>>> g_vfs_done():md0.nop[READ(offset=3D5885952, length=3D8192)]error =3D = 5 >>>> GEOM_NOP: Device md0.nop removed. >>>> GEOM_NOP: Device md0.nop created. >>>> g_vfs_done():md0.nop[READ(offset=3D5935104, length=3D4096)]error =3D = 5 >>>> g_vfs_done():md0.nop[READ(offset=3D5935104, length=3D4096)]error =3D = 5 >>>> GEOM_NOP: Device md0.nop removed. >>>> GEOM_NOP: Device md0.nop created. >>>> GEOM_NOP: Device md0.nop removed. >>>> Fatal data abort: >>>> x0: ffffa02506e64400 >>>> x1: ffff0001ea401880 (g_raid3_post_sync + 3a145f8) >>>> x2: 4b >>>> x3: a343932b0b22fb30 >>>> x4: 0 >>>> x5: 3310b0d062d0e1d >>>> x6: 1d0e2d060d0b3103 >>>> x7: 0 >>>> x8: ea325df8 >>>> x9: ffff0001eec946d0 ($d.6 + 0) >>>> x10: ffff0001ea401880 (g_raid3_post_sync + 3a145f8) >>>> x11: 0 >>>> x12: 0 >>>> x13: ffff000000cd8960 (lock_class_mtx_sleep + 0) >>>> x14: 0 >>>> x15: ffffa02506e64405 >>>> x16: ffff0001eec94860 (_DYNAMIC + 160) >>>> x17: ffff00000063a450 (ifc_attach_cloner + 0) >>>> x18: ffff0001eb290400 (g_raid3_post_sync + 48a3178) >>>> x19: ffff0001eec94600 (vnet_epair_init_vnet_init + 0) >>>> x20: ffff000000fa5b68 (vnet_sysinit_sxlock + 18) >>>> x21: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0) >>>> x22: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0) >>>> x23: ffffa0000042e500 >>>> x24: ffffa0000042e500 >>>> x25: ffff000000ce0788 (linker_lookup_set_desc + 0) >>>> x26: ffffa0203cdef780 >>>> x27: ffff0001eec94698 = (__set_sysinit_set_sym_if_epairmodule_sys_init + 0) >>>> x28: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0) >>>> x29: ffff0001eb290430 (g_raid3_post_sync + 48a31a8) >>>> sp: ffff0001eb290400 >>>> lr: ffff0001eec82a4c ($x.1 + 3c) >>>> elr: ffff0001eec82a60 ($x.1 + 50) >>>> spsr: 60000045 >>>> far: ffff0002d8fba4c8 >>>> esr: 96000046 >>>> panic: vm_fault failed: ffff0001eec82a60 error 1 >>>> cpuid =3D 14 >>>> time =3D 1687625470 >>>> KDB: stack backtrace: >>>> db_trace_self() at db_trace_self >>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x30 >>>> vpanic() at vpanic+0x13c >>>> panic() at panic+0x44 >>>> data_abort() at data_abort+0x2fc >>>> handle_el1h_sync() at handle_el1h_sync+0x14 >>>> --- exception, esr 0x96000046 >>>> $x.1() at $x.1+0x50 >>>> vnet_register_sysinit() at vnet_register_sysinit+0x114 >>>> linker_load_module() at linker_load_module+0xae4 >>>> kern_kldload() at kern_kldload+0xfc >>>> sys_kldload() at sys_kldload+0x60 >>>> do_el0_sync() at do_el0_sync+0x608 >>>> handle_el0_sync() at handle_el0_sync+0x44 >>>> --- exception, esr 0x56000000 >>>> KDB: enter: panic >>>> [ thread pid 70419 tid 101003 ] >>>> Stopped at kdb_enter+0x44: str xzr, [x19, #3200] >>>> db>=20 >>>>=20 >>>> I'll see if a re-run is repeatable. >>>>=20 >>>=20 >>> It repeats: >>>=20 >>> GEOM_STRIPE: Device stripe/stripe.VkbPk1 deactivated. >>> GEOM_STRIPE: Disk md1 removed from stripe.VkbPk1. >>> GEOM_STRIPE: Disk md0 removed from stripe.VkbPk1. >>> GEOM_STRIPE: Device stripe.VkbPk1 destroyed. >>> GEOM_NOP: Device md0.nop created. >>> g_vfs_done():md0.nop[READ(offset=3D5885952, length=3D8192)]error =3D = 5 >>> GEOM_NOP: Device md0.nop removed. >>> GEOM_NOP: Device md0.nop created. >>> g_vfs_done():md0.nop[READ(offset=3D5935104, length=3D4096)]error =3D = 5 >>> g_vfs_done():md0.nop[READ(offset=3D5935104, length=3D4096)]error =3D = 5 >>> GEOM_NOP: Device md0.nop removed. >>> GEOM_NOP: Device md0.nop created. >>> GEOM_NOP: Device md0.nop removed. >>> Fatal data abort: >>> x0: ffffa0003b1a9500 >>> x1: ffff00021b530260 >>> x2: 4b >>> x3: a343932b0b22fb30 >>> x4: 0 >>> x5: 3310b0d062d0e1d >>> x6: 1d0e2d060d0b3103 >>> x7: 0 >>> x8: ea325df8 >>> x9: ffff00021d6946d0 ($d.6 + 0) >>> x10: ffff00021b530260 >>> x11: 0 >>> x12: 0 >>> x13: ffff000000cd8960 (lock_class_mtx_sleep + 0) >>> x14: 0 >>> x15: ffffa0003b1a9505 >>> x16: ffff00021d694860 (_DYNAMIC + 160) >>> x17: ffff00000063a450 (ifc_attach_cloner + 0) >>> x18: ffff00021a6ea400 >>> x19: ffff00021d694600 (vnet_epair_init_vnet_init + 0) >>> x20: ffff000000fa5b68 (vnet_sysinit_sxlock + 18) >>> x21: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0) >>> x22: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0) >>> x23: ffffa00000431500 >>> x24: ffffa00000431500 >>> x25: ffff000000ce0788 (linker_lookup_set_desc + 0) >>> x26: ffffa02e1ab6d180 >>> x27: ffff00021d694698 (__set_sysinit_set_sym_if_epairmodule_sys_init = + 0) >>> x28: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0) >>> x29: ffff00021a6ea430 >>> sp: ffff00021a6ea400 >>> lr: ffff00021d682a4c ($x.1 + 3c) >>> elr: ffff00021d682a60 ($x.1 + 50) >>> spsr: 60000045 >>> far: ffff0003079ba4c8 >>> esr: 96000046 >>> panic: vm_fault failed: ffff00021d682a60 error 1 >>> cpuid =3D 1 >>> time =3D 1687628622 >>> KDB: stack backtrace: >>> db_trace_self() at db_trace_self >>> db_trace_self_wrapper() at db_trace_self_wrapper+0x30 >>> vpanic() at vpanic+0x13c >>> panic() at panic+0x44 >>> data_abort() at data_abort+0x2fc >>> handle_el1h_sync() at handle_el1h_sync+0x14 >>> --- exception, esr 0x96000046 >>> $x.1() at $x.1+0x50 >>> vnet_register_sysinit() at vnet_register_sysinit+0x114 >>> linker_load_module() at linker_load_module+0xae4 >>> kern_kldload() at kern_kldload+0xfc >>> sys_kldload() at sys_kldload+0x60 >>> do_el0_sync() at do_el0_sync+0x608 >>> handle_el0_sync() at handle_el0_sync+0x44 >>> --- exception, esr 0x56000000 >>> KDB: enter: panic >>> [ thread pid 36377 tid 100985 ] >>> Stopped at kdb_enter+0x44: str xzr, [x19, #3200] >>> db>=20 >>>=20 >>>=20 >>> For reference, the output of the run in the ssh >>> session ends with: >>>=20 >>> . . . >>> sys/kqueue/libkqueue/kqueue_test:main -> passed [48.258s] >>> sys/mac/bsdextended/ugidfw_test:main -> skipped: mac_bsdextended = not loaded [0.006s] >>> sys/mac/portacl/nobody_test:main -> skipped: MAC_PORTACL is = unavailable. [0.010s] >>> sys/mac/portacl/root_test:main -> skipped: MAC_PORTACL is = unavailable. [0.010s] >>> sys/mqueue/mqueue_test:mqtest1 -> passed [0.025s] >>> sys/mqueue/mqueue_test:mqtest2 -> passed [0.025s] >>> sys/mqueue/mqueue_test:mqtest5 -> passed [0.025s] >>> sys/net/if_ovpn/if_ovpn_c:tcp -> skipped: if_ovpn not loaded = [0.006s] >>> sys/netinet/arp:arp_add_success -> =20 >>>=20 >>> That should give some extra information about the context >>> of failure. >>=20 >> So I installed, booted, and tried my debug build. It failed >> the same way in the same place, with no extra console >> reporting for the crash by the debug code: no assertion >> failures or WITNESS reports or the like first. >=20 > I tried doing just: >=20 > # kyua test -k /usr/tests/Kyuafile sys/netinet/arp >=20 > and it crashed the same way at the same place. The prior > kyua activity in other tests does not need to be involved > to get the crash. >=20 > For now I've touched the /usr/tests/sys/netinet/Kyuafile > to comment out the arp test line. There does not seem to > be a supported way to indicate to skip just one or a few > specific tests. So I'm touching a do-not-touch file > instead. >=20 > We will see how far it gets when skipping sys/netinet/arp . Same sort of crash: sys/netinet/fibs:fibs_ifroutes1_success -> =20 got: GEOM_NOP: Device md0.nop created. g_vfs_done():md0.nop[READ(offset=3D5885952, length=3D8192)]error =3D 5 GEOM_NOP: Device md0.nop removed. GEOM_NOP: Device md0.nop created. g_vfs_done():md0.nop[READ(offset=3D5935104, length=3D4096)]error =3D 5 g_vfs_done():md0.nop[READ(offset=3D5935104, length=3D4096)]error =3D 5 GEOM_NOP: Device md0.nop removed. GEOM_NOP: Device md0.nop created. GEOM_NOP: Device md0.nop removed. Fatal data abort: x0: ffffa025d2340300 x1: ffff0001daf54000 x2: ffff00000096349b (cam_status_table + 1cf4b) x3: 113 x4: 0 x5: 3a1d1c0900301419 x6: 19143000091c1d3a x7: 656874655f676e00 x8: 3f8b6760 x9: ffff0002224a5590 ($d.6 + 0) x10: 0 x11: ffff000000f41d48 (w_locklistdata + 2ca00) x12: 3 x13: 2 x14: 10000 x15: 1 x16: 10000 x17: ffff0000005ba750 (ifc_attach_cloner + 0) x18: ffff0001d8fd73e0 x19: ffff0002224a54c0 (vnet_epair_init_vnet_init + 0) x20: ffffa000003ea600 x21: ffffa000003ea600 x22: 5e4a6f28 x23: ffffa0001c510f00 x24: ffff000000a08fba (notify.prefix + 18b88) x25: ffff000000c53908 (linker_lookup_set_desc + 0) x26: ffffa02df9378c00 x27: ffff000000d02000 (sdta_vfs_vop_vop_spare1_entry0 + 18) x28: ffff0001d8fd7788 x29: ffff0001d8fd7410 sp: ffff0001d8fd73e0 lr: ffff000222493b8c ($x.1 + 3c) elr: ffff000222493ba0 ($x.1 + 50) spsr: 20000045 far: ffff000261d5bcf0 (next_index + 3f8b5c10) esr: 96000046 panic: vm_fault failed: ffff000222493ba0 error 1 cpuid =3D 15 time =3D 1687640644 KDB: stack backtrace: db_trace_self() at db_trace_self db_trace_self_wrapper() at db_trace_self_wrapper+0x30 vpanic() at vpanic+0x13c panic() at panic+0x44 data_abort() at data_abort+0x354 handle_el1h_sync() at handle_el1h_sync+0x14 --- exception, esr 0x96000046 $x.1() at $x.1+0x50 vnet_register_sysinit() at vnet_register_sysinit+0x114 linker_load_module() at linker_load_module+0xa9c kern_kldload() at kern_kldload+0x18c sys_kldload() at sys_kldload+0x60 do_el0_sync() at do_el0_sync+0x590 handle_el0_sync() at handle_el0_sync+0x44 --- exception, esr 0x56000000 KDB: enter: panic [ thread pid 37224 tid 100647 ] Stopped at kdb_enter+0x44: str xzr, [x19, #3328] db>=20 Disabling fibs and trying gets the same type of crash for: lpm:lpm_test1_success -> =20 (I'll not quote the backtrace related text.) Disabling lpm in turn allowed sys/netinet/ to finish without a system crash. So I now have arp, fibs, and lpm commented out in /usr/tests/sys/netinet/Kyuafile : -- atf_test_program{name=3D"arp", } . . . -- atf_test_program{name=3D"fibs", } . . . -- atf_test_program{name=3D"lpm", } We will see how far after sys/netinet/ /usr/tests/Kyuafile gets with that in place. =3D=3D=3D Mark Millard marklmi at yahoo.com
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?71F8A824-24DE-46F0-B41C-94AACD4E1D6A>