Date: Sat, 24 Jun 2023 13:48:16 -0700
From: Mark Millard <marklmi@yahoo.com>
To: Current FreeBSD <freebsd-current@freebsd.org>, freebsd-arm <freebsd-arm@freebsd.org>
Subject: Re: aarch64 main-n263493-4e8d558c9d1c-dirty (so: 2023-Jun-10) Kyuafile run: "Fatal data abort" crash during vnet_register_sysinit
Message-ID: <2CACE963-7846-475D-B139-D11B551E4A3F@yahoo.com>
In-Reply-To: <8E9937A8-1563-49C2-A1B1-150864C09AA0@yahoo.com>
References: <3FD359F8-CFCC-400F-B6DE-B635B747DE7F@yahoo.com> <FAF014A1-88B5-4CAE-8A5C-2C2065528003@yahoo.com> <8E9937A8-1563-49C2-A1B1-150864C09AA0@yahoo.com>
On Jun 24, 2023, at 12:16, Mark Millard <marklmi@yahoo.com> wrote:

> On Jun 24, 2023, at 10:49, Mark Millard <marklmi@yahoo.com> wrote:
>
>> On Jun 24, 2023, at 10:00, Mark Millard <marklmi@yahoo.com> wrote:
>>
>>> The running system build is a non-debug build (but
>>> with symbols not stripped).
>>>
>>> The HoneyComb's console log shows:
>>>
>>> . . .
>>> GEOM_STRIPE: Device stripe.IMfBZr destroyed.
>>> GEOM_NOP: Device md0.nop created.
>>> g_vfs_done():md0.nop[READ(offset=5885952, length=8192)]error = 5
>>> GEOM_NOP: Device md0.nop removed.
>>> GEOM_NOP: Device md0.nop created.
>>> g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5
>>> g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5
>>> GEOM_NOP: Device md0.nop removed.
>>> GEOM_NOP: Device md0.nop created.
>>> GEOM_NOP: Device md0.nop removed.
>>> Fatal data abort:
>>> x0: ffffa02506e64400
>>> x1: ffff0001ea401880 (g_raid3_post_sync + 3a145f8)
>>> x2: 4b
>>> x3: a343932b0b22fb30
>>> x4: 0
>>> x5: 3310b0d062d0e1d
>>> x6: 1d0e2d060d0b3103
>>> x7: 0
>>> x8: ea325df8
>>> x9: ffff0001eec946d0 ($d.6 + 0)
>>> x10: ffff0001ea401880 (g_raid3_post_sync + 3a145f8)
>>> x11: 0
>>> x12: 0
>>> x13: ffff000000cd8960 (lock_class_mtx_sleep + 0)
>>> x14: 0
>>> x15: ffffa02506e64405
>>> x16: ffff0001eec94860 (_DYNAMIC + 160)
>>> x17: ffff00000063a450 (ifc_attach_cloner + 0)
>>> x18: ffff0001eb290400 (g_raid3_post_sync + 48a3178)
>>> x19: ffff0001eec94600 (vnet_epair_init_vnet_init + 0)
>>> x20: ffff000000fa5b68 (vnet_sysinit_sxlock + 18)
>>> x21: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
>>> x22: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
>>> x23: ffffa0000042e500
>>> x24: ffffa0000042e500
>>> x25: ffff000000ce0788 (linker_lookup_set_desc + 0)
>>> x26: ffffa0203cdef780
>>> x27: ffff0001eec94698 (__set_sysinit_set_sym_if_epairmodule_sys_init + 0)
>>> x28: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
>>> x29: ffff0001eb290430 (g_raid3_post_sync + 48a31a8)
>>> sp: ffff0001eb290400
>>> lr: ffff0001eec82a4c ($x.1 + 3c)
>>> elr: ffff0001eec82a60 ($x.1 + 50)
>>> spsr: 60000045
>>> far: ffff0002d8fba4c8
>>> esr: 96000046
>>> panic: vm_fault failed: ffff0001eec82a60 error 1
>>> cpuid = 14
>>> time = 1687625470
>>> KDB: stack backtrace:
>>> db_trace_self() at db_trace_self
>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
>>> vpanic() at vpanic+0x13c
>>> panic() at panic+0x44
>>> data_abort() at data_abort+0x2fc
>>> handle_el1h_sync() at handle_el1h_sync+0x14
>>> --- exception, esr 0x96000046
>>> $x.1() at $x.1+0x50
>>> vnet_register_sysinit() at vnet_register_sysinit+0x114
>>> linker_load_module() at linker_load_module+0xae4
>>> kern_kldload() at kern_kldload+0xfc
>>> sys_kldload() at sys_kldload+0x60
>>> do_el0_sync() at do_el0_sync+0x608
>>> handle_el0_sync() at handle_el0_sync+0x44
>>> --- exception, esr 0x56000000
>>> KDB: enter: panic
>>> [ thread pid 70419 tid 101003 ]
>>> Stopped at kdb_enter+0x44: str xzr, [x19, #3200]
>>> db>
>>>
>>> I'll see if a re-run is repeatable.
>>>
>>
>> It repeats:
>>
>> GEOM_STRIPE: Device stripe/stripe.VkbPk1 deactivated.
>> GEOM_STRIPE: Disk md1 removed from stripe.VkbPk1.
>> GEOM_STRIPE: Disk md0 removed from stripe.VkbPk1.
>> GEOM_STRIPE: Device stripe.VkbPk1 destroyed.
>> GEOM_NOP: Device md0.nop created.
>> g_vfs_done():md0.nop[READ(offset=5885952, length=8192)]error = 5
>> GEOM_NOP: Device md0.nop removed.
>> GEOM_NOP: Device md0.nop created.
>> g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5
>> g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5
>> GEOM_NOP: Device md0.nop removed.
>> GEOM_NOP: Device md0.nop created.
>> GEOM_NOP: Device md0.nop removed.
>> Fatal data abort:
>> x0: ffffa0003b1a9500
>> x1: ffff00021b530260
>> x2: 4b
>> x3: a343932b0b22fb30
>> x4: 0
>> x5: 3310b0d062d0e1d
>> x6: 1d0e2d060d0b3103
>> x7: 0
>> x8: ea325df8
>> x9: ffff00021d6946d0 ($d.6 + 0)
>> x10: ffff00021b530260
>> x11: 0
>> x12: 0
>> x13: ffff000000cd8960 (lock_class_mtx_sleep + 0)
>> x14: 0
>> x15: ffffa0003b1a9505
>> x16: ffff00021d694860 (_DYNAMIC + 160)
>> x17: ffff00000063a450 (ifc_attach_cloner + 0)
>> x18: ffff00021a6ea400
>> x19: ffff00021d694600 (vnet_epair_init_vnet_init + 0)
>> x20: ffff000000fa5b68 (vnet_sysinit_sxlock + 18)
>> x21: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
>> x22: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
>> x23: ffffa00000431500
>> x24: ffffa00000431500
>> x25: ffff000000ce0788 (linker_lookup_set_desc + 0)
>> x26: ffffa02e1ab6d180
>> x27: ffff00021d694698 (__set_sysinit_set_sym_if_epairmodule_sys_init + 0)
>> x28: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
>> x29: ffff00021a6ea430
>> sp: ffff00021a6ea400
>> lr: ffff00021d682a4c ($x.1 + 3c)
>> elr: ffff00021d682a60 ($x.1 + 50)
>> spsr: 60000045
>> far: ffff0003079ba4c8
>> esr: 96000046
>> panic: vm_fault failed: ffff00021d682a60 error 1
>> cpuid = 1
>> time = 1687628622
>> KDB: stack backtrace:
>> db_trace_self() at db_trace_self
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
>> vpanic() at vpanic+0x13c
>> panic() at panic+0x44
>> data_abort() at data_abort+0x2fc
>> handle_el1h_sync() at handle_el1h_sync+0x14
>> --- exception, esr 0x96000046
>> $x.1() at $x.1+0x50
>> vnet_register_sysinit() at vnet_register_sysinit+0x114
>> linker_load_module() at linker_load_module+0xae4
>> kern_kldload() at kern_kldload+0xfc
>> sys_kldload() at sys_kldload+0x60
>> do_el0_sync() at do_el0_sync+0x608
>> handle_el0_sync() at handle_el0_sync+0x44
>> --- exception, esr 0x56000000
>> KDB: enter: panic
>> [ thread pid 36377 tid 100985 ]
>> Stopped at kdb_enter+0x44: str xzr, [x19, #3200]
>> db>
>>
>>
>> For reference, the output of the run in the ssh
>> session ends with:
>>
>> . . .
>> sys/kqueue/libkqueue/kqueue_test:main -> passed [48.258s]
>> sys/mac/bsdextended/ugidfw_test:main -> skipped: mac_bsdextended not loaded [0.006s]
>> sys/mac/portacl/nobody_test:main -> skipped: MAC_PORTACL is unavailable. [0.010s]
>> sys/mac/portacl/root_test:main -> skipped: MAC_PORTACL is unavailable. [0.010s]
>> sys/mqueue/mqueue_test:mqtest1 -> passed [0.025s]
>> sys/mqueue/mqueue_test:mqtest2 -> passed [0.025s]
>> sys/mqueue/mqueue_test:mqtest5 -> passed [0.025s]
>> sys/net/if_ovpn/if_ovpn_c:tcp -> skipped: if_ovpn not loaded [0.006s]
>> sys/netinet/arp:arp_add_success ->
>>
>> That should give some extra information about the context
>> of failure.
>
> So I installed, booted, and tried my debug build. It failed
> the same way in the same place, with no extra console
> reporting for the crash by the debug code: no assertion
> failures or WITNESS reports or the like first.

I tried running just:

# kyua test -k /usr/tests/Kyuafile sys/netinet/arp

and it crashed the same way at the same place. So the
earlier kyua activity from other tests does not need to be
involved to get the crash.

For now I've edited /usr/tests/sys/netinet/Kyuafile to
comment out the arp test line. There does not seem to be a
supported way to tell kyua to skip just one or a few specific
tests, so I'm touching a do-not-touch file instead. We will
see how far the run gets when skipping sys/netinet/arp.

===
Mark Millard
marklmi at yahoo.com