Date:      Sat, 24 Jun 2023 13:48:16 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Current FreeBSD <freebsd-current@freebsd.org>, freebsd-arm <freebsd-arm@freebsd.org>
Subject:   Re: aarch64 main-n263493-4e8d558c9d1c-dirty (so: 2023-Jun-10) Kyuafile run: "Fatal data abort" crash during vnet_register_sysinit
Message-ID:  <2CACE963-7846-475D-B139-D11B551E4A3F@yahoo.com>
In-Reply-To: <8E9937A8-1563-49C2-A1B1-150864C09AA0@yahoo.com>
References:  <3FD359F8-CFCC-400F-B6DE-B635B747DE7F@yahoo.com> <FAF014A1-88B5-4CAE-8A5C-2C2065528003@yahoo.com> <8E9937A8-1563-49C2-A1B1-150864C09AA0@yahoo.com>

On Jun 24, 2023, at 12:16, Mark Millard <marklmi@yahoo.com> wrote:

> On Jun 24, 2023, at 10:49, Mark Millard <marklmi@yahoo.com> wrote:
>
>> On Jun 24, 2023, at 10:00, Mark Millard <marklmi@yahoo.com> wrote:
>>
>>> The running system build is a non-debug build (but
>>> with symbols not stripped).
>>>
>>> The HoneyComb's console log shows:
>>>
>>> . . .
>>> GEOM_STRIPE: Device stripe.IMfBZr destroyed.
>>> GEOM_NOP: Device md0.nop created.
>>> g_vfs_done():md0.nop[READ(offset=5885952, length=8192)]error = 5
>>> GEOM_NOP: Device md0.nop removed.
>>> GEOM_NOP: Device md0.nop created.
>>> g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5
>>> g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5
>>> GEOM_NOP: Device md0.nop removed.
>>> GEOM_NOP: Device md0.nop created.
>>> GEOM_NOP: Device md0.nop removed.
>>> Fatal data abort:
>>> x0: ffffa02506e64400
>>> x1: ffff0001ea401880 (g_raid3_post_sync + 3a145f8)
>>> x2:               4b
>>> x3: a343932b0b22fb30
>>> x4:                0
>>> x5:  3310b0d062d0e1d
>>> x6: 1d0e2d060d0b3103
>>> x7:                0
>>> x8:         ea325df8
>>> x9: ffff0001eec946d0 ($d.6 + 0)
>>> x10: ffff0001ea401880 (g_raid3_post_sync + 3a145f8)
>>> x11:                0
>>> x12:                0
>>> x13: ffff000000cd8960 (lock_class_mtx_sleep + 0)
>>> x14:                0
>>> x15: ffffa02506e64405
>>> x16: ffff0001eec94860 (_DYNAMIC + 160)
>>> x17: ffff00000063a450 (ifc_attach_cloner + 0)
>>> x18: ffff0001eb290400 (g_raid3_post_sync + 48a3178)
>>> x19: ffff0001eec94600 (vnet_epair_init_vnet_init + 0)
>>> x20: ffff000000fa5b68 (vnet_sysinit_sxlock + 18)
>>> x21: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
>>> x22: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
>>> x23: ffffa0000042e500
>>> x24: ffffa0000042e500
>>> x25: ffff000000ce0788 (linker_lookup_set_desc + 0)
>>> x26: ffffa0203cdef780
>>> x27: ffff0001eec94698 (__set_sysinit_set_sym_if_epairmodule_sys_init + 0)
>>> x28: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
>>> x29: ffff0001eb290430 (g_raid3_post_sync + 48a31a8)
>>> sp: ffff0001eb290400
>>> lr: ffff0001eec82a4c ($x.1 + 3c)
>>> elr: ffff0001eec82a60 ($x.1 + 50)
>>> spsr:         60000045
>>> far: ffff0002d8fba4c8
>>> esr:         96000046
>>> panic: vm_fault failed: ffff0001eec82a60 error 1
>>> cpuid = 14
>>> time = 1687625470
>>> KDB: stack backtrace:
>>> db_trace_self() at db_trace_self
>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
>>> vpanic() at vpanic+0x13c
>>> panic() at panic+0x44
>>> data_abort() at data_abort+0x2fc
>>> handle_el1h_sync() at handle_el1h_sync+0x14
>>> --- exception, esr 0x96000046
>>> $x.1() at $x.1+0x50
>>> vnet_register_sysinit() at vnet_register_sysinit+0x114
>>> linker_load_module() at linker_load_module+0xae4
>>> kern_kldload() at kern_kldload+0xfc
>>> sys_kldload() at sys_kldload+0x60
>>> do_el0_sync() at do_el0_sync+0x608
>>> handle_el0_sync() at handle_el0_sync+0x44
>>> --- exception, esr 0x56000000
>>> KDB: enter: panic
>>> [ thread pid 70419 tid 101003 ]
>>> Stopped at      kdb_enter+0x44: str     xzr, [x19, #3200]
>>> db>
>>>
>>> I'll see if a re-run is repeatable.
>>>
>>
>> It repeats:
>>
>> GEOM_STRIPE: Device stripe/stripe.VkbPk1 deactivated.
>> GEOM_STRIPE: Disk md1 removed from stripe.VkbPk1.
>> GEOM_STRIPE: Disk md0 removed from stripe.VkbPk1.
>> GEOM_STRIPE: Device stripe.VkbPk1 destroyed.
>> GEOM_NOP: Device md0.nop created.
>> g_vfs_done():md0.nop[READ(offset=5885952, length=8192)]error = 5
>> GEOM_NOP: Device md0.nop removed.
>> GEOM_NOP: Device md0.nop created.
>> g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5
>> g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5
>> GEOM_NOP: Device md0.nop removed.
>> GEOM_NOP: Device md0.nop created.
>> GEOM_NOP: Device md0.nop removed.
>> Fatal data abort:
>> x0: ffffa0003b1a9500
>> x1: ffff00021b530260
>> x2:               4b
>> x3: a343932b0b22fb30
>> x4:                0
>> x5:  3310b0d062d0e1d
>> x6: 1d0e2d060d0b3103
>> x7:                0
>> x8:         ea325df8
>> x9: ffff00021d6946d0 ($d.6 + 0)
>> x10: ffff00021b530260
>> x11:                0
>> x12:                0
>> x13: ffff000000cd8960 (lock_class_mtx_sleep + 0)
>> x14:                0
>> x15: ffffa0003b1a9505
>> x16: ffff00021d694860 (_DYNAMIC + 160)
>> x17: ffff00000063a450 (ifc_attach_cloner + 0)
>> x18: ffff00021a6ea400
>> x19: ffff00021d694600 (vnet_epair_init_vnet_init + 0)
>> x20: ffff000000fa5b68 (vnet_sysinit_sxlock + 18)
>> x21: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
>> x22: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
>> x23: ffffa00000431500
>> x24: ffffa00000431500
>> x25: ffff000000ce0788 (linker_lookup_set_desc + 0)
>> x26: ffffa02e1ab6d180
>> x27: ffff00021d694698 (__set_sysinit_set_sym_if_epairmodule_sys_init + 0)
>> x28: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
>> x29: ffff00021a6ea430
>> sp: ffff00021a6ea400
>> lr: ffff00021d682a4c ($x.1 + 3c)
>> elr: ffff00021d682a60 ($x.1 + 50)
>> spsr:         60000045
>> far: ffff0003079ba4c8
>> esr:         96000046
>> panic: vm_fault failed: ffff00021d682a60 error 1
>> cpuid = 1
>> time = 1687628622
>> KDB: stack backtrace:
>> db_trace_self() at db_trace_self
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
>> vpanic() at vpanic+0x13c
>> panic() at panic+0x44
>> data_abort() at data_abort+0x2fc
>> handle_el1h_sync() at handle_el1h_sync+0x14
>> --- exception, esr 0x96000046
>> $x.1() at $x.1+0x50
>> vnet_register_sysinit() at vnet_register_sysinit+0x114
>> linker_load_module() at linker_load_module+0xae4
>> kern_kldload() at kern_kldload+0xfc
>> sys_kldload() at sys_kldload+0x60
>> do_el0_sync() at do_el0_sync+0x608
>> handle_el0_sync() at handle_el0_sync+0x44
>> --- exception, esr 0x56000000
>> KDB: enter: panic
>> [ thread pid 36377 tid 100985 ]
>> Stopped at      kdb_enter+0x44: str     xzr, [x19, #3200]
>> db>
>>
>>
>> For reference, the output of the run in the ssh
>> session ends with:
>>
>> . . .
>> sys/kqueue/libkqueue/kqueue_test:main  ->  passed  [48.258s]
>> sys/mac/bsdextended/ugidfw_test:main  ->  skipped: mac_bsdextended not loaded  [0.006s]
>> sys/mac/portacl/nobody_test:main  ->  skipped: MAC_PORTACL is unavailable.  [0.010s]
>> sys/mac/portacl/root_test:main  ->  skipped: MAC_PORTACL is unavailable.  [0.010s]
>> sys/mqueue/mqueue_test:mqtest1  ->  passed  [0.025s]
>> sys/mqueue/mqueue_test:mqtest2  ->  passed  [0.025s]
>> sys/mqueue/mqueue_test:mqtest5  ->  passed  [0.025s]
>> sys/net/if_ovpn/if_ovpn_c:tcp  ->  skipped: if_ovpn not loaded  [0.006s]
>> sys/netinet/arp:arp_add_success  ->
>>
>> That should give some extra information about the context
>> of failure.
>
> So I installed, booted, and tried my debug build. It failed
> the same way in the same place, with no extra console
> reporting for the crash by the debug code: no assertion
> failures or WITNESS reports or the like first.

I tried doing just:

# kyua test -k /usr/tests/Kyuafile sys/netinet/arp

and it crashed the same way at the same place. So the prior
kyua activity in other tests is not needed to get the
crash.

For now I've edited /usr/tests/sys/netinet/Kyuafile to
comment out the arp test line. There does not seem to be
a supported way to skip just one or a few specific tests,
so I'm modifying a do-not-touch file instead.

We will see how far the run gets when skipping sys/netinet/arp.
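
As a cross-check on the two register dumps quoted above, here is a
minimal Python sketch (not part of the original report) that decodes
the reported esr values. It assumes the ESR_EL1 field layout from the
Arm architecture documentation: EC in bits [31:26], IL in bit 25 and,
for data aborts, WnR in ISS bit 6 and DFSC in ISS bits [5:0]. The
function name decode_esr and the small lookup tables are illustrative
only and cover just the values seen here. The sketch also checks one
arithmetic observation about the dumps: in both, the reported far
equals x9 + x8. What x8 and x9 hold at that point in the code is not
established here.

```python
# Sketch only: decode the esr values from the crash dumps and check
# the far == x9 + x8 observation. Field positions are an assumption
# taken from the Arm ESR_EL1 description, not from FreeBSD sources.

MASK64 = (1 << 64) - 1

# Only the exception classes that appear in the quoted backtraces.
EC_NAMES = {
    0x15: "SVC instruction from AArch64 state",
    0x25: "data abort taken without a change in exception level",
}

# Only the translation-fault encodings (0b0001LL, LL = level).
DFSC_NAMES = {0x04 + level: f"translation fault, level {level}"
              for level in range(4)}


def decode_esr(esr):
    """Split a 32-bit esr value into the fields used below."""
    iss = esr & 0x1FFFFFF              # ISS, bits [24:0]
    return {
        "ec": (esr >> 26) & 0x3F,      # exception class, bits [31:26]
        "il": (esr >> 25) & 0x1,       # instruction length bit
        "wnr": (iss >> 6) & 0x1,       # 1 = write, 0 = read (data aborts)
        "dfsc": iss & 0x3F,            # data fault status code
    }


# Values copied from the two "Fatal data abort" dumps.
dumps = [
    {"x8": 0xea325df8, "x9": 0xffff0001eec946d0, "far": 0xffff0002d8fba4c8},
    {"x8": 0xea325df8, "x9": 0xffff00021d6946d0, "far": 0xffff0003079ba4c8},
]

if __name__ == "__main__":
    for esr in (0x96000046, 0x56000000):
        f = decode_esr(esr)
        print(hex(esr), EC_NAMES.get(f["ec"], hex(f["ec"])),
              "wnr=%d" % f["wnr"], DFSC_NAMES.get(f["dfsc"], hex(f["dfsc"])))
    for d in dumps:
        total = (d["x9"] + d["x8"]) & MASK64
        print(hex(total), hex(d["far"]), total == d["far"])
```

Under that reading, esr 0x96000046 is a data abort taken without a
change in exception level, for a write, with a level 2 translation
fault, and 0x56000000 is an SVC from AArch64 state, which is
consistent with the kldload system call in the backtrace. The wnr and
fault-status fields are only meaningful for the data abort value.
This narrows down what faulted, not why.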

===
Mark Millard
marklmi at yahoo.com



