Date:      Sat, 24 Jun 2023 10:49:58 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Current FreeBSD <freebsd-current@freebsd.org>, freebsd-arm <freebsd-arm@freebsd.org>
Subject:   Re: aarch64 main-n263493-4e8d558c9d1c-dirty (so: 2023-Jun-10) Kyuafile run: "Fatal data abort" crash during vnet_register_sysinit
Message-ID:  <FAF014A1-88B5-4CAE-8A5C-2C2065528003@yahoo.com>
In-Reply-To: <3FD359F8-CFCC-400F-B6DE-B635B747DE7F@yahoo.com>
References:  <3FD359F8-CFCC-400F-B6DE-B635B747DE7F@yahoo.com>

On Jun 24, 2023, at 10:00, Mark Millard <marklmi@yahoo.com> wrote:

> The running system build is a non-debug build (but
> with symbols not stripped).
>
> The HoneyComb's console log shows:
>
> . . .
> GEOM_STRIPE: Device stripe.IMfBZr destroyed.
> GEOM_NOP: Device md0.nop created.
> g_vfs_done():md0.nop[READ(offset=5885952, length=8192)]error = 5
> GEOM_NOP: Device md0.nop removed.
> GEOM_NOP: Device md0.nop created.
> g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5
> g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5
> GEOM_NOP: Device md0.nop removed.
> GEOM_NOP: Device md0.nop created.
> GEOM_NOP: Device md0.nop removed.
> Fatal data abort:
>  x0: ffffa02506e64400
>  x1: ffff0001ea401880 (g_raid3_post_sync + 3a145f8)
>  x2:               4b
>  x3: a343932b0b22fb30
>  x4:                0
>  x5:  3310b0d062d0e1d
>  x6: 1d0e2d060d0b3103
>  x7:                0
>  x8:         ea325df8
>  x9: ffff0001eec946d0 ($d.6 + 0)
> x10: ffff0001ea401880 (g_raid3_post_sync + 3a145f8)
> x11:                0
> x12:                0
> x13: ffff000000cd8960 (lock_class_mtx_sleep + 0)
> x14:                0
> x15: ffffa02506e64405
> x16: ffff0001eec94860 (_DYNAMIC + 160)
> x17: ffff00000063a450 (ifc_attach_cloner + 0)
> x18: ffff0001eb290400 (g_raid3_post_sync + 48a3178)
> x19: ffff0001eec94600 (vnet_epair_init_vnet_init + 0)
> x20: ffff000000fa5b68 (vnet_sysinit_sxlock + 18)
> x21: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
> x22: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
> x23: ffffa0000042e500
> x24: ffffa0000042e500
> x25: ffff000000ce0788 (linker_lookup_set_desc + 0)
> x26: ffffa0203cdef780
> x27: ffff0001eec94698 (__set_sysinit_set_sym_if_epairmodule_sys_init + 0)
> x28: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
> x29: ffff0001eb290430 (g_raid3_post_sync + 48a31a8)
>  sp: ffff0001eb290400
>  lr: ffff0001eec82a4c ($x.1 + 3c)
> elr: ffff0001eec82a60 ($x.1 + 50)
> spsr:         60000045
> far: ffff0002d8fba4c8
> esr:         96000046
> panic: vm_fault failed: ffff0001eec82a60 error 1
> cpuid = 14
> time = 1687625470
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> vpanic() at vpanic+0x13c
> panic() at panic+0x44
> data_abort() at data_abort+0x2fc
> handle_el1h_sync() at handle_el1h_sync+0x14
> --- exception, esr 0x96000046
> $x.1() at $x.1+0x50
> vnet_register_sysinit() at vnet_register_sysinit+0x114
> linker_load_module() at linker_load_module+0xae4
> kern_kldload() at kern_kldload+0xfc
> sys_kldload() at sys_kldload+0x60
> do_el0_sync() at do_el0_sync+0x608
> handle_el0_sync() at handle_el0_sync+0x44
> --- exception, esr 0x56000000
> KDB: enter: panic
> [ thread pid 70419 tid 101003 ]
> Stopped at      kdb_enter+0x44: str     xzr, [x19, #3200]
> db>
>
> I'll see if a re-run is repeatable.
>

It repeats:

GEOM_STRIPE: Device stripe/stripe.VkbPk1 deactivated.
GEOM_STRIPE: Disk md1 removed from stripe.VkbPk1.
GEOM_STRIPE: Disk md0 removed from stripe.VkbPk1.
GEOM_STRIPE: Device stripe.VkbPk1 destroyed.
GEOM_NOP: Device md0.nop created.
g_vfs_done():md0.nop[READ(offset=5885952, length=8192)]error = 5
GEOM_NOP: Device md0.nop removed.
GEOM_NOP: Device md0.nop created.
g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5
g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5
GEOM_NOP: Device md0.nop removed.
GEOM_NOP: Device md0.nop created.
GEOM_NOP: Device md0.nop removed.
Fatal data abort:
  x0: ffffa0003b1a9500
  x1: ffff00021b530260
  x2:               4b
  x3: a343932b0b22fb30
  x4:                0
  x5:  3310b0d062d0e1d
  x6: 1d0e2d060d0b3103
  x7:                0
  x8:         ea325df8
  x9: ffff00021d6946d0 ($d.6 + 0)
 x10: ffff00021b530260
 x11:                0
 x12:                0
 x13: ffff000000cd8960 (lock_class_mtx_sleep + 0)
 x14:                0
 x15: ffffa0003b1a9505
 x16: ffff00021d694860 (_DYNAMIC + 160)
 x17: ffff00000063a450 (ifc_attach_cloner + 0)
 x18: ffff00021a6ea400
 x19: ffff00021d694600 (vnet_epair_init_vnet_init + 0)
 x20: ffff000000fa5b68 (vnet_sysinit_sxlock + 18)
 x21: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
 x22: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
 x23: ffffa00000431500
 x24: ffffa00000431500
 x25: ffff000000ce0788 (linker_lookup_set_desc + 0)
 x26: ffffa02e1ab6d180
 x27: ffff00021d694698 (__set_sysinit_set_sym_if_epairmodule_sys_init + 0)
 x28: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
 x29: ffff00021a6ea430
  sp: ffff00021a6ea400
  lr: ffff00021d682a4c ($x.1 + 3c)
 elr: ffff00021d682a60 ($x.1 + 50)
spsr:         60000045
 far: ffff0003079ba4c8
 esr:         96000046
panic: vm_fault failed: ffff00021d682a60 error 1
cpuid = 1
time = 1687628622
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x13c
panic() at panic+0x44
data_abort() at data_abort+0x2fc
handle_el1h_sync() at handle_el1h_sync+0x14
--- exception, esr 0x96000046
$x.1() at $x.1+0x50
vnet_register_sysinit() at vnet_register_sysinit+0x114
linker_load_module() at linker_load_module+0xae4
kern_kldload() at kern_kldload+0xfc
sys_kldload() at sys_kldload+0x60
do_el0_sync() at do_el0_sync+0x608
handle_el0_sync() at handle_el0_sync+0x44
--- exception, esr 0x56000000
KDB: enter: panic
[ thread pid 36377 tid 100985 ]
Stopped at      kdb_enter+0x44: str     xzr, [x19, #3200]
db>
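
As a reading aid for the two "esr" values in the backtraces above,
here is a minimal Python sketch that splits an ESR value into its
fields. The field positions (EC in bits 31:26, IL in bit 25, ISS in
bits 24:0, and for data aborts WnR in ISS bit 6 and DFSC in ISS bits
5:0) follow the Arm architecture's ESR_ELx layout. The function name
and the two small lookup tables are illustrative only and cover just
the codes that matter here; they are not taken from the FreeBSD
sources.

```python
# Illustrative decoder for AArch64 ESR_ELx values (names are hypothetical).
# Layout: EC = bits [31:26], IL = bit 25, ISS = bits [24:0].
# For data aborts: WnR = ISS bit 6, DFSC = ISS bits [5:0].

# Deliberately partial tables: only the codes relevant to this report.
EC_NAMES = {
    0x15: "SVC instruction execution in AArch64 state",
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
}

DFSC_NAMES = {
    0x04: "Translation fault, level 0",
    0x05: "Translation fault, level 1",
    0x06: "Translation fault, level 2",
    0x07: "Translation fault, level 3",
}


def decode_esr(esr):
    """Split an ESR value into EC/IL/ISS, plus WnR/DFSC for data aborts."""
    ec = (esr >> 26) & 0x3F
    il = (esr >> 25) & 0x1
    iss = esr & 0x1FFFFFF
    fields = {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "not in this table"),
        "il": il,
        "iss": iss,
    }
    # WnR and DFSC are only meaningful when the exception is a data abort.
    if ec in (0x24, 0x25):
        dfsc = iss & 0x3F
        fields["wnr"] = (iss >> 6) & 0x1
        fields["dfsc"] = dfsc
        fields["dfsc_name"] = DFSC_NAMES.get(dfsc, "not in this table")
    return fields


if __name__ == "__main__":
    for value in (0x96000046, 0x56000000):
        print(hex(value), decode_esr(value))
```

Under that layout, 0x96000046 has EC 0x25 (a data abort taken at the
same exception level, i.e. in the kernel), WnR 1 (the access was a
write) and DFSC 0x06 (translation fault, level 2), while 0x56000000
has EC 0x15 (an SVC from AArch64 state, matching the sys_kldload
entry in the backtrace).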


For reference, the output of the run in the ssh
session ends with:

. . .
sys/kqueue/libkqueue/kqueue_test:main  ->  passed  [48.258s]
sys/mac/bsdextended/ugidfw_test:main  ->  skipped: mac_bsdextended not loaded  [0.006s]
sys/mac/portacl/nobody_test:main  ->  skipped: MAC_PORTACL is unavailable.  [0.010s]
sys/mac/portacl/root_test:main  ->  skipped: MAC_PORTACL is unavailable.  [0.010s]
sys/mqueue/mqueue_test:mqtest1  ->  passed  [0.025s]
sys/mqueue/mqueue_test:mqtest2  ->  passed  [0.025s]
sys/mqueue/mqueue_test:mqtest5  ->  passed  [0.025s]
sys/net/if_ovpn/if_ovpn_c:tcp  ->  skipped: if_ovpn not loaded  [0.006s]
sys/netinet/arp:arp_add_success  ->

That should give some extra information about the context
of the failure: the crash happened while
sys/netinet/arp:arp_add_success was running.
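
One cross-check that can be made from the two register dumps alone
is sketched below in Python: it tests whether the reported "far"
equals x9 + x8 in each run, and how far "far" lies from "elr". The
register values are copied from the dumps above; the dictionary and
function names are illustrative. This is only arithmetic on the
logged values. It is consistent with the faulting store forming its
address from x9 and x8, but the disassembly at $x.1+0x50 would be
needed to confirm which registers the instruction actually uses.

```python
# Register values copied from the two "Fatal data abort" dumps.
# The structure and names here are illustrative, not from any tool.
MASK64 = 0xFFFFFFFFFFFFFFFF

RUNS = {
    "first": {
        "x8": 0xea325df8,
        "x9": 0xffff0001eec946d0,
        "elr": 0xffff0001eec82a60,
        "far": 0xffff0002d8fba4c8,
    },
    "second": {
        "x8": 0xea325df8,
        "x9": 0xffff00021d6946d0,
        "elr": 0xffff00021d682a60,
        "far": 0xffff0003079ba4c8,
    },
}


def far_is_x9_plus_x8(regs):
    """True if the fault address equals x9 + x8 (64-bit wraparound)."""
    return ((regs["x9"] + regs["x8"]) & MASK64) == regs["far"]


def far_minus_elr(regs):
    """Distance from the faulting instruction to the fault address."""
    return (regs["far"] - regs["elr"]) & MASK64


if __name__ == "__main__":
    for name, regs in RUNS.items():
        print(name,
              "far == x9 + x8:", far_is_x9_plus_x8(regs),
              "far - elr:", hex(far_minus_elr(regs)))
```

For both runs the check holds and the far-to-elr distance is the
same, even though the module was loaded at a different address each
time. That suggests the bad address is derived from a fixed offset
(the x8 value) rather than from random corruption, though the dumps
by themselves do not establish why.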

===
Mark Millard
marklmi at yahoo.com



