Date:      Tue, 9 Jan 2024 21:23:54 +0100
From:      Rainer Hurling <rhurlin@gwdg.de>
To:        <freebsd-current@freebsd.org>
Cc:        Gleb Smirnoff <glebius@freebsd.org>
Subject:   kernel: fatal trap 12 on CURRENT, when using WireGuard
Message-ID:  <423b62fc-6687-4e56-b8e7-ecaebcadfd7f@gwdg.de>

I tried to update my 15.0-CURRENT box from n267335-499e84e16f56 to a 
very recent commit. The build and install went fine. After booting with 
the new base, I got a page fault with the following error:


Kernel page fault with the following non-sleepable locks held:
shared rm netlink lock (netlink lock) r = 0 (0xfffff8005fc8ca20) locked 
@ /usr/src/sys/netlink/netlink_domain.c:241
exclusive rw lle (lle) r = 0 (0xfffff801951dce90) locked @ 
/usr/src/sys/netinet/in.c:1716
stack backtrace:
#0 0xffffffff80bc6c45 at witness_debugger+0x65
#1 0xffffffff80bc7d89 at witness_warn+0x3e9
#2 0xffffffff81056b18 at trap_pfault+0x88
#3 0xffffffff81028708 at calltrap+0x8
#4 0xffffffff80dbd6a2 at nl_send_group+0x1d2
#5 0xffffffff80dc0e27 at _nlmsg_flush+0x37
#6 0xffffffff80dc4fdc at rtnl_lle_event+0x10c
#7 0xffffffff80d15e32 at arp_mark_lle_reachable+0xd2
#8 0xffffffff80d15b43 at arp_check_update_lle+0x293
#9 0xffffffff80d151c5 at arpintr+0xa65
#10 0xffffffff80caaaed at netisr_dispatch_src+0xad
#11 0xffffffff80c8d57a at ether_demux+0x17a
#12 0xffffffff80c8ec53 at ether_nh_input+0x403
#13 0xffffffff80caaaed at netisr_dispatch_src+0xad
#14 0xffffffff80c8d9c9 at ether_input+0xd9
#15 0xffffffff80ca66ac at iflib_rxeof+0xe4c
#16 0xffffffff80ca0b5a at _task_fn_rx+0x7a
#17 0xffffffff80ba0118 at gtaskqueue_run_locked+0xa8

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x30000
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80dc0a10
stack pointer           = 0x28:0xfffffe006a3a8760
frame pointer           = 0x28:0xfffffe006a3a8790
code segment            = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (if_io_tqg_0)
rdi: fffffe006a3a8850 rsi: fffffe006a3a86f0 rdx: fffffe006a3a87b0
rcx: fffff80001f88740  r8: ffffffff83210090  r9: 0000000000000000
rax: 0000000000000000 rbx: 0000000000030000 rbp: fffffe006a3a8790
r10: 0000000000000001 r11: 0000000000000000 r12: fffff8005fc8ca00
r13: fffff8005fc8ca20 r14: fffffe006a3a8850 r15: 0000000000000000
trap number             = 12
panic: page fault
cpuid = 0
time = 1704824328
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
0xfffffe006a3a8430
vpanic() at vpanic+0x131/frame 0xfffffe006a3a8560
panic() at panic+0x43/frame 0xfffffe006a3a85c0
trap_fatal() at trap_fatal+0x40f/frame 0xfffffe006a3a8620
trap_pfault() at trap_pfault+0xae/frame 0xfffffe006a3a8690
calltrap() at calltrap+0x8/frame 0xfffffe006a3a8690
--- trap 0xc, rip = 0xffffffff80dc0a10, rsp = 0xfffffe006a3a8760, rbp = 
0xfffffe006a3a8790 ---
nl_send_one() at nl_send_one+0x20/frame 0xfffffe006a3a8790
nl_send_group() at nl_send_group+0x1d2/frame 0xfffffe006a3a8820
_nlmsg_flush() at _nlmsg_flush+0x37/frame 0xfffffe006a3a8840
rtnl_lle_event() at rtnl_lle_event+0x10c/frame 0xfffffe006a3a88e0
arp_mark_lle_reachable() at arp_mark_lle_reachable+0xd2/frame 
0xfffffe006a3a8930
arp_check_update_lle() at arp_check_update_lle+0x293/frame 
0xfffffe006a3a8a00
arpintr() at arpintr+0xa65/frame 0xfffffe006a3a8b60
netisr_dispatch_src() at netisr_dispatch_src+0xad/frame 0xfffffe006a3a8bc8
ether_demux() at ether_demux+0x17a/frame 0xfffffe006a3a8bf0
ether_nh_input() at ether_nh_input+0x403/frame 0xfffffe006a3a8c40
netisr_dispatch_src() at netisr_dispatch_src+0xad/frame 0xfffffe006a3a8ca0
ether_input() at ether_input+0xd9/frame 0xfffffe006a3a8d00
iflib_rxeof() at iflib_rxeof+0xe4c/frame 0xfffffe006a3a8e00
_task_fn_rx() at _task_fn_rx+0x7a/frame 0xfffffe006a3a8e40
gtaskqueue_run_locked() at gtaskqueue_run_locked+0xa8/frame 
0xfffffe006a3a8ec0
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xd3/frame 
0xfffffe006a3a8ef0
fork_exit() at fork_exit+0x82/frame 0xfffffe006a3a8f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe006a3a8f30
--- trap 0xf2b9f109, rip = 0x7afef8a176bef8a5, rsp = 0xddc963edd18963e9, 
rbp = 0x61f64fc36db64fc7
KDB: enter: panic
[ thread pid 0 tid 100067 ]
Stopped at      kdb_enter+0x33: movq    $0,0xe3a582(%rip)
db>
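
One detail in the trap output above can be checked mechanically: the fault virtual address also appears in one of the registers. The following is a minimal Python sketch, not part of the original report. The register lines and the fault address are copied from the dump above; the helper name `registers_matching` is made up for illustration.

```python
import re

# Register lines copied verbatim from the trap output in this message.
REG_DUMP = """\
rdi: fffffe006a3a8850 rsi: fffffe006a3a86f0 rdx: fffffe006a3a87b0
rcx: fffff80001f88740  r8: ffffffff83210090  r9: 0000000000000000
rax: 0000000000000000 rbx: 0000000000030000 rbp: fffffe006a3a8790
r10: 0000000000000001 r11: 0000000000000000 r12: fffff8005fc8ca00
r13: fffff8005fc8ca20 r14: fffffe006a3a8850 r15: 0000000000000000
"""

# "fault virtual address" from the same trap output.
FAULT_ADDR = 0x30000


def registers_matching(dump, addr):
    """Return the names of all registers whose value equals addr."""
    pairs = re.findall(r"(\w+):\s*([0-9a-f]{16})", dump)
    return [name for name, value in pairs if int(value, 16) == addr]


matches = registers_matching(REG_DUMP, FAULT_ADDR)
print(matches)
```

Only rbx holds 0x30000, which suggests (but does not prove) that the faulting read at nl_send_one+0x20 went through a pointer held in rbx. The value is far below the kernel address range seen in the other registers, so it looks like a bogus pointer rather than a merely unmapped kernel page.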


Since the trap names 'if_io_tqg_0' as the current process and the 
backtrace points at netlink, I looked into my network setup. I found 
that this page fault occurs only when a connection is established with 
WireGuard (wg-quick up wg0). Without WireGuard, the error does not 
occur.

I was able to narrow down the commit at which this behavior starts on my box:
- Up to commit main-n267347-660bd40a598a everything is fine.
- The two following commits n267348-67d9023f07a4 and 
n267349-0ad011ececb9 do not build on my box (module/netlink broken ...).
- From commit n267349-0ad011ececb9 (netlink) onwards this page fault 
occurs when WireGuard is started.
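
The window described in the list above can be written down as a small Python sketch. It is illustrative only and not part of the original report: it assumes that the number after "n" in these identifiers is the commit count on the branch, and the helper name `parse_ident` is made up. It only builds a `git bisect start` command string (bad commit first, then good commit) and does not run anything.

```python
import re

# Identifier shape as quoted in this message: optional "main-" prefix,
# then n<commit count>-<12-digit abbreviated hash>.
IDENT = re.compile(r"^(?:main-)?n(\d+)-([0-9a-f]{12})$")


def parse_ident(ident):
    """Split an identifier into (commit count, abbreviated hash)."""
    m = IDENT.match(ident)
    if m is None:
        raise ValueError(f"unexpected identifier: {ident!r}")
    return int(m.group(1)), m.group(2)


# Last good and first known-bad commits, as stated in the list above.
good_count, good_hash = parse_ident("main-n267347-660bd40a598a")
bad_count, bad_hash = parse_ident("n267349-0ad011ececb9")

# Commits strictly between the two endpoints (assumes a linear history).
untested = bad_count - good_count - 1

# Command one could run in the source tree to start a bisect.
bisect_cmd = f"git bisect start {bad_hash} {good_hash}"
print(untested, bisect_cmd)
```

With the values given here, only one commit lies strictly between the last good and the first known-bad one, so the window is already as narrow as the build breakage allows.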

Any help is greatly appreciated.
CC'ed Gleb Smirnoff due to the affected commits.

Regards,
Rainer Hurling


