Date: Tue, 9 Jan 2024 21:23:54 +0100
From: Rainer Hurling <rhurlin@gwdg.de>
To: <freebsd-current@freebsd.org>
Cc: Gleb Smirnoff <glebius@freebsd.org>
Subject: kernel: fatal trap 12 on CURRENT, when using WireGuard
Message-ID: <423b62fc-6687-4e56-b8e7-ecaebcadfd7f@gwdg.de>

I tried to update my 15.0-CURRENT box from n267335-499e84e16f56 to a very recent commit. The build and install went fine. After booting with the new base, I got a page fault with the following error:

Kernel page fault with the following non-sleepable locks held:
shared rm netlink lock (netlink lock) r = 0 (0xfffff8005fc8ca20) locked @ /usr/src/sys/netlink/netlink_domain.c:241
exclusive rw lle (lle) r = 0 (0xfffff801951dce90) locked @ /usr/src/sys/netinet/in.c:1716
stack backtrace:
#0 0xffffffff80bc6c45 at witness_debugger+0x65
#1 0xffffffff80bc7d89 at witness_warn+0x3e9
#2 0xffffffff81056b18 at trap_pfault+0x88
#3 0xffffffff81028708 at calltrap+0x8
#4 0xffffffff80dbd6a2 at nl_send_group+0x1d2
#5 0xffffffff80dc0e27 at _nlmsg_flush+0x37
#6 0xffffffff80dc4fdc at rtnl_lle_event+0x10c
#7 0xffffffff80d15e32 at arp_mark_lle_reachable+0xd2
#8 0xffffffff80d15b43 at arp_check_update_lle+0x293
#9 0xffffffff80d151c5 at arpintr+0xa65
#10 0xffffffff80caaaed at netisr_dispatch_src+0xad
#11 0xffffffff80c8d57a at ether_demux+0x17a
#12 0xffffffff80c8ec53 at ether_nh_input+0x403
#13 0xffffffff80caaaed at netisr_dispatch_src+0xad
#14 0xffffffff80c8d9c9 at ether_input+0xd9
#15 0xffffffff80ca66ac at iflib_rxeof+0xe4c
#16 0xffffffff80ca0b5a at _task_fn_rx+0x7a
#17 0xffffffff80ba0118 at gtaskqueue_run_locked+0xa8

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x30000
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80dc0a10
stack pointer           = 0x28:0xfffffe006a3a8760
frame pointer           = 0x28:0xfffffe006a3a8790
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (if_io_tqg_0)
rdi: fffffe006a3a8850 rsi: fffffe006a3a86f0 rdx: fffffe006a3a87b0
rcx: fffff80001f88740  r8: ffffffff83210090  r9: 0000000000000000
rax: 0000000000000000 rbx: 0000000000030000 rbp: fffffe006a3a8790
r10: 0000000000000001 r11: 0000000000000000 r12: fffff8005fc8ca00
r13: fffff8005fc8ca20 r14: fffffe006a3a8850 r15: 0000000000000000
trap number             = 12
panic: page fault
cpuid = 0
time = 1704824328
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe006a3a8430
vpanic() at vpanic+0x131/frame 0xfffffe006a3a8560
panic() at panic+0x43/frame 0xfffffe006a3a85c0
trap_fatal() at trap_fatal+0x40f/frame 0xfffffe006a3a8620
trap_pfault() at trap_pfault+0xae/frame 0xfffffe006a3a8690
calltrap() at calltrap+0x8/frame 0xfffffe006a3a8690
--- trap 0xc, rip = 0xffffffff80dc0a10, rsp = 0xfffffe006a3a8760, rbp = 0xfffffe006a3a8790 ---
nl_send_one() at nl_send_one+0x20/frame 0xfffffe006a3a8790
nl_send_group() at nl_send_group+0x1d2/frame 0xfffffe006a3a8820
_nlmsg_flush() at _nlmsg_flush+0x37/frame 0xfffffe006a3a8840
rtnl_lle_event() at rtnl_lle_event+0x10c/frame 0xfffffe006a3a88e0
arp_mark_lle_reachable() at arp_mark_lle_reachable+0xd2/frame 0xfffffe006a3a8930
arp_check_update_lle() at arp_check_update_lle+0x293/frame 0xfffffe006a3a8a00
arpintr() at arpintr+0xa65/frame 0xfffffe006a3a8b60
netisr_dispatch_src() at netisr_dispatch_src+0xad/frame 0xfffffe006a3a8bc8
ether_demux() at ether_demux+0x17a/frame 0xfffffe006a4a8bf0
ether_nh_input() at ether_nh_input+0x403/frame 0xfffffe006a3a8c40
netisr_dispatch_src() at netisr_dispatch_src+0xad/frame 0xfffffe006a3a8ca0
ether_input() at ether_input+0xd9/frame 0xfffffe006a3a8d00
iflib_rxeof() at iflib_rxeof+0xe4c/frame 0xfffffe006a3a8e00
_task_fn_rx() at _task_fn_rx+0x7a/frame 0xfffffe006a3a8e40
gtaskqueue_run_locked() at gtaskqueue_run_locked+0xa8/frame 0xfffffe006a3a8ec0
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xd3/frame 0xfffffe006a3a8ef0
fork_exit() at fork_exit+0x82/frame 0xfffffe006a3a8f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe006a3a8f30
--- trap 0xf2b9f109, rip = 0x7afef8a176bef8a5, rsp = 0xddc963edd18963e9, rbp = 0x61f64fc36db64fc7
KDB: enter: panic
[ thread pid 0 tid 100067 ]
Stopped at      kdb_enter+0x33: movq    $0,0xe3a582(%rip)
db>

Since the current process is 'if_io_tqg_0' and the locks and backtrace point to netlink, I looked at my network connections. I discovered that this page fault only occurs when a connection is established with WireGuard (wg-quick up wg0). Without WireGuard, the error does not occur.

I was able to find out at which commit this behavior starts on my box:

 - Up to commit main-n267347-660bd40a598a everything is fine.
 - The two following commits n267348-67d9023f07a4 and n267349-0ad011ececb9 do not build on my box (module/netlink broken ...).
 - From commit n267349-0ad011ececb9 (netlink) onwards this page fault occurs when WireGuard is started.

Any help is greatly appreciated.

CC'ed Gleb Smirnoff due to the affected commits.

Regards,
Rainer Hurling