Date: Mon, 23 Feb 2026 12:06:00 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 293382] Dead lock and kernel crash around closefp_impl
Message-ID: <bug-293382-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=293382

    Bug ID: 293382
   Summary: Dead lock and kernel crash around closefp_impl
   Product: Base System
   Version: 14.3-STABLE
  Hardware: Any
        OS: Any
    Status: New
  Severity: Affects Only Me
  Priority: ---
 Component: kern
  Assignee: bugs@FreeBSD.org
  Reporter: devgs@ukr.net

Hi!

We've been using 14.4-STABLE for some time now, and today a weird issue
popped up. All of a sudden, our multi-threaded network app deadlocked on
some threads, but not on others. We could neither attach to it with GDB
nor kill it with -9: a hard lock inside the kernel.

We managed to collect a few samples of the kernel backtrace for this
process with `procstat -kk`. All of them are basically identical:

  PID    TID COMM       TDNAME       KSTACK
91545 101569 <redacted> -            mi_switch+0xbd _sx_xlock_hard+0x4ef kern_close+0x179 amd64_syscall+0x117 fast_syscall_common+0xf8
91545 102281 <redacted> -            mi_switch+0xbd _sx_xlock_hard+0x4ef kern_close+0x179 amd64_syscall+0x117 fast_syscall_common+0xf8
91545 102282 <redacted> <redacted-1> mi_switch+0xbd sleepq_catch_signals+0x2a2 sleepq_timedwait_sig+0x12 _sleep+0x1c1 umtxq_sleep+0x2cd do_wait+0x244 __umtx_op_wait_uint_private+0x54 sys__umtx_op+0x7e amd64_syscall+0x117 fast_syscall_common+0xf8
91545 102283 <redacted> <redacted-2> mi_switch+0xbd sleepq_catch_signals+0x2a2 sleepq_timedwait_sig+0x12 _sleep+0x1c1 kqueue_scan+0xa11 kqueue_kevent+0x13b kern_kevent_fp+0x4b kern_kevent_generic+0xdf sys_kevent+0x61 amd64_syscall+0x117 fast_syscall_common+0xf8
91545 102284 <redacted> <redacted-3> mi_switch+0xbd _sleep+0x1f3 knote_fdclose+0xac closefp_impl+0xd0 amd64_syscall+0x117 fast_syscall_common+0xf8
91545 102285 <redacted> <redacted-4> mi_switch+0xbd sleepq_catch_signals+0x2a2 sleepq_timedwait_sig+0x12 _sleep+0x1c1 kqueue_scan+0xa11 kqueue_kevent+0x13b kern_kevent_fp+0x4b kern_kevent_generic+0xdf sys_kevent+0x61 amd64_syscall+0x117 fast_syscall_common+0xf8
91545 102286 <redacted> <redacted-5> mi_switch+0xbd sleepq_catch_signals+0x2a2 sleepq_timedwait_sig+0x12 _sleep+0x1c1 kqueue_scan+0xa11 kqueue_kevent+0x13b kern_kevent_fp+0x4b kern_kevent_generic+0xdf sys_kevent+0x61 amd64_syscall+0x117 fast_syscall_common+0xf8

Apparently, three threads were deadlocked: the first two, which are
unnamed, and `<redacted-3>`. The latter is the thread that handles
inbound socket connections: hundreds of thousands of them, mostly
WebSocket. Two other threads also use sockets, but for outbound
connections. During normal operation, sockets are opened and closed as
needed. It seems that in some cases this can lead to a deadlock, where
one thread enters a state in the kernel in which it hangs while holding
a lock, preventing the others from closing (or, more generally,
modifying) descriptors. The app is asynchronous and uses kqueue
extensively for its network sockets.

We suspect `<redacted-3>` to be the culprit, specifically its backtrace,
where `closefp_impl` is involved. Here is why. When this happened and
the traffic was switched to a redundant server, that server almost
immediately panicked and went into a reboot. Fortunately, we got the
core dump and were able to analyze it to some extent. There we saw
`closefp_impl` again, in the same logical thread `<redacted-3>` (on a
different physical server):

Fatal trap 12: page fault while in kernel mode
cpuid = 22; apic id = 52
fault virtual address   = 0x10
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80572e28
stack pointer           = 0x28:0xfffffe071c126d70
frame pointer           = 0x28:0xfffffe071c126dc0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 58518 (<redacted-3>)
rdi: fffff83402622be0 rsi: 0000000000000000 rdx: 0000000000000000
rcx: 0000000000000000  r8: fffff80160b9c520  r9: fffffe071c127000
rax: 0000000000000000 rbx: 0000000000031361 rbp: fffffe071c126dc0
r10: 0000000000000001 r11: 0000000000002af8 r12: fffff80160b9c000
r13: fffff87af7163e18 r14: fffff83402622be0 r15: fffff87af7163e00
trap number             = 12
panic: page fault
cpuid = 22
time = 1771839128
KDB: stack backtrace:
#0 0xffffffff8061303d at kdb_backtrace+0x5d
#1 0xffffffff805c8091 at vpanic+0x161
#2 0xffffffff805c7f23 at panic+0x43
#3 0xffffffff80972f00 at trap_pfault+0x3e0
#4 0xffffffff8094af68 at calltrap+0x8
#5 0xffffffff8056b750 at closefp_impl+0xd0
#6 0xffffffff80973847 at amd64_syscall+0x117
#7 0xffffffff8094b85b at fast_syscall_common+0xf8

When inspecting its kernel stack:

(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=<optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:405
#2  0xffffffff805c7beb in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:523
#3  0xffffffff805c80e9 in vpanic (fmt=0xffffffff809d2ae7 "%s",
    ap=ap@entry=0xfffffe071c126c30)
    at /usr/src/sys/kern/kern_shutdown.c:967
#4  0xffffffff805c7f23 in panic (fmt=<unavailable>)
    at /usr/src/sys/kern/kern_shutdown.c:891
#5  0xffffffff80972f00 in trap_fatal (frame=<optimized out>,
    eva=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:1000
#6  0xffffffff80972f00 in trap_pfault (frame=0xfffffe071c126cb0,
    usermode=false, signo=<optimized out>, ucode=<optimized out>)
#7  <signal handler called>
#8  0xffffffff80572e28 in knote_drop (kn=0xfffff83402622be0,
    td=0xfffff80160b9c000) at /usr/src/sys/kern/kern_event.c:2730
#9  knote_fdclose (td=0xfffff80160b9c000, fd=201569)
    at /usr/src/sys/kern/kern_event.c:2695
#10 0xffffffff8056b750 in closefp_impl (fdp=0xfffffe0d1582a920, fd=0,
    fp=0xfffff81090d2c5a0, td=0xfffff80160b9c000, audit=true)
    at /usr/src/sys/kern/kern_descrip.c:1320
#11 0xffffffff80973847 in syscallenter (td=0xfffff80160b9c000)
    at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:193
#12 amd64_syscall (td=0xfffff80160b9c000, traced=0)
    at /usr/src/sys/amd64/amd64/trap.c:1241
#13 <signal handler called>
#14 0x000000082deed32a in ?? ()
Backtrace stopped: Cannot access memory at address 0x85d08dbc8

Within `knote_drop` we observe a null pointer access:

(kgdb) fr 8
#8  0xffffffff80572e28 in knote_drop (kn=0xfffff83402622be0,
    td=0xfffff80160b9c000) at /usr/src/sys/kern/kern_event.c:2730
2730                    kn->kn_fop->f_detach(kn);
(kgdb) l
2725    static void
2726    knote_drop(struct knote *kn, struct thread *td)
2727    {
2728
2729            if ((kn->kn_status & KN_DETACHED) == 0)
2730                    kn->kn_fop->f_detach(kn);
2731            knote_drop_detached(kn, td);
2732    }
2733
2734    static void
(kgdb) p kn->kn_fop
$2 = (const struct filterops *) 0x0

If you need more info, please ask. We will be glad to provide it.

---------------
System info:

FreeBSD frv21.ukr.net 14.4-STABLE FreeBSD 14.4-STABLE stable/14-n273658-2f91ff89c56e FRV21 amd64 1404500 1404500
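
As a cross-check of the kgdb finding, the reported fault virtual address (0x10) can be compared with the address the CPU would read when loading `f_detach` through a NULL `kn_fop`. The sketch below is illustrative only: `struct filterops_sketch` and `f_detach_load_address` are hypothetical names, and the struct is an assumed, simplified mirror of the leading members of the kernel's `struct filterops`. The member order should be verified against `sys/sys/event.h` in the exact source tree that built the crashing kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct knote;	/* opaque here; only pointers to it are needed */

/*
 * ASSUMPTION: simplified mirror of the leading members of the kernel's
 * struct filterops. Later members are omitted because they do not
 * affect the offset of f_detach.
 */
struct filterops_sketch {
	int	f_isfd;
	int	(*f_attach)(struct knote *kn);
	void	(*f_detach)(struct knote *kn);
};

/*
 * Address that would be read when evaluating kn_fop->f_detach for a
 * given kn_fop pointer value (hypothetical helper for illustration).
 */
uintptr_t
f_detach_load_address(uintptr_t kn_fop)
{
	/* The arithmetic below presumes an LP64 target such as amd64. */
	assert(sizeof(void *) == 8);
	return (kn_fop + offsetof(struct filterops_sketch, f_detach));
}
```

Under these assumptions, `f_detach_load_address(0)` evaluates to 0x10 on amd64, which agrees with the `fault virtual address = 0x10` in the trap report and with `kn->kn_fop` being NULL at kern_event.c:2730. It says nothing about how `kn_fop` became NULL.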
