Date: Sun, 14 Dec 2025 19:28:08 +0000
From: bugzilla-noreply@freebsd.org
To: net@FreeBSD.org
Subject: [Bug 289017] [lagg] A time-of-check to time-of-use (TOCTOU) race exists in the Link Aggregation (LAGG) network subsystem
Message-ID: <bug-289017-7501-7ZTsJJiAzZ@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-289017-7501@https.bugs.freebsd.org/bugzilla/>
References: <bug-289017-7501@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=289017

--- Comment #7 from Gui-Dong Han <hanguidong02@gmail.com> ---
(In reply to Zhenlei Huang from comment #1)

I can reliably reproduce the panic on an unmodified GENERIC kernel within seconds using the scripts provided. However, by inserting artificial delays to widen the race window, I captured the specific stack trace below.

Crash log:

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address = 0x40
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff82825bdd
stack pointer = 0x28:0xfffffe0068fc58c0
frame pointer = 0x28:0xfffffe0068fc58d0
code segment = base 0x0, limit 0xfffff, type 0x1b
[TOCTOU_DEBUG] SIOCSLAGG: Change!
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, IOPL = 0
current process = 860 (poc)
rdi: fffff8000363c000 rsi: 00000000d1f2023f rdx: 00000000000000ff
rcx: fffffe0068fc58ec  r8: 0000000000000800  r9: 0000000000000008
rax: 0000000000000000 rbx: fffff80004a08300 rbp: fffffe0068fc58d0
r10: fffffe0068fc5800 r11: 00fff58b8d9e8b8c r12: 000000000000000e
[TOCTOU_DEBUG] SIOCSLAGG: Change!
r13: 0000000000000008 r14: fffff8000363c000 r15: fffff80003624800
trap number = 12
panic: page fault
cpuid = 2
time = 1765735007
KDB: stack backtrace:
#0 0xffffffff80ba8f1d at kdb_backtrace+0x5d
#1 0xffffffff80b5aa11 at vpanic+0x161
#2 0xffffffff80b5a8a3 at panic+0x43
#3 0xffffffff8104dbfa at trap_pfault+0x3da
#4 0xffffffff81023e88 at calltrap+0x8
#5 0xffffffff82821f7a at lagg_lacp_start+0x1a
#6 0xffffffff8281fa25 at lagg_transmit_ethernet+0xb5
#7 0xffffffff80c85c5c at ether_output_frame+0xcc
#8 0xffffffff80c85a50 at ether_output+0x6b0
#9 0xffffffff80d21a48 at ip_output+0x13a8
#10 0xffffffff80d52cf0 at udp_send+0xb60
#11 0xffffffff80c0145c at sosend_dgram+0x31c
#12 0xffffffff80c0242f at sousrsend+0x5f
#13 0xffffffff80c0aec0 at kern_sendit+0x1c0
#14 0xffffffff80c0b1f2 at sendit+0x1b2
#15 0xffffffff80c0b02d at sys_sendto+0x4d
#16 0xffffffff8104e547 at amd64_syscall+0x117
#17 0xffffffff8102479b at fast_syscall_common+0xf8

This crash indicates that lagg_lacp_start was executing after the protocol resources had already been cleared by the detach routine. This confirms a severe lack of synchronization between the data path and the control path, which can lead to various race conditions.

I strongly recommend validating any proposed fix by running the attached stress-test scripts for an extended period.

-- 
You are receiving this mail because:
You are the assignee for the bug.
