Date:      Sun, 14 Dec 2025 19:28:08 +0000
From:      bugzilla-noreply@freebsd.org
To:        net@FreeBSD.org
Subject:   [Bug 289017] [lagg] A time-of-check to time-of-use (TOCTOU) race exists in the Link Aggregation (LAGG) network subsystem
Message-ID:  <bug-289017-7501-7ZTsJJiAzZ@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-289017-7501@https.bugs.freebsd.org/bugzilla/>
References:  <bug-289017-7501@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=289017

--- Comment #7 from Gui-Dong Han <hanguidong02@gmail.com> ---
(In reply to Zhenlei Huang from comment #1)

I can reliably reproduce the panic on an unmodified GENERIC kernel within
seconds using the scripts provided.

The specific stack trace below, however, was captured on a kernel with
artificial delays inserted to widen the race window.

Crash log:
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x40
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff82825bdd
stack pointer           = 0x28:0xfffffe0068fc58c0
frame pointer           = 0x28:0xfffffe0068fc58d0
code segment            = base 0x0, limit 0xfffff, type 0x1b
[TOCTOU_DEBUG] SIOCSLAGG: Change!
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, IOPL = 0
current process         = 860 (poc)
rdi: fffff8000363c000 rsi: 00000000d1f2023f rdx: 00000000000000ff
rcx: fffffe0068fc58ec  r8: 0000000000000800  r9: 0000000000000008
rax: 0000000000000000 rbx: fffff80004a08300 rbp: fffffe0068fc58d0
r10: fffffe0068fc5800 r11: 00fff58b8d9e8b8c r12: 000000000000000e
[TOCTOU_DEBUG] SIOCSLAGG: Change!
r13: 0000000000000008 r14: fffff8000363c000 r15: fffff80003624800
trap number             = 12
panic: page fault
cpuid = 2
time = 1765735007
KDB: stack backtrace:
#0 0xffffffff80ba8f1d at kdb_backtrace+0x5d
#1 0xffffffff80b5aa11 at vpanic+0x161
#2 0xffffffff80b5a8a3 at panic+0x43
#3 0xffffffff8104dbfa at trap_pfault+0x3da
#4 0xffffffff81023e88 at calltrap+0x8
#5 0xffffffff82821f7a at lagg_lacp_start+0x1a
#6 0xffffffff8281fa25 at lagg_transmit_ethernet+0xb5
#7 0xffffffff80c85c5c at ether_output_frame+0xcc
#8 0xffffffff80c85a50 at ether_output+0x6b0
#9 0xffffffff80d21a48 at ip_output+0x13a8
#10 0xffffffff80d52cf0 at udp_send+0xb60
#11 0xffffffff80c0145c at sosend_dgram+0x31c
#12 0xffffffff80c0242f at sousrsend+0x5f
#13 0xffffffff80c0aec0 at kern_sendit+0x1c0
#14 0xffffffff80c0b1f2 at sendit+0x1b2
#15 0xffffffff80c0b02d at sys_sendto+0x4d
#16 0xffffffff8104e547 at amd64_syscall+0x117
#17 0xffffffff8102479b at fast_syscall_common+0xf8

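This crash indicates that lagg_lacp_start was executing after the protocol
resources had already been cleared by the detach routine. A fault address as
small as 0x40 is what a read of a structure member through a NULL pointer
would produce.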

This confirms a severe lack of synchronization between the data path and the
control path, which can lead to various race conditions.
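
To make the failure mode concrete, here is a minimal userspace model of the
pattern, not code from if_lagg.c and not a proposed patch. All names
(sketch_softc, sketch_lacp_state, sketch_attach, sketch_detach,
sketch_transmit) are hypothetical, and a pthread mutex stands in for whatever
synchronization the kernel fix ends up using. The point it illustrates is that
the data path must load the protocol state pointer once, under the same
protection the control path uses to clear it, and handle NULL instead of
assuming the state is still attached.

```c
/*
 * Userspace model only. Every name below is hypothetical; this is an
 * illustration of the check-then-use hazard, not the lagg(4) implementation.
 */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_lacp_state {		/* stands in for per-protocol state */
	int active_ports;
};

struct sketch_softc {
	pthread_mutex_t lock;		/* guards psc */
	struct sketch_lacp_state *psc;	/* NULL while no protocol is attached */
};

static int
sketch_init(struct sketch_softc *sc)
{
	sc->psc = NULL;
	return (pthread_mutex_init(&sc->lock, NULL));
}

/* Control path: install protocol state, replacing any previous state. */
static int
sketch_attach(struct sketch_softc *sc, int ports)
{
	struct sketch_lacp_state *st, *old;

	st = malloc(sizeof(*st));
	if (st == NULL)
		return (ENOMEM);
	st->active_ports = ports;

	pthread_mutex_lock(&sc->lock);
	old = sc->psc;
	sc->psc = st;
	pthread_mutex_unlock(&sc->lock);
	free(old);
	return (0);
}

/* Control path: clear the pointer under the lock, then free the state. */
static void
sketch_detach(struct sketch_softc *sc)
{
	struct sketch_lacp_state *old;

	pthread_mutex_lock(&sc->lock);
	old = sc->psc;
	sc->psc = NULL;
	pthread_mutex_unlock(&sc->lock);
	free(old);	/* no transmitter can still hold this pointer */
}

/*
 * Data path: the check and the use happen under one lock hold, on one
 * load of the pointer, so a concurrent detach cannot slip in between.
 */
static int
sketch_transmit(struct sketch_softc *sc, int *ports_out)
{
	struct sketch_lacp_state *st;
	int error = 0;

	assert(ports_out != NULL);

	pthread_mutex_lock(&sc->lock);
	st = sc->psc;
	if (st == NULL)
		error = ENXIO;	/* protocol detached: drop, do not dereference */
	else
		*ports_out = st->active_ports;
	pthread_mutex_unlock(&sc->lock);
	return (error);
}
```

In this model, a transmit that races with a detach either completes before the
pointer is cleared or sees NULL and returns ENXIO. The unsafe variant, which
checks the protocol in one step and dereferences the state in a later,
unprotected step, is the shape that the trace above suggests.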

I strongly recommend validating any proposed fix by running the attached
stress-test scripts for an extended period.

-- 
You are receiving this mail because:
You are the assignee for the bug.
