From: Zhenlei Huang
Date: Sat, 30 Aug 2025 11:31:18 +0800
Subject: Re: Kernel deadlocks on 14.3-STABLE with 100GbE card
To: Paul
Cc: FreeBSD Net
List-Id: Networking and TCP/IP with FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-net
In-Reply-To: <1756492291.0046645000.f2r62k9y@test1.fwdcdn.com>

> On Aug 30, 2025, at 2:33 AM, Paul wrote:
>
> Hi Zhenlei,
>
> Thanks for the suggestion.
>
> But is there a reason not to trust a core dump?
> Especially when the sum of all `mbuf`s exactly matches the value shown in the stack frame.

Yes, you can trust the core dump.

For RELEASES that is almost the only method. Users normally do not run a debug kernel in production.
Also, developers can fetch the same kernel and debug symbols and use addr2line to diagnose.

For stable branches and current, that may vary. If users run a custom config or compile the kernel themselves,
then it is not easy for developers to get the same kernel / debug symbols to diagnose.
In that case `options INVARIANTS` is a much more straightforward way to help, assuming the user can compile.

Anyway, either should be fine.

Best regards,
Zhenlei

>
>>
>>
>>> On Aug 29, 2025, at 5:08 PM, Paul wrote:
>>>
>>>
>>> Hi!
>>>
>>>
>>> We have finally managed to reproduce this issue with the help of iperf3.
>>>
>>> We triggered a kernel panic with `sysctl debug.kdb.panic=1` to collect a core dump once the iperf3 process had entered the infinite loop.
>>>
>>> Here is the basic analysis, please ask for more if required:
>>>
>>> (kgdb) bt
>>> #0  cpustop_handler () at /usr/src/sys/x86/x86/mp_x86.c:1530
>>> #1  0xffffffff808deec8 in ipi_nmi_handler () at /usr/src/sys/x86/x86/mp_x86.c:1487
>>> #2  0xffffffff8090c7af in trap (frame=0xfffffe03edeb8f30) at /usr/src/sys/amd64/amd64/trap.c:248
>>> #3
>>> #4  0xffffffff80640e30 in sbcut_internal (sb=sb@entry=0xfffff801b0ec6e00, len=-2145162648) at /usr/src/sys/kern/uipc_sockbuf.c:1585
>>> #5  0xffffffff80640d78 in sbflush_internal (sb=) at /usr/src/sys/kern/uipc_sockbuf.c:1547
>>> #6  sbflush_locked (sb=) at /usr/src/sys/kern/uipc_sockbuf.c:1559
>>> #7  sbflush (sb=sb@entry=0xfffff801b0ec6e00) at /usr/src/sys/kern/uipc_sockbuf.c:1567
>>> #8  0xffffffff807488f3 in tcp_disconnect (tp=0xfffff8034a572a80) at /usr/src/sys/netinet/tcp_usrreq.c:2702
>>> #9  0xffffffff80743897 in tcp_usr_disconnect (so=) at /usr/src/sys/netinet/tcp_usrreq.c:704
>>> #10 0xffffffff80643655 in sodisconnect (so=0xfffff801b0ec6c00) at /usr/src/sys/kern/uipc_socket.c:2085
>>> #11 soclose (so=0xfffff801b0ec6c00) at /usr/src/sys/kern/uipc_socket.c:1920
>>> #12 0xffffffff8053e921 in fo_close (fp=0xfffff801b0ec6e00, fp@entry=0xfffff801a51ab410, td=0x80236a68, td@entry=0xfffff801a51ab410) at /usr/src/sys/sys/file.h:397
>>> #13 _fdrop (fp=0xfffff801b0ec6e00, fp@entry=0xfffff801a51ab410, td=0x80236a68, td@entry=0xfffff80276bcd740) at /usr/src/sys/kern/kern_descrip.c:3756
>>> #14 0xffffffff80541aca in closef (fp=0xfffff801a51ab410, td=0xfffff80276bcd740) at /usr/src/sys/kern/kern_descrip.c:2851
>>> #15 0xffffffff80545e08 in closefp_impl (fdp=, fd=, fp=, td=, audit=) at /usr/src/sys/kern/kern_descrip.c:1324
>>> #16 0xffffffff8090de97 in syscallenter (td=0xfffff80276bcd740) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:193
>>> #17 amd64_syscall (td=0xfffff80276bcd740, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1241
>>> #18
>>> #19 0x000000082510c87a in ?? ()
>>> Backtrace stopped: Cannot access memory at address 0x820dd0058
>>> (kgdb) fr 4
>>> #4  0xffffffff80640e30 in sbcut_internal (sb=sb@entry=0xfffff801b0ec6e00, len=-2145162648) at /usr/src/sys/kern/uipc_sockbuf.c:1585
>>> 1585 next = (m = sb->sb_mb) ? m->m_nextpkt : 0;
>>> (kgdb) p len
>>> $33 = -2145162648
>>> (kgdb) set $total=(unsigned int)0
>>> (kgdb) set $count=(unsigned int)0
>>> (kgdb) set $next=(struct mbuf*)sb->sb_mb
>>> (kgdb) while ($next != 0)
>>>> set $total=$total+$next.m_len
>>>> set $count=$count+1
>>>> set $next=$next.m_next
>>>> end
>>> (kgdb) p $total
>>> $34 = 2149804648
>>> (kgdb) p (int)$total
>>> $35 = -2145162648
>>> (kgdb) p $count
>>> $36 = 1484679
>>>
>>>
>>> As mentioned before, the problem occurs when the socket is being closed. Now we know why: it is because of a cast here:
>>>
>>> m_freem(sbcut_internal(sb, (int)sb->sb_ccc));
>>>
>>> When `sb->sb_ccc` grows beyond the largest value that can be stored in an `int` (2147483647), this cast produces a negative `len` and leads to an infinite
>>> loop within this function, because a `len` smaller than 0 is basically equivalent to 0 in `sbcut_internal()`.
>>
>> Just a note. There's a KASSERT in sbcut_internal() to check the parameter len,
>>
>> ```
>> static struct mbuf *
>> sbcut_internal(struct sockbuf *sb, int len)
>> {
>> 	struct mbuf *m, *next, *mfree;
>> 	bool is_tls;
>>
>> 	KASSERT(len >= 0, ("%s: len is %d but it is supposed to be >= 0",
>> 	    __func__, len));
>> 	...
>> }
>> ```
>>
>> so you can retest with a kernel built with `options INVARIANTS` to verify whether the overflow occurs.
>>
>>>
>>> But that's just one part of the problem. Why does the buffer grow this large? Our limit is:
>>>
>>> kern.ipc.maxsockbuf=157286400
>>>
>>> Is it expected to grow so far beyond this limit?
>>>
>>>
>>> The way we managed to reproduce the issue is to simply spam one host with traffic from another host:
>>>
>>> Client:
>>>
>>> iperf3 --parallel 8 --time 10 --bidir --client
>>>
>>> Server (where the bug occurs):
>>>
>>> iperf3 --server
>>>
>>>
>>> My guess is that the limit is not applied on a per-packet basis but at some other trigger points.
>>> So when there is a burst, we manage to accumulate so many packets that their total size becomes > 2147483647.
>>> The fact that this is a 100GbE card makes it much more likely.
>>>
>>>> Hi!
>>>> This is now the 4th time that our server has had to be hard rebooted. The last two reboots were within a span of two hours.
>>>> The server had been in production for only a week.
>>>>
>>>>
>>>> ...
>>>>
>>>
>>
>>
>> Best regards,
>> Zhenlei
>>
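
For reference, the arithmetic behind the negative `len` in the quoted kgdb session can be checked in userland. This is only a minimal sketch, not kernel code: `ccc_as_int()` and `ccc_overflows_int()` are hypothetical helper names, and it assumes a 32-bit `int` and a 32-bit unsigned byte counter, which is consistent with the `(unsigned int)` total computed in the kgdb loop.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the FreeBSD tree): models what a cast
 * like `(int)sb->sb_ccc` produces on a two's-complement machine with a
 * 32-bit int. The wraparound is spelled out with 64-bit arithmetic so
 * the sketch does not depend on implementation-defined conversion.
 */
static int
ccc_as_int(uint32_t ccc)
{
	assert(sizeof(int) == 4);	/* assumption: 32-bit int */

	if (ccc <= (uint32_t)INT_MAX)
		return ((int)ccc);
	/* Values above INT_MAX wrap around by subtracting 2^32. */
	return ((int)((int64_t)ccc - ((int64_t)UINT32_MAX + 1)));
}

/*
 * Hypothetical helper: nonzero when the byte count no longer fits in
 * an int, i.e. when the cast would yield a negative length.
 */
static int
ccc_overflows_int(uint32_t ccc)
{
	return (ccc > (uint32_t)INT_MAX);
}
```

With the total from the core dump, `ccc_as_int(2149804648u)` gives -2145162648, the same `len` shown in frame 4, while the configured `kern.ipc.maxsockbuf` value of 157286400 converts unchanged.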