From: Zhenlei Huang
Date: Fri, 29 Aug 2025 17:34:13 +0800
Subject: Re: Kernel deadlocks on 14.3-STABLE with 100GbE card
To: Paul
Cc: FreeBSD Net
List-Id: Networking and TCP/IP with FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-net
Message-Id: <23C7DA09-BCDD-43A8-8B7B-27B6F1318BB8@FreeBSD.org>
In-Reply-To: <1756457078.0404318000.u4ltbc3f@test1.fwdcdn.com>
References: <1753769100.0108837000.0kt30ud9@test1.fwdcdn.com> <1756457078.0404318000.u4ltbc3f@test1.fwdcdn.com>

> On Aug 29, 2025, at 5:08 PM, Paul wrote:
>
>
> Hi!
>
>
> We have finally managed to reproduce this issue with the help of iperf3.
>
> We have triggered a kernel panic with `sysctl debug.kdb.panic=1` to collect core dump, when iperf3 process has entered the inf loop.
>
> Here is the basic analysis, please ask for more if required:
>
> (kgdb) bt
> #0 cpustop_handler () at /usr/src/sys/x86/x86/mp_x86.c:1530
> #1 0xffffffff808deec8 in ipi_nmi_handler () at /usr/src/sys/x86/x86/mp_x86.c:1487
> #2 0xffffffff8090c7af in trap (frame=0xfffffe03edeb8f30) at /usr/src/sys/amd64/amd64/trap.c:248
> #3
> #4 0xffffffff80640e30 in sbcut_internal (sb=sb@entry=0xfffff801b0ec6e00, len=-2145162648) at /usr/src/sys/kern/uipc_sockbuf.c:1585
> #5 0xffffffff80640d78 in sbflush_internal (sb=) at /usr/src/sys/kern/uipc_sockbuf.c:1547
> #6 sbflush_locked (sb=) at /usr/src/sys/kern/uipc_sockbuf.c:1559
> #7 sbflush (sb=sb@entry=0xfffff801b0ec6e00) at /usr/src/sys/kern/uipc_sockbuf.c:1567
> #8 0xffffffff807488f3 in tcp_disconnect (tp=0xfffff8034a572a80) at /usr/src/sys/netinet/tcp_usrreq.c:2702
> #9 0xffffffff80743897 in tcp_usr_disconnect (so=) at /usr/src/sys/netinet/tcp_usrreq.c:704
> #10 0xffffffff80643655 in sodisconnect (so=0xfffff801b0ec6c00) at /usr/src/sys/kern/uipc_socket.c:2085
> #11 soclose (so=0xfffff801b0ec6c00) at /usr/src/sys/kern/uipc_socket.c:1920
> #12 0xffffffff8053e921 in fo_close (fp=0xfffff801b0ec6e00, fp@entry=0xfffff801a51ab410, td=0x80236a68, td@entry=0xfffff801a51ab410) at /usr/src/sys/sys/file.h:397
> #13 _fdrop (fp=0xfffff801b0ec6e00, fp@entry=0xfffff801a51ab410, td=0x80236a68, td@entry=0xfffff80276bcd740) at /usr/src/sys/kern/kern_descrip.c:3756
> #14 0xffffffff80541aca in closef (fp=0xfffff801a51ab410, td=0xfffff80276bcd740) at /usr/src/sys/kern/kern_descrip.c:2851
> #15 0xffffffff80545e08 in closefp_impl (fdp=, fd=, fp=, td=, audit=) at /usr/src/sys/kern/kern_descrip.c:1324
> #16 0xffffffff8090de97 in syscallenter (td=0xfffff80276bcd740) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:193
> #17 amd64_syscall (td=0xfffff80276bcd740, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1241
> #18
> #19 0x000000082510c87a in ?? ()
> Backtrace stopped: Cannot access memory at address 0x820dd0058
> (kgdb) fr 4
> #4 0xffffffff80640e30 in sbcut_internal (sb=sb@entry=0xfffff801b0ec6e00, len=-2145162648) at /usr/src/sys/kern/uipc_sockbuf.c:1585
> 1585 next = (m = sb->sb_mb) ? m->m_nextpkt : 0;
> (kgdb) p len
> $33 = -2145162648
> (kgdb) set $total=(unsigned int)0
> (kgdb) set $count=(unsigned int)0
> (kgdb) set $next=(struct mbuf*)sb->sb_mb
> (kgdb) while ($next != 0)
>> set $total=$total+$next.m_len
>> set $count=$count+1
>> set $next=$next.m_next
>> end
> (kgdb) p $total
> $34 = 2149804648
> (kgdb) p (int)$total
> $35 = -2145162648
> (kgdb) p $count
> $36 = 1484679
>
>
> As mentioned before, the problem occurs when the socket is being closed. Now we know why. Because of a cast here:
>
> m_freem(sbcut_internal(sb, (int)sb->sb_ccc));
>
> When `sb->sb_ccc` grows above the max unsigned value that can be stored in `int` this cast leads to an infinite
> loop, within this function. As `len` smaller than 0 is basically equivalent to 0 in `sbcut_internal()`.

Just a note: there's a KASSERT in sbcut_internal() that checks the len parameter,

```
static struct mbuf *
sbcut_internal(struct sockbuf *sb, int len)
{
	struct mbuf *m, *next, *mfree;
	bool is_tls;

	KASSERT(len >= 0, ("%s: len is %d but it is supposed to be >= 0",
	    __func__, len));
	...
}
```

so you can retest with a kernel built with `options INVARIANTS` to verify whether the overflow occurs.

>
> But that's just a part of a problem. Why does the buffer grow this large? Our limit is:
>
> kern.ipc.maxsockbuf=157286400
>
> Is it expected to grow so far beyond this limit?
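
For reference, the wraparound in the quoted kgdb output can be reproduced in userland. This is only a minimal sketch, assuming a 32-bit `int` and that `sb_ccc` is an unsigned 32-bit counter; `ccc_to_len()`, `ccc_fits_int()` and `checked_len()` are hypothetical helper names for illustration, not kernel functions:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the (int)sb->sb_ccc cast at the call site.
 * Converting an out-of-range unsigned value to int is implementation-defined
 * in C; clang and gcc on amd64 wrap it modulo 2^32.
 */
static int
ccc_to_len(unsigned int ccc)
{
	return ((int)ccc);
}

/* True when the byte count can be passed as a non-negative int length. */
static bool
ccc_fits_int(unsigned int ccc)
{
	return (ccc <= (unsigned int)INT_MAX);
}

/*
 * Userland analogue of the KASSERT(len >= 0, ...) check: abort instead of
 * silently handing a negative length to the caller.
 */
static int
checked_len(unsigned int ccc)
{
	assert(ccc_fits_int(ccc));
	return ((int)ccc);
}
```

With the value from the core dump, `ccc_to_len(2149804648u)` gives -2145162648, the same `len` seen in frame 4, while `checked_len()` would abort on that input much as an INVARIANTS kernel would panic on the KASSERT.
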
>
>
> The way we managed to reproduce the issue is to simply spam one host with a traffic from another host:
>
> Client:
>
> iperf3 --parallel 8 --time 10 --bidir --client
>
> Server (where bug occurs):
>
> iperf3 --server
>
>
> My guess is the limit is not applied on packet basis. But instead, at some other trigger points.
> And when there is a burst we manage to accumulate so many packets that their total size becomes > 2147483647.
> The fact that this is a 100GbE card makes it much more likely.
>
>> Hi!
>> It has been a 4th time now that our server had to be hard re-booted. Last two of them in the span of two hours.
>> It was only a week since the server was in production.
>>
>>
>> ...
>>
>

Best regards,
Zhenlei