Date: Mon, 20 Apr 2026 14:36:39 -0400
From: Mark Johnston <markj@freebsd.org>
To: Olgun Adak <olgun.adak@trexquant.com>
Cc: freebsd-stable@freebsd.org, freebsd-net@freebsd.org
Subject: Re: [REGRESSION] nfsd TCP socket lockup on 14.3-RELEASE-p9/p10 - Confirmed on Multiple Systems
Message-ID: <aeZyN5ns0sbiD1Ok@nuc>
In-Reply-To: <CAFww=iya1PkGDXeP5PVx_NviwQNx%2Bh4Yuuv4BZXLtfR8ryX_2A@mail.gmail.com>
References: <CAFww=iytpc%2BJJuiPJtTrKKkqZaP%2BZ9OzHnTCzLk=MSEkzGwNzA@mail.gmail.com> <aeY3j_IKV2eUpD3l@nuc> <CAFww=iya1PkGDXeP5PVx_NviwQNx%2Bh4Yuuv4BZXLtfR8ryX_2A@mail.gmail.com>

On Mon, Apr 20, 2026 at 10:34:22AM -0400, Olgun Adak wrote:
> Hi Mark,
>
> I have a truncated output of procstat -kk without -a.
>
> PID TID COMM TDNAME KSTACK
> [...]
>

Unfortunately this doesn't reveal much. All of the NFS threads are trying
to send data, but the socket buffer isn't being drained for some reason.
Full procstat output could help shed some light.

> Best,
>
> -Olgun
>
>
> On Mon, Apr 20, 2026 at 10:26 AM Mark Johnston <markj@freebsd.org> wrote:
>
> > On Mon, Apr 20, 2026 at 10:11:21AM -0400, Olgun Adak wrote:
> > > Hello FreeBSD Community,
> > >
> > > We’ve run into a consistent nfsd lockup after moving from 14.3-RELEASE-p8
> > > to p10. We have verified this across two identical bare-metal systems.
> > > Reverting to p8 via bectl immediately restores stability on both systems,
> > > so this appears to be a regression introduced in the p9/p10 cycle.
> > >
> > > *The symptoms:*
> > >
> > > Under NFSv3 load, the nfsd service hangs and becomes unresponsive to all
> > > clients. Looking at procstat -kk, we see a deadlock pattern where threads
> > > are stuck waiting on soiolock:
> > >
> > > _sx_xlock_hard -> soiolock -> sosend_generic -> sosend -> svc_vc_reply
> >
> > Would you be able to share full "procstat -kka" output from an affected
> > system?
> >
> > >
> > > Several threads are blocked in _sx_xlock_hard while others sit in sbwait.
> > >
> > > *The environment:*
> > >
> > > The systems are bare-metal with 2 x dual-port Mellanox ConnectX-6 100GbE
> > > (mlx5en) cards. We see the issue regardless of MTU (1500 and 9000).
> > >
> > > Offloads:
> > >
> > > - TSO: Enabled
> > > - LRO: The issue persists regardless of LRO state (tested with LRO
> > >   disabled and with software-only LRO). Hardware LRO is disabled in all
> > >   cases.
> > >
> > > Relevant tunables:
> > >
> > > kern.ipc.soacceptqueue=1000
> > > kern.ipc.somaxconn=2000
> > > kern.ipc.maxsockbuf=67108864
> > > net.inet.tcp.sendbuf_max=67108864
> > > net.inet.tcp.sendspace=16777216
> > > net.inet.tcp.sendbuf_inc=262144
> > > net.inet.tcp.recvbuf_max=67108864
> > > net.inet.tcp.recvspace=16777216
> > > vfs.nfsd.srvmaxio=1048576
> > >
> > > We have kept the p10 Boot Environments intact and can boot back into them
> > > to run any additional debug commands or test patches if someone can help
> > > point us in the right direction.
> > >
> > > Best regards,
> > > -Olgun Adak
> > >
>
> --
>
> This message is intended only for the use of the individual or entity to
> which it is addressed, and may contain private and confidential
> information. If you are not the intended recipient of this message you are
> hereby notified that any review, dissemination, distribution or copying of
> this message is strictly prohibited. If you have received this e-mail in
> error, please immediately notify the sender by replying to this e-mail and
> delete the message and any attachment(s) from your system. This
> communication is for information purposes only and should not be regarded
> as an offer to sell or as a solicitation of an offer to buy any financial
> product, an official confirmation of any transaction, or as an official
> statement of Trexquant Investment LP. All information is subject to change
> without notice.
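
The thread distinguishes threads blocked in soiolock from threads sitting in sbwait. When the full "procstat -kka" output is saved to a file, that split can be tallied mechanically. The following is only a rough sketch, not something posted by either author. It assumes each kernel stack frame is printed as name+0xoffset on the thread's line; the helper name classify_threads and the "other" bucket are hypothetical choices made for illustration.

```python
import re
import sys
from collections import Counter

# Assumption: procstat -kk prints each kernel stack frame as "name+0xOFFSET".
FRAME_RE = re.compile(r"([A-Za-z_][\w.]*)\+0x[0-9a-fA-F]+")

# Functions mentioned in the thread; the first one found in a stack
# (checked in this order) labels that thread.
FUNCS_OF_INTEREST = ("soiolock", "sbwait")


def classify_threads(procstat_text, funcs=FUNCS_OF_INTEREST):
    """Count threads whose kernel stack contains each function of interest.

    Lines with no recognisable frames (the header, or threads without a
    printed stack) are skipped. Threads matching none of `funcs` are
    counted under the hypothetical label "other".
    """
    counts = Counter()
    for line in procstat_text.splitlines():
        frames = FRAME_RE.findall(line)
        if not frames:
            continue
        label = next((f for f in funcs if f in frames), "other")
        counts[label] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 tally.py saved-procstat-output.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for name, n in classify_threads(fh.read()).most_common():
            print(f"{name}: {n}")
```

A tally like this only summarizes where threads are parked. It does not show which thread holds the socket I/O lock, so the full per-thread stacks are still what the list needs to see.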
