Date: Sun, 14 Jan 2024 03:30:30 +0000
From: bugzilla-noreply@freebsd.org
To: fs@FreeBSD.org
Subject: [Bug 276299] Write performance to NFS share is ~4x slower than on 13.2
Message-ID: <bug-276299-3630-FHPjQURuqT@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-276299-3630@https.bugs.freebsd.org/bugzilla/>
References: <bug-276299-3630@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276299

--- Comment #10 from Rick Macklem <rmacklem@FreeBSD.org> ---
By network fabric I mean everything from the TCP stack down, at both ends.
A problem can easily manifest itself only during writing. Writing to an NFS
server generates very different traffic than reading from an NFS server.
I am not saying that it is a network fabric problem, just that good read
performance does not imply it is not a network fabric problem.

I once saw a case (where I worked as a sysadmin) where everything worked
fine over NFS until one specific NFS RPC was done. That NFS RPC (and only
that NFS RPC) would fail. It turned out to be a hardware bug in a network
switch. Move the machine to a port on another switch and the problem went
away; move it back onto the problem switch and the issue showed up again.
There were no other detectable problems with this switch, and the
manufacturer returned it after a maintenance cycle claiming it was fixed.
It still had the problem, so it went in the trash. (It probably had a
memory problem that flipped a bit for this specific case or some such.)

Two examples of how a network problem might affect NFS write performance,
but not read performance:

Write requests are the only large RPC messages sent from client->server.
With a 1 Mbyte write size, each write results in about 700 1500-byte TCP
segments (for an ordinary ethernet packet size).
-> If the burst of 700 packets causes one to be dropped on the server
   (receive) end sometimes... (Found by seeing an improvement with a
   smaller wsize.)
-> If the client/sender has a TSO bug (the most common problem is
   mishandling a TSO segment that is slightly less than 64 Kbytes).
   (Found by disabling TSO in the client. Disabling TSO also changes the
   timing of the TCP segments, and this can sometimes avoid bugs.)

Have you yet tried a smaller rsize/wsize, as I suggested?

NFS traffic is also very different than typical TCP traffic.
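The "about 700 segments per 1 Mbyte write" arithmetic above can be sanity-checked with a few lines of Python. This is only a rough sketch: it assumes a 1500-byte Ethernet MTU, a 20-byte IPv4 header, and a 32-byte TCP header (20 bytes plus the common 12-byte timestamp option), leaving roughly 1448 bytes of RPC payload per segment.

```python
import math

# Assumed per-segment overheads (not from the bug report):
MTU = 1500            # ordinary Ethernet packet size
IP_HDR = 20           # IPv4 header, no options
TCP_HDR = 32          # TCP header incl. 12-byte timestamp option
payload = MTU - IP_HDR - TCP_HDR   # ~1448 bytes of data per segment

wsize = 1024 * 1024   # 1 Mbyte NFS write size
segments = math.ceil(wsize / payload)
print(segments)       # -> 725, i.e. "about 700" as stated above
```

The exact count shifts a little with TCP options and MTU, which is why the comment above says "about 700".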
For example, both 13.0 and 13.1 shipped with bugs in the TCP stack that
affected the NFS server (intermittent hangs in those cases).

If it isn't a network fabric problem, it is probably something related to
ZFS. I know nothing about ZFS, so I can't even suggest anything beyond
"sync=disabled". Since an NFS server uses both storage (hardware + ZFS)
and networking, any breakage anywhere in these can cause a big performance
hit. NFS itself just translates between NFS RPC messages and VFS/VOP
calls. It is conceivable that some change in the NFS server is causing
this, but those changes are few and others have not reported similar
write performance problems for 14.0, so it seems unlikely.

-- 
You are receiving this mail because:
You are the assignee for the bug.
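For anyone following along, the three experiments suggested in this thread (smaller rsize/wsize, disabling TSO, and sync=disabled) look roughly like the commands below. This is a sketch, not from the report: the interface name igb0, the server path server:/export, the dataset name tank/export, and the 64 Kbyte sizes are all placeholder values.

```shell
# On the FreeBSD client: mount with a smaller rsize/wsize
# (64 Kbytes here is an example value, not a recommendation):
mount -t nfs -o nfsv4,rsize=65536,wsize=65536 server:/export /mnt

# On the client: disable TSO on the NIC carrying the NFS traffic
# (igb0 is a placeholder interface name):
ifconfig igb0 -tso

# On the ZFS server: the sync=disabled test mentioned above.
# This risks data loss on power failure; use for benchmarking only:
zfs set sync=disabled tank/export
```

If write performance improves with a smaller wsize or with TSO off, that points at the network fabric rather than ZFS or the NFS server code.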