From: Bruce Evans <brde@optusnet.com.au>
Date: Thu, 5 Apr 2018 14:38:56 +1000 (EST)
To: Kaya Saman
Cc: Mike Tancsa, FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject: Re: Linux NFS client and FreeBSD server strangeness
Message-ID: <20180405134730.V1123@besplex.bde.org>

On Wed, 4 Apr 2018, Kaya Saman wrote:

> If I recall correctly the "sync" option is default. Though it might be
> different depending on the Linux distro in use?
>
> I use this: vers=3,defaults,auto,tcp,rsize=8192,wsize=8192
>
> though I could get rid of the tcp as that's also a default. The rsize
> and wsize options are for running Jumbo Frames ie. larger MTU then
> 1500; in my case 9000 for 1Gbps links.

These rsize and wsize options are pessimizations.  They override the
default sizes, which are usually much larger for tcp.  The defaults are
not documented in the man page, and the current settings are almost
equally impossible to see (e.g., mount -v doesn't show them).  The
defaults are not quite impossible to see in the source code of course,
but the source code for them is especially convoluted.  It seems to give
the following results:

- for udp, the initial defaults are NFS_RSIZE and NFS_WSIZE.  These are
  8K.  This is almost simple.  The values are low because at least some
  versions have bugs with larger values.  32K never worked well for me:
  it hangs in some versions and is slower in others.  16K works for me.

- for tcp, the initial defaults are maxbcachebuf.  This is a read-only
  tunable.  It defaults to MAXBCACHEBUF.  This is an unsupported option.
  It defaults to MAXBSIZE.  MAXBSIZE is honestly non-optional: it is not
  even an unsupported option -- it is always 64K unless the source code
  is edited.  "Unsupported" for an option means that it is ifdefed in a
  header file but is not in conf/options*, so it must be added to CFLAGS
  in some way before every include of the file (or just the includes
  that use MAXBCACHEBUF, if you know which they are).  MAXBCACHEBUF has
  a maximum of MAXPHYS.  MAXPHYS is a supported option with a default of
  128K.  Although it is supported, it is much harder to change, since it
  can reasonably be used in applications, where the option support is
  null.  (A sketch of this chain of knobs follows the list.)

- for remount, the defaults are from the previous mount.  Their values
  are almost impossible to see.  Large values from a previous tcp mount
  tend to break remounting with udp.
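To make that chain concrete, here is a sketch of where the various knobs
live.  The tunable name and the exact spelling of the knobs are from my
memory of the tree, so treat them as assumptions and check your own
sources before relying on them:

    # Read the current tcp default (read-only tunable; I believe the
    # name is vfs.maxbcachebuf):
    sysctl vfs.maxbcachebuf

    # Being read-only at runtime, it can only be set at boot, e.g. in
    # /boot/loader.conf (it is clamped to MAXPHYS):
    vfs.maxbcachebuf="131072"

    # MAXPHYS is a supported kernel option, so it goes in the kernel
    # config file:
    options         MAXPHYS=262144

    # MAXBCACHEBUF is not in conf/options*, so it has to reach the
    # compiler through CFLAGS; for kernel builds something like this
    # in /etc/make.conf should do it (it replaces the default flags,
    # so repeat the usual ones):
    COPTFLAGS=      -O2 -pipe -DMAXBCACHEBUF=131072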
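Since the effective sizes are this hard to see, the safest thing is to
pin them explicitly instead of relying on the defaults.  A minimal
sketch, with an invented server name and paths, in FreeBSD client
spelling (a Linux client would say proto=udp,vers=3 instead):

    # One-off mount with explicit transport and sizes:
    mount -t nfs -o udp,nfsv3,rsize=8192,wsize=8192 server:/export /mnt

    # The equivalent /etc/fstab entry:
    server:/export  /mnt  nfs  rw,udp,nfsv3,rsize=8192,wsize=8192  0  0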
I always use udp since tcp is just slower (higher overhead and latency).
I usually force the sizes to 8K since I don't want them to depend on
undocumented defaults or to change if these defaults change.  Changes
invalidate benchmarks.  Sometimes I force them to 16K to see if that is
an optimization, and keep it and update the benchmark results if it is.
I don't like large block sizes and mostly use 16K for all file systems,
but 32K is now better for throughput.  64K is just slower in all of my
tests, because it is too large for small metadata.

nfs (v3) used to (5-30 years ago) have bursty behaviour even with a
FreeBSD client and server.  IIRC, this was from too little write
combining and too many daemons on the server.  (You will have to mount
the client async to give the server a chance to combine small writes.
Large block sizes give large writes which may arrive out of order, so
they need recombining, and in the sync case the server has to wait for
that.  Too many daemons give more reordering.)  The server couldn't keep
up with the client, and the client stopped to let the server catch up.
After fixing this, the problem moved to the client not being able to
keep up with the disks (for copying uncached files).  This gave less
bursty behaviour, e.g., consistently 10-20% below the disk bandwidth
with network bandwidth to spare.

Bruce