Date: Sun, 3 Jan 2016 20:37:13 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: "Mikhail T." <mi+thun@aldan.algebra.com>
Cc: Karli Sjöberg <karli.sjoberg@slu.se>, freebsd-fs@FreeBSD.org
Subject: Re: NFS reads vs. writes
Message-ID: <495055121.147587416.1451871433217.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <5688D3C1.90301@aldan.algebra.com>
References: <8291bb85-bd01-4c8c-80f7-2adcf9947366@email.android.com> <5688D3C1.90301@aldan.algebra.com>
Mikhail T. wrote:
> On 03.01.2016 02:16, Karli Sjöberg wrote:
> >
> > The difference between "mount" and "mount -o async" should tell you if
> > you'd benefit from a separate log device in the pool.
> >
> This is not a ZFS problem. The same filesystem is being read in both
> cases. The same data is being read from and written to the same
> filesystems. For some reason, it is much faster to read via NFS than to
> write to it, however.
>
This issue isn't new. It showed up when Sun introduced NFS in 1985.
NFSv3 changed things a little by allowing UNSTABLE writes.
Here's what an NFSv3 or NFSv4 client does when writing:
- It issues some # of UNSTABLE writes. The server need only have these in
  server RAM before replying NFS_OK.
- Then the client does a Commit. At this point the NFS server is required to
  store all the data written by the above writes, plus the related metadata,
  on stable storage before replying NFS_OK.
--> This is where "sync" vs. "async" becomes a big issue. If you use
    "sync=disabled" (I'm not a ZFS guy, but I think that is what the ZFS
    option looks like) you *break* the NFS protocol (i.e. violate the RFC)
    and put your data at some risk, but you will typically get better (often
    much better) write performance.
    OR
    You put the ZIL on a dedicated device with fast write performance, so the
    data can go there to satisfy the stable-storage requirement. (I know
    nothing about them, but SSDs have dramatically different write
    performance, so an SSD to be used for a ZIL must be carefully selected to
    ensure good write performance.)

How many writes are in "some #" is up to the client. For FreeBSD clients, the
"wcommitsize" mount option can be used to adjust this. Recently the default
tuning of it changed significantly, but you didn't mention how recent your
system(s) are, so manual tuning may be useful. (See "man mount_nfs" for more
on this.)

Also, the NFS server was recently tweaked so that it can handle 128K
rsize/wsize, but the FreeBSD client is limited to MAXBSIZE, which has not
been increased beyond 64K. To go higher, you have to change that value in the
kernel sources and rebuild your kernel. (The problem is that increasing
MAXBSIZE makes the kernel use more KVM for the buffer cache, and if a system
isn't doing significant client-side NFS, that KVM is wasted.)
Someday I should see if MAXBSIZE can be made a TUNABLE, but I haven't done
that.
--> As such, unless you use a Linux NFS client, the reads/writes will be 64K,
    whereas 128K would work better for ZFS.

Some NAS hardware vendors solve this problem by using non-volatile RAM, but
that isn't available in generic hardware.

> And finally, just to put the matter to rest, both ZFS-pools already have
> a separate zil-device (on an SSD).
>
If this SSD is dedicated to the ZIL and is one known to have good write
performance, it should help, but in your case the SSD seems to be the
bottleneck.

rick

> -mi
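
For reference, the ZFS-side settings discussed above look roughly like the
following sketch; the pool name "tank" and the log device name are
placeholders, and sync=disabled carries the data-loss risk described above:

    # See where synchronous writes currently go for the pool
    zfs get sync,logbias tank

    # Option A: disable sync (breaks the NFS stable-storage guarantee)
    zfs set sync=disabled tank

    # Option B: add a dedicated, fast SSD as a separate log (ZIL) device
    zpool add tank log gpt/slog0
    zpool status tank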
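
On the FreeBSD client side, the wcommitsize and rsize/wsize knobs are set per
mount; the server name, export path, and sizes below are only examples, and
the right values depend on your release (see mount_nfs(8)):

    # NFSv3 mount with 64K transfers and a larger pending write-commit size
    mount -t nfs -o nfsv3,rsize=65536,wsize=65536,wcommitsize=1048576 \
        server:/export/data /mnt/data

    # Or the equivalent /etc/fstab entry:
    # server:/export/data /mnt/data nfs rw,nfsv3,rsize=65536,wsize=65536,wcommitsize=1048576 0 0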