Date: Sun, 3 Jan 2016 20:37:13 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: "Mikhail T." <mi+thun@aldan.algebra.com>
Cc: Karli Sjöberg <karli.sjoberg@slu.se>, freebsd-fs@FreeBSD.org
Subject: Re: NFS reads vs. writes
Message-ID: <495055121.147587416.1451871433217.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <5688D3C1.90301@aldan.algebra.com>
References: <8291bb85-bd01-4c8c-80f7-2adcf9947366@email.android.com> <5688D3C1.90301@aldan.algebra.com>
Mikhail T. wrote:
> On 03.01.2016 02:16, Karli Sjöberg wrote:
> >
> > The difference between "mount" and "mount -o async" should tell you if
> > you'd benefit from a separate log device in the pool.
> >
> This is not a ZFS problem. The same filesystem is being read in both
> cases. The same data is being read from and written to the same
> filesystems. For some reason, it is much faster to read via NFS than to
> write to it, however.
>=20
This issue isn't new. It showed up when Sun introduced NFS in 1985.
NFSv3 did change things a little, by allowing UNSTABLE writes.
Here's what an NFSv3 or NFSv4 client does when writing:
- Issues some # of UNSTABLE writes. The server need only have these in server
  RAM before replying NFS_OK.
- Then the client does a Commit. At this point the NFS server is required to
  store all the data written in the above writes, and the related metadata, on
  stable storage before replying NFS_OK. (There is a rough sketch of this
  sequence just after this list.)
  --> This is where the "sync" vs "async" choice is a big issue. If you use
      "sync=disabled" (I'm not a ZFS guy, but I think that is what the ZFS
      option looks like) you *break* the NFS protocol (i.e. violate the RFC)
      and put your data at some risk, but you will typically get better
      (often much better) write performance.
      OR
      You put a ZIL on a dedicated device with fast write performance, so the
      data can go there to satisfy the stable storage requirement. (I know
      nothing about them, but SSDs have dramatically different write
      performance, so an SSD to be used for a ZIL must be carefully selected
      to ensure good write performance.)
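To make the above concrete, here is a rough C sketch of the sequence. The
stability levels are the ones defined by RFC 1813 (NFSv3), but nfs_write()
and nfs_commit() are made-up stand-ins for the client's WRITE and COMMIT
RPCs, not anything from the FreeBSD sources:

    #include <stddef.h>
    #include <stdint.h>
    #include <sys/types.h>

    /* Write stability levels from RFC 1813 (NFSv3). */
    enum stable_how {
        UNSTABLE  = 0,  /* server may reply once the data is in its RAM */
        DATA_SYNC = 1,  /* data on stable storage, metadata maybe not   */
        FILE_SYNC = 2   /* data and metadata on stable storage          */
    };

    /* Hypothetical stand-ins for the WRITE and COMMIT RPCs; each reply
     * carries the server's write verifier, which changes if the server
     * reboots (and so loses its cached, uncommitted writes). */
    uint64_t nfs_write(int fh, off_t off, size_t len, const char *buf,
        enum stable_how stable);
    uint64_t nfs_commit(int fh, off_t off, size_t len);

    /* Push len bytes as UNSTABLE writes, then force them (and the
     * related metadata) to stable storage with a single Commit. */
    int
    write_then_commit(int fh, const char *buf, size_t len, size_t chunk)
    {
        uint64_t vwrite = 0, vcommit;
        size_t off, n;

        for (off = 0; off < len; off += chunk) {
            n = (len - off < chunk) ? len - off : chunk;
            vwrite = nfs_write(fh, (off_t)off, n, buf + off, UNSTABLE);
        }
        vcommit = nfs_commit(fh, 0, 0);  /* offset 0, count 0 = whole file */
        /* If the verifier changed, the server lost the cached writes and
         * the client must send them all again. */
        return (vcommit == vwrite) ? 0 : -1;
    }

The Commit is the step where a ZFS server has to push everything to stable
storage via the ZIL (unless sync=disabled tells it not to, which is the
protocol violation mentioned above), so it is where the sync/async and ZIL
choices show up in write performance.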
How many writes are in "some #" is up to the client. For FreeBSD clients, the
"wcommitsize" mount option can be used to adjust this. Recently the default
tuning of this changed significantly, but you didn't mention how recent your
system(s) are, so manual tuning of it may be useful. (See "man mount_nfs" for
more on this.)
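To show what that knob controls, here is a sketch in the same spirit, reusing
the hypothetical nfs_write()/nfs_commit() declared above; the batching logic
is illustrative only, not the FreeBSD client's actual code path:

    /* Let UNSTABLE writes accumulate until roughly 'wcommitsize' bytes
     * are uncommitted, then issue one Commit for them.  Illustrative
     * only; the real client tracks dirty buffers, not a byte counter. */
    void
    push_dirty(int fh, const char *buf, size_t len, size_t wcommitsize)
    {
        const size_t wsize = 64 * 1024;  /* 64K, see the MAXBSIZE note below */
        size_t off, n, uncommitted = 0;

        for (off = 0; off < len; off += wsize) {
            n = (len - off < wsize) ? len - off : wsize;
            nfs_write(fh, (off_t)off, n, buf + off, UNSTABLE);
            uncommitted += n;
            if (uncommitted >= wcommitsize) {
                nfs_commit(fh, 0, 0);   /* whole file, for simplicity */
                uncommitted = 0;
            }
        }
        if (uncommitted > 0)
            nfs_commit(fh, 0, 0);
    }

A larger value means fewer Commits (fewer waits for the server to reach
stable storage), at the price of more uncommitted data the client may have
to resend if the server reboots.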
Also, the NFS server was recently tweaked so that it could handle 128K
rsize/wsize, but the FreeBSD client is limited to MAXBSIZE, which has not
been increased beyond 64K. To raise it, you have to change the value in the
kernel sources and rebuild your kernel. (The problem is that increasing
MAXBSIZE makes the kernel use more KVM for the buffer cache, and if a system
isn't doing significant client-side NFS, that memory is wasted.)
Someday I should see if MAXBSIZE can be made a TUNABLE, but I haven't done that.
--> As such, unless you use a Linux NFS client, the reads/writes will be 64K,
    whereas 128K would work better for ZFS.
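For reference, the constant in question lives in the kernel headers and looks
roughly like this (value as of the 10.x sources; the exact comment and
surrounding text may differ in your tree):

    /* sys/sys/param.h */
    #define MAXBSIZE        65536   /* must be power of 2 */

    /*
     * Bumping this to 131072 and rebuilding the kernel is what would let
     * the FreeBSD client use 128K rsize/wsize, at the cost of the extra
     * buffer-cache KVM mentioned above.
     */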
Some NAS hardware vendors solve this problem by using non-volatile RAM, but
that isn't available in generic hardware.
> And finally, just to put the matter to rest, both ZFS-pools already have
> a separate zil-device (on an SSD).
>=20
If this SSD is dedicated to the ZIL and is one known to have good write
performance, it should help, but in your case the SSD seems to be the
bottleneck.
rick
> -mi
>=20
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
