Date:      Fri, 24 Jan 2014 20:02:56 -0500 (EST)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        J David <j.david.lists@gmail.com>
Cc:        freebsd-net@freebsd.org
Subject:   Re: Terrible NFS performance under 9.2-RELEASE?
Message-ID:  <635382404.16057591.1390611776054.JavaMail.root@uoguelph.ca>
In-Reply-To: <CABXB=RTTCfxP_Ebp3aa4k9qr5QrGDVQQMr1R1w0wBTUBD1OtwA@mail.gmail.com>

J David wrote:
> On Fri, Jan 24, 2014 at 5:54 PM, Rick Macklem <rmacklem@uoguelph.ca>
> wrote:
> > But disabling it will identify if that is causing the problem. And
> > it
> > is a workaround that often helps people get things to work. (With
> > real
> > hardware, there may be no way to "fix" such things, depending on
> > the
> > chipset, etc.)
> 
> There are two problems that are crippling NFS performance with large
> block sizes.
> 
Btw, just in case it wasn't obvious, I would like to see large (at least
as large as the server's file system block size) reads/writes going to
the server.

For example, here is a write from your Linux case (from your previous post):
172.20.20.166.2036438470 > 172.20.20.162.2049: 2892 write fh
1325,752613/4 4096 (4096) bytes @ 824033280 <filesync>

This is writing 4096 bytes with filesync. filesync means that the data and
metadata must be written to stable storage before the NFS server replies, so
that the write won't be lost if the server crashes just after sending the
reply.
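
(Btw, in case it is useful to anyone trying to reproduce this, I believe a
capture like the one above can be taken with something along these lines,
where em0 is just an example interface name:

    tcpdump -s 0 -vv -i em0 port 2049

since tcpdump's NFS decoding shows the RPC type, the write length and the
filesync/unstable flag.)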

Now, unlike your test case, a typical real NFS server will be using disks
as stable storage, and doing multiple small writes to disk for each of
these RPCs will take a long time.
As with most disk file systems, doing fewer writes of larger blocks makes
a big difference to NFS server performance. (Your test does the
unrealistic case of putting the file system in memory.)

Now, I would agree that I would like to see 64Kbyte rsize/wsize work well
with the underlying network fabric, but I don't know how to do that, in
general. I would actually like to see MAXBSIZE increased to 128K, so that
can be the default rsize/wsize. (I've been told that 128K is the block
size typically used by ZFS. I know nothing about ZFS, but I think the
person who emailed this knows ZFS pretty well.)
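
(If the server file system were ZFS, I believe something along these lines
would show the block size of the exported dataset; "tank/export" is just a
placeholder dataset name:

    zfs get recordsize tank/export

and 128K is what I'm told the default value is. I have not tried this
myself.)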

This comes back to my suggestion of testing with "-r 32k", since that
seems to be close to what would be desirable for a real NFS server.
(But, if you have a major application that loves to do 4k reads/writes,
 then I understand why you would use "-r 4k".)
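
(In case the mount syntax isn't obvious, I believe a mount with 32K
read/write sizes would look something like this on a FreeBSD client,
where the server name and paths are just placeholders:

    mount -t nfs -o rsize=32768,wsize=32768 server:/export /mnt

which should be equivalent to the "-r"/"-w" options of mount_nfs.)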

rick

> One is the extraneous NFS read-on-write issue I documented earlier
> today that has nothing to do with network topology or packet size.
> You might have more interest in that one.
> 
> This other thing is a five-way negative interaction between 64k NFS,
> TSO, LRO, delayed ack, and congestion control.  Disabling *any* one
> of
> them is sufficient to see significant improvement, but does not serve
> to identify that it is causing the problem since it is not a unique
> characteristic.  (Even if it was, that would not determine whether a
> problem was with component X or with component Y's ability to
> interact
> with component X.)  Figuring out what's really happening has proven
> very difficult for me, largely due to my limited knowledge of these
> areas.  And the learning curve on the TCP code is pretty steep.
> 
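(For reference, I believe turning each of those off individually on
FreeBSD would look something like the following, where em0 is just an
example interface name; I have not checked which of them matters here:

    ifconfig em0 -tso                   # turn off TSO on the interface
    ifconfig em0 -lro                   # turn off LRO on the interface
    sysctl net.inet.tcp.delayed_ack=0   # turn off delayed acks

The delay itself can also be tuned via net.inet.tcp.delacktime instead of
turning delayed acks off entirely.)
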
> The "simple" explanation appears to be that NFS generates two
> packets,
> one just under 64k and one containing "the rest", and the alternating
> sizes prevent the delayed ack code from ever seeing two full-size
> segments in a row, so traffic gets pinned down to one packet per
> net.inet.tcp.delacktime (100ms default), for 10pps, as observed
> earlier.  But unfortunately, like a lot of simple explanations, this
> one appears to have the disadvantage of being more or less completely
> wrong.
> 
> > ps: If you had looked at the link I had in the email, you would
> > have
> >     seen that he gets very good performance once he disables TSO.
> >     As
> >     they say, your mileage may vary.
> 
> Pretty much every word written on this subject has come across my
> screens at this point.  "Very good performance" is relative.  Yes,
> you
> can get about 10-20x better performance by disabling TSO, at the
> expense of using vastly more CPU.  Which is definitely a big
> improvement, and may be sufficient for many applications.  But in
> absolute terms, the overall performance and particularly the
> efficiency remains unsatisfactory.
> 
> Thanks!
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to
> "freebsd-net-unsubscribe@freebsd.org"
> 


