Date:      Wed, 3 Oct 2012 09:00:35 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Peter Maloney <peter.maloney@brockmann-consult.de>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: NFS Performance Help
Message-ID:  <507136025.1628950.1349269235306.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <506C3143.6060000@brockmann-consult.de>

Peter Maloney wrote:
> On 09/30/2012 01:50 AM, Rick Macklem wrote:
> > Wayne Hotmail wrote:
> >> Like others, I am having issues getting any decent performance out
> >> of my NFS clients on FreeBSD. I have tried 8.3 and 9.1-BETA on
> >> stand-alone servers or as VMware clients, used 1 Gig connections or
> >> a 10 Gig connection, and tried mounting using Version 3 and
> >> Version 4. I have tried the noatime, sync, and tcp options; nothing
> >> seems to help. I am connecting to an IceWeb NAS. My write
> >> performance with dd is 60 MB/s at best when writing to the server.
> >> If I load a Red Hat Linux server on the same hardware using the
> >> same connection, my write performance is about 340 MB/s.
> >> It really falls apart when I run a test script where I create 100
> >> folders, then create 100 files in the folders and append to these
> >> files 5 times using 5 random files. I am trying to simulate an IMAP
> >> email server. If I run the script on my local mirrored drives it
> >> takes about one minute and thirty seconds to complete. If I run the
> >> script on the NFS-mounted drives it takes hours to complete. With
> >> my Linux install on the same hardware, the same script over NFS
> >> takes about 4 minutes.
> >> Google is tired of me asking the same question over and over. So if
> >> anyone would be so kind as to point out some kernel or system
> >> tweaks to get me past my NFS issues, that would be greatly
> >> appreciated.
> >> Wayne
> >>
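(An aside for anyone trying to reproduce this: a rough sketch of the
kind of test script Wayne describes is below. The 100-folder/100-file
layout and the 5 appends from 5 random files are from his description;
everything else, including the file and chunk names and sizes, is my
guess.)

  #!/bin/sh
  # make 5 small chunks of random data to append (8k is arbitrary)
  for i in $(jot 5); do dd if=/dev/random of=/tmp/chunk$i bs=8k count=1; done
  # 100 folders x 100 files x 5 appends each = 50000 small appends
  for d in $(jot 100); do
      mkdir -p dir$d
      for f in $(jot 100); do
          for i in $(jot 5); do
              cat /tmp/chunk$i >> dir$d/file$f
          done
      done
  done
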
> > You could try a smaller rsize,wsize by setting the command line args
> > for the mount. In general, larger rsize,wsize should perform better,
> > but if a large write generates a burst of traffic that overloads
> > some part of the network fabric or server, such that packets get
> > dropped, performance will be hit big time.
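(As a concrete example, something like this sets a smaller rsize/wsize
on a v3 TCP mount; 32k is just a starting point to experiment with, not
a recommendation, and the server/export paths are placeholders:)

  # halve the transfer size from the usual 64k and see if throughput improves
  mount -t nfs -o nfsv3,tcp,rsize=32768,wsize=32768 server:/export /mnt
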
> >
> > Other than that, if you capture packets and look at them in
> > wireshark, you might be able to spot where packets are getting lost
> > and retransmitted. (If packets are getting dropped, then the fun
> > part is figuring out why and coming up with a workaround.)
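(Something along these lines should capture the traffic for wireshark;
the interface name and server hostname are placeholders for whatever
your NFS traffic actually runs over:)

  # full-sized packets on the NFS port, written to a file for later analysis
  tcpdump -i ix0 -s 0 -w /tmp/nfs.pcap host nfs-server and port 2049
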
> >
> > Hopefully others will have more/better suggestions, rick
> >
> My only suggestion is to try (but not necessarily in production) the
> changes suggested in the thread "NFSv3, ZFS, 10GE performance" started
> by Sven Brandenburg. They didn't do much in my testing, but he says
> they help a lot. However, he is using a Linux client, and a RAM-based
> ZIL (using ZFS).
> 
> Other than that, I can only say that I observed the same thing as you
> (testing both FreeBSD and Linux clients), but I always tested with
> ZFS. And I found that with FreeBSD, it was putting a high load on the
> ZIL, meaning FreeBSD was using sync writes, but Linux was not. ESXi
> behaved the same way as a client. So with a cheap SSD as a ZIL, ESXi
> and FreeBSD were writing at around 40-70 MB/s and Linux was writing at
> 600. The same test using a virtual machine disk mounted over NFS shows
> how extreme the problem can be, and was instead 7 MB/s with FreeBSD
> and ESXi, and around 90-200 with Linux. (And to compare 10 Gbps
> performance with other non-NFS tests, I could get something like
> 600 MB/s with a simple netcat from local RAM to remote /dev/null, and
> 800-900 with more threads or NICs, I don't remember exactly.)
> 
> I couldn't figure it out for sure, but I couldn't cause any corruption
> in my testing, so I just assume Linux only issues "sync" calls when
> creating files, on write barriers to virtual disks, etc., like it does
> with local file systems, instead of doing every single write
> synchronously.
> 
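By the way, Peter's sync-write theory is easy to check. One way (for
testing only, never on data you care about, since it defeats the crash
safety the ZIL provides) is to temporarily disable synchronous
semantics on the exported ZFS dataset and re-run the benchmark; if the
FreeBSD client suddenly writes at Linux-like speeds, synchronous writes
are the bottleneck. The dataset name below is just a placeholder:

  # on the ZFS/NFS server -- test only, recent writes can be lost on a crash
  zfs set sync=disabled tank/export
  (re-run the dd / script test from the client)
  zfs set sync=standard tank/export
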
I would not recommend that this go into a production system at this
time, but you could try the following patch on a test system to see if
it alleviates the problem. (If it never gets tested, we'll never know
whether it works well and should be considered for a commit to head. :-)
  http://people.freebsd.org/~rmacklem/dirtybuflist.patch
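(Roughly, on a test box with sources in /usr/src, something like the
following should apply it and rebuild; the -p level depends on how the
diff was generated, and the kernel config name is whatever you normally
build:)

  cd /usr/src
  fetch http://people.freebsd.org/~rmacklem/dirtybuflist.patch
  # check the paths in the diff first; -p0 assumes paths relative to /usr/src
  patch -p0 < dirtybuflist.patch
  make buildkernel KERNCONF=GENERIC && make installkernel KERNCONF=GENERIC
  # reboot into the new kernel and re-run the NFS tests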

The FreeBSD NFS clients (the new one just cloned this code from the old
one) keep track of which part of a buffer cache block has been written
to, instead of always pre-reading the block and then writing the whole
block back to the server. This provides more correct behaviour when
multiple clients are concurrently writing non-overlapping areas of the
same block, and it provides better performance in some situations.

Unfortunately, the current code only handles a single dirty byte region
per block. As such, when a write arrives for an area that is not
contiguous with the byte region that is already dirty/modified, the
code first flushes the old byte region to the server as a single
synchronous write.

The above patch changes the code so that it maintains a list of dirty/modified
byte regions and avoids this problem.
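
(If you want to see the difference in isolation, one way to provoke the
old behaviour is two small writes into the same client block that are
not contiguous, e.g. the first and third 4k of a file on the NFS mount,
assuming the default 64k block/wsize. Before the patch the second write
should force a synchronous flush of the first dirty region; with the
patch both regions should simply stay cached as separate dirty ranges:)

  # two non-contiguous 4k writes inside what the client caches as one block
  dd if=/dev/zero of=/mnt/testfile bs=4k count=1 seek=0 conv=notrunc
  dd if=/dev/zero of=/mnt/testfile bs=4k count=1 seek=2 conv=notrunc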

jhb@ also had a simpler patch which avoided the synchronous write, but
it didn't preserve correct behaviour when multiple clients are concurrently
writing to non-overlapping areas of the same block. (I don't think I have
his patch handy at the moment, but maybe he does?)

rick

> Peter
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"


