Date:      Tue, 1 May 2012 07:37:31 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Wojciech Puchar <wojtek@wojtek.tensor.gdynia.pl>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: NFS - slow
Message-ID:  <482299836.184445.1335872251190.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <alpine.BSF.2.00.1205010700240.5909@wojtek.tensor.gdynia.pl>

Wojciech Puchar wrote:
> I tried NFSv4, tested under FreeBSD over localhost, and it is roughly the
> same. Am I doing something wrong?
> 
Probably not. NFSv4 writes are done exactly the same way as NFSv3 writes. (NFSv4
changes other stuff, like locking, and adds support for ACLs, etc.) I do have a
patch that allows the client to do more extensive caching to local disk (called
Packrats), but that isn't ready for prime time yet.

NFSv4.1 optionally supports pNFS, where reading and writing can be done
to Data Servers (DS) separate from the NFS server (called the Metadata Server,
or MDS). I'm working on the client side of this, but it is also a
work-in-progress, and as far as I know no work on an NFSv4.1 server for FreeBSD
has been done yet.

If you have increased MAXBSIZE on both the client and server machines and
use the new (experimental in 8.x) client and server, they will use a larger
rsize and wsize for NFSv3 as well as NFSv4. (Capturing packets and looking at
them in wireshark will tell you what the actual rsize and wsize are.) A patch to
nfsstat to get the actual mount options in use is another of my 'to do' items.
If anyone else wants to work on this, I'd be happy to help them.
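As a rough illustration of why a larger rsize/wsize helps, the sketch below just counts how many WRITE RPCs a sequential write needs at a few sizes. The function names are hypothetical, and the min() rule in effective_size() is an assumption made for illustration (not FreeBSD's actual negotiation code); it only shows why raising MAXBSIZE on one machine alone would not raise the size in use.

```python
def write_rpcs(file_size: int, wsize: int) -> int:
    """WRITE RPCs needed to send file_size bytes sequentially in wsize chunks."""
    return (file_size + wsize - 1) // wsize  # integer ceiling division


def effective_size(requested: int, client_max: int, server_max: int) -> int:
    """Assumed rule, for illustration only: the size in use is capped by both ends."""
    return min(requested, client_max, server_max)


if __name__ == "__main__":
    file_size = 100 * 1024 * 1024  # 100 MiB, an arbitrary example
    for wsize in (32 * 1024, 64 * 1024, 128 * 1024):
        print(f"wsize={wsize // 1024}K: {write_rpcs(file_size, wsize)} WRITE RPCs")
    # Raising the limit on only one machine does not raise the size in use.
    print(effective_size(128 * 1024, 128 * 1024, 64 * 1024))
```

Halving the number of RPCs does not halve the elapsed time, since per-byte costs remain, but per-RPC overheads (round trips, and any synchronous work the server does per request) shrink accordingly.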

> On Mon, 30 Apr 2012, Peter Jeremy wrote:
> 
> > On 2012-Apr-27 22:05:42 +0200, Wojciech Puchar
> > <wojtek@wojtek.tensor.gdynia.pl> wrote:
> >> is there any way to speed up NFS server?
> > ...
> >> - write works terribly. it performs sync on every write IMHO,
> >
> > You don't mention which NFS server or NFS version you are using but
> > for "traditional" NFS, this is by design. The NFS server is stateless
> > and NFS server failures are transparent (other than time-wise) to the
> > client. This means that once the server acknowledges a write, it
> > guarantees the client will be able to later retrieve that data, even
> > if the server crashes. This implies that the server needs to do a
> > synchronous write to disk before it can return the acknowledgement
> > back to the client.
> >
> > --
> > Peter Jeremy
> >
Btw, for NFSv3 and NFSv4, the story is slightly different from the above.

A client can do writes with a flag that is either FILESYNC or UNSTABLE.
For FILESYNC, the server must do exactly what the above says. That is,
the data and any required metadata changes must be on stable storage
before the server replies to the RPC.
For UNSTABLE, the server can simply save the data in memory and reply OK
to the RPC. For this case, the client needs to do a separate Commit RPC
later and the server must store the data on stable storage at that time.
(For this case, the client needs to keep the data written UNSTABLE in its
 cache and be prepared to re-write it, if the server reboots before the
 Commit RPC is done.)
- When any app. does an fsync(2), the client needs to do a Commit RPC
  if it has been doing UNSTABLE writes.
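The WRITE/COMMIT exchange described above can be sketched as a toy in-memory model. Everything here is an assumption for illustration: ToyServer and ToyClient are hypothetical names, two dicts stand in for RAM and stable storage, and the write verifier (the value NFSv3 WRITE and COMMIT replies carry so a client can notice that the server rebooted) is modeled as a simple boot counter. NFSv3 also defines a DATA_SYNC level, which is left out here as it is in the text.

```python
from itertools import count

FILESYNC = "FILESYNC"
UNSTABLE = "UNSTABLE"


class ToyServer:
    """Hypothetical server: `disk` plays stable storage, `cache` plays RAM."""

    def __init__(self):
        self.disk = {}   # offset -> data; survives a reboot
        self.cache = {}  # offset -> data; lost on a reboot
        self._boots = count()
        self.verf = next(self._boots)  # write verifier, changes on every reboot

    def write(self, offset, data, stable):
        if stable == FILESYNC:
            self.disk[offset] = data   # on stable storage before we reply
        else:
            self.cache[offset] = data  # UNSTABLE: keep in memory, reply at once
        return self.verf

    def commit(self):
        self.disk.update(self.cache)   # move UNSTABLE data to stable storage
        self.cache.clear()
        return self.verf

    def reboot(self):
        self.cache.clear()             # uncommitted data is gone
        self.verf = next(self._boots)  # new verifier tells clients to re-send


class ToyClient:
    """Hypothetical client that does UNSTABLE writes and COMMITs on fsync."""

    def __init__(self, server):
        self.server = server
        self.uncommitted = {}  # offset -> (data, verifier from the WRITE reply)

    def write(self, offset, data):
        verf = self.server.write(offset, data, UNSTABLE)
        self.uncommitted[offset] = (data, verf)  # keep it until a COMMIT succeeds

    def fsync(self):
        while self.uncommitted:
            verf = self.server.commit()
            stale = {off: d for off, (d, v) in self.uncommitted.items() if v != verf}
            if not stale:
                self.uncommitted.clear()  # everything is on stable storage now
                return
            for off, data in stale.items():
                # Written before a server reboot: re-send, then COMMIT again.
                self.uncommitted[off] = (data, self.server.write(off, data, UNSTABLE))


if __name__ == "__main__":
    srv = ToyServer()
    cli = ToyClient(srv)
    cli.write(0, b"hello")
    srv.reboot()  # server crashes before the COMMIT
    cli.fsync()   # client notices the new verifier and re-writes
    print(srv.disk)  # {0: b'hello'}
```

The client keeps the verifier per write rather than one global value, so data written before a reboot is re-sent even if later writes already saw the new verifier.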

Most clients, including FreeBSD's, do UNSTABLE writes. However, one
limitation of the FreeBSD client is that it currently keeps track of only
one contiguous modified byte range in a buffer cache block. When an
app. in the client does non-contiguous writes to the same buffer cache
block, the client must write the old modified byte range to the server with
FILESYNC before it copies the newly written data into the buffer cache block.
This happens frequently for builds during the loader phase. (jhb and I have
looked at this. I have an experimental patch that makes the modified byte
range a list, but it requires changes to struct buf. I think it is worth
pursuing. It is a client-side patch, since the client is where things can be
improved: by avoiding FILESYNC writes and frequent Commit RPCs.)
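A toy model of this limitation, in Python for illustration only: the class names are hypothetical and this is not the FreeBSD struct buf code. Each write is an interval [start, end) within one buffer cache block, and the counter records how many times the single-range version would have to push the old range to the server with FILESYNC. The list version reflects the idea of the experimental patch only in spirit: it keeps a sorted list of disjoint ranges and merges overlapping or touching ones, so it never needs a forced flush.

```python
from typing import List, Optional, Tuple

Range = Tuple[int, int]  # [start, end) byte offsets within one buffer cache block


class SingleRangeBuf:
    """Tracks one contiguous dirty range, like the client behaviour described above."""

    def __init__(self):
        self.dirty: Optional[Range] = None
        self.filesync_writes = 0

    def write(self, start: int, end: int) -> None:
        if self.dirty is None:
            self.dirty = (start, end)
        elif start <= self.dirty[1] and end >= self.dirty[0]:
            # Overlapping or adjacent: just grow the single range.
            self.dirty = (min(start, self.dirty[0]), max(end, self.dirty[1]))
        else:
            # Non-contiguous: the old range must go to the server (FILESYNC) first.
            self.filesync_writes += 1
            self.dirty = (start, end)


class RangeListBuf:
    """Tracks a sorted list of disjoint dirty ranges instead of a single one."""

    def __init__(self):
        self.dirty: List[Range] = []
        self.filesync_writes = 0  # stays 0: no forced flush is ever needed

    def write(self, start: int, end: int) -> None:
        kept = []
        for s, e in self.dirty:
            if s <= end and e >= start:  # overlaps or touches the new range
                start, end = min(s, start), max(e, end)
            else:
                kept.append((s, e))
        kept.append((start, end))
        self.dirty = sorted(kept)


if __name__ == "__main__":
    writes = [(0, 512), (4096, 4608), (1024, 1536), (512, 1024)]  # scattered writes
    single, multi = SingleRangeBuf(), RangeListBuf()
    for start, end in writes:
        single.write(start, end)
        multi.write(start, end)
    print(single.filesync_writes, single.dirty)  # 2 (512, 1536)
    print(multi.filesync_writes, multi.dirty)    # 0 [(0, 1536), (4096, 4608)]
```

With the write pattern above, the single-range buffer forces two FILESYNC writes while the list version forces none.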

rick