Date:      Wed, 5 Jan 2011 08:30:05 -0500 (EST)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-stable@freebsd.org, marek sal <marek_sal@wp.pl>, perryh@pluto.rain.com, milu@dat.pl, jyavenard@gmail.com
Subject:   Re: NFSv4 - how to set up at FreeBSD 8.1 ?
Message-ID:  <1870282066.118978.1294234205820.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <201101050757.08116.jhb@freebsd.org>

> On Wednesday, January 05, 2011 5:55:53 am perryh@pluto.rain.com wrote:
> > Rick Macklem <rmacklem@uoguelph.ca> wrote:
> >
> > > ... one of the fundamental principles for NFSv2, 3 was a stateless
> > > server ...
> >
> > Only as long as UDP transport was used. Any NFS implementation that
> > used TCP for transport had thereby abandoned the stateless server
> > principle, since a TCP connection itself requires that state be
> > maintained on both ends.
> 
> Not filesystem cache coherency state, only socket state. And even NFS UDP
> mounts maintain their own set of "socket" state to manage retries and
> retransmits for UDP RPCs. The filesystem is equally incoherent for both UDP
> and TCP NFSv[23] mounts. TCP did not change any of that.
> 
Unfortunately, even NFSv4 doesn't maintain cache coherency in general. The state it
maintains/recovers after a server crash consists of opens/locks/delegations, but
the opens are a Windows-like open share lock (can't remember the Windows/Samba
term for them) and not a POSIX-like open. NFSv4 does tie cache coherency to
file locking, so that clients will get a coherent view of file data for byte
ranges they lock.

The term stateless server refers to the fact that the server doesn't know anything
about the file-handling state in the client that needs to be recovered after
a server crash (opens, locks, ...). When an NFSv2,3 server is rebooted, it
normally knows nothing about what clients are mounted, what clients have files
open, etc., and just services RPCs as they come in. The design avoided the
complexity of recovery after a crash, but results in a non-POSIX-compliant
file system that can't do a good job of cache coherency, knows nothing about
file locks, etc. (Sun did add a separate file locking protocol called the
NLM, or rpc.lockd if you prefer, but that protocol design was fundamentally
flawed imho and, as such, using it is in the "your mileage may vary" category.)
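
A minimal sketch of why a stateless server needs no crash recovery, assuming a
toy model in which every read request names the file handle, offset and count
itself. The StatelessServer class, the b"fh-1" handle and the dict standing in
for the disk are all hypothetical illustrations, not part of any real NFS
implementation.

```python
class StatelessServer:
    """Toy server: keeps no table of mounts, opens or locks in memory."""

    def __init__(self, disk):
        # 'disk' stands in for persistent storage that survives a reboot.
        self.disk = disk

    def read(self, fh, offset, count):
        # Everything needed to service the request arrives in the request.
        return self.disk[fh][offset:offset + count]


disk = {b"fh-1": b"hello, world"}

server = StatelessServer(disk)
before = server.read(b"fh-1", 0, 5)

# Simulate a crash/reboot: all in-memory server state is discarded.
server = StatelessServer(disk)
after = server.read(b"fh-1", 7, 5)
```

The client never re-opens or re-registers anything after the simulated reboot;
it simply keeps sending self-describing requests.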

Further, since a server with no information about previous operations would
cause weird failures when non-idempotent RPCs are retried, "soft state" in the
form of a cache of recent RPCs (typically called the Duplicate Request Cache or
DRC these days) was added, to avoid performing the non-idempotent operation
twice. A server is not required to retain the contents of a DRC after a
crash/reboot, but some vendors with non-volatile RAM hardware may choose to
do so in order to provide "closer to correct" behaviour after a server
crash/reboot.
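
The DRC idea can be sketched roughly as follows. This is a toy illustration
only: the DuplicateRequestCache class, the (client, xid) key, the capacity and
the remove_a operation are assumptions made for the example, and real servers
differ in what they key on and how they evict entries.

```python
from collections import OrderedDict


class DuplicateRequestCache:
    """Toy DRC: remembers replies to recent RPCs, keyed by (client, xid)."""

    def __init__(self, capacity=128):
        self.capacity = capacity
        self.replies = OrderedDict()

    def handle(self, client, xid, operation):
        key = (client, xid)
        if key in self.replies:
            # Retransmitted request: replay the saved reply, do not re-execute.
            return self.replies[key]
        reply = operation()
        self.replies[key] = reply
        if len(self.replies) > self.capacity:
            self.replies.popitem(last=False)  # evict the oldest entry
        return reply


files = {"a.txt"}


def remove_a():
    # Non-idempotent: a second execution fails because the file is gone.
    if "a.txt" not in files:
        return "ENOENT"
    files.remove("a.txt")
    return "OK"


drc = DuplicateRequestCache()
first = drc.handle("client1", 42, remove_a)   # executed once
retry = drc.handle("client1", 42, remove_a)   # same xid: reply replayed
without_drc = remove_a()                      # re-executed: spurious failure
```

Since the cache here lives only in memory, a reboot would empty it, which is
the case where the retry could again see the spurious failure.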

rick


