Date:      Wed, 11 Mar 2009 01:09:47 -0700
From:      perryh@pluto.rain.com
To:        kheuer2@gwdg.de
Cc:        freebsd-questions@freebsd.org
Subject:   Re: Is NFS Locking Reliable?
Message-ID:  <49b771cb.daisuixe%2Bbdr3tV8%perryh@pluto.rain.com>
In-Reply-To: <20090310091318.W34669@gwdu60.gwdg.de>
References:  <20090310091318.W34669@gwdu60.gwdg.de>

> Our NFS servers for user home directories are on FreeBSD (6.4),
> MacOSX (10.5), Linux (still 2.4 kernel) and Tru64-UNIX boxes; NFS
> clients are mostly Linux (2.6 kernel) and FreeBSD (6.4, 7.0, but
> w/o kernel lockd) systems.

I have seen problems with NFS locking even in completely homogeneous
environments.  With a mix like that, I would not trust it as far as
I could throw a Cray :)

> There are periods of several days without problems, but from time
> to time, on one, two, or several (but not all) clients application
> processes which use locking suddenly hang in kernel mode - namely
> firefox, opera, pine.

Lockups are probably the least of your concerns, at least where
pine is involved.  Dunno what sort of data firefox and opera are
protecting from race conditions, but I suppose pine is being used
for email.  Cases will arise wherein mail mysteriously disappears,
because the client and the delivery agent were both updating the
inbox at the same time.  Often there will be no noticeable symptoms,
except for users wondering what happened to that important message
they were supposed to have gotten (and which the MTA log shows was
in fact delivered).
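
To make the lost-mail scenario concrete, here is a minimal sketch of
the race, assuming a whole-file read/modify/write by both parties and
a lock that silently fails to exclude anyone.  The in-memory list and
the names read_inbox, write_inbox and lost_update_demo are
hypothetical stand-ins, not anything pine or a real delivery agent
actually calls.

```python
def read_inbox(store):
    # Each party takes its own private copy of the mailbox contents.
    return list(store)


def write_inbox(store, contents):
    # Writing back replaces the whole mailbox, as a full-file rewrite would.
    store[:] = contents


def lost_update_demo():
    store = ["msg-1"]                 # hypothetical stand-in for the mbox file

    # Assumption: the lock did not exclude anyone, so both read first.
    client_view = read_inbox(store)
    agent_view = read_inbox(store)

    agent_view.append("msg-2")        # delivery agent appends new mail
    write_inbox(store, agent_view)    # MTA log now says "delivered"

    client_view.remove("msg-1")       # mail client expunges an old message
    write_inbox(store, client_view)   # stale copy overwrites the delivery

    return store
```

In this sketch lost_update_demo() ends with an empty mailbox: msg-2
was written and then overwritten, with no error reported to either
side.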

Never export an inbox read/write if reliability of mail delivery is
needed.  Use IMAP instead.

> It seems to be no specific operating system problem - all
> combinations of clients and servers are involved.

I suspect the reason NFS locking is so troublesome is that it
poses a problem which is fundamentally undecidable from either
end.  Prior to restoration of communication, how can any automaton
possibly distinguish between

* a temporary loss of the communication link (but the peer is still
  running and the link will eventually be re-established), and

* the peer has crashed, and will eventually reboot?
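
The two cases above can be sketched as a toy model, under the
simplifying assumption that the only thing an observer can do is send
a probe and wait out a timeout.  PeerState, probe and
still_holds_locks are hypothetical names for illustration only; none
of this is actual lockd or statd code.

```python
from enum import Enum


class PeerState(Enum):
    HEALTHY = "reachable and running"
    LINK_DOWN = "running, but the link is down"
    CRASHED = "crashed, will reboot later"


def probe(state):
    # What the observer sees: a reply, or nothing before the timeout.
    return "reply" if state is PeerState.HEALTHY else "timeout"


def still_holds_locks(state):
    # Assumption: a running peer still believes it holds its locks,
    # while a crashed peer has lost all of its lock state.
    return state is not PeerState.CRASHED


# Both failure cases produce the same observation ...
same_observation = probe(PeerState.LINK_DOWN) == probe(PeerState.CRASHED)

# ... yet the correct handling of the peer's locks differs.
different_truth = (still_holds_locks(PeerState.LINK_DOWN)
                   != still_holds_locks(PeerState.CRASHED))
```

Both same_observation and different_truth come out True here, which
is the whole difficulty: whatever the observer does with the peer's
locks after a timeout is a guess, and one of the two cases makes that
guess wrong.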


