Date:      Wed, 8 Oct 2014 20:43:33 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Garrett Wollman <wollman@csail.mit.edu>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: 9.3 NFS client bug?
Message-ID:  <1713580100.60978206.1412815413928.JavaMail.root@uoguelph.ca>
In-Reply-To: <21557.22365.961980.709081@khavrinen.csail.mit.edu>

Garrett Wollman wrote:
> <<On Tue, 7 Oct 2014 21:20:41 -0400 (EDT), Rick Macklem
> <rmacklem@uoguelph.ca> said:
> 
> > As far as I know, this has never worked correctly for FreeBSD. The
> > unlink() invalidates the directory offset cookies and then it has
> > trouble finding the next entry.
> > To make the above loop work correctly for FreeBSD, it needs to be
> > re-written to start at the beginning of the directory after each
> > unlink().
> 
> How about instead we fix FreeBSD to work properly?  Clearly it is not
> impossible since the Linux NFS client does work.  What exactly is the
> issue?  (Forgive me, I know very little about how VOP_READDIR works
> under the hood.)
> 
> -GAWollman
> 
> 
Well, I've never looked at Linux or OpenSolaris to see how they handle
these things, but here are a couple of ways I am aware of that could fix this:
1 - There is a "cookie_verifier" defined for NFS, which is a 64-bit value that
    is supposed to change whenever the directory offset cookies are no longer
    valid.
    Implementing this requires something like:
    - add an attribute or new VOP_xxx() for this cookie_verifier
    - fix every file system to handle it
      --> This requires a good knowledge of the underlying file system, since
          it needs to change when the directory offset cookies are stale
          (I believe this is when objects are added to a directory for UFS.
           I have no idea for ZFS, etc.)
          - It needs to be stored on-disk (in the i-node or similar) since it
            is supposed to survive server crashes.
    The value for this cookie_verifier is in the readdir reply and then the
    client sends it in subsequent requests so that the server can reply with
    an error if the cookie_verifier refers to "stale" directory offset cookies.
--> Unfortunately some servers haven't supported this correctly for a long time and
    it is difficult for clients to recover from the error. RFC-3530 strongly
    recommends that directory offset cookies not be allowed to become stale,
    but I don't know how to do this for UFS, ZFS, ...
    (There is still an ancient comment in the server code about the check
     being too strict for Solaris 2.5 clients. When was Solaris 2.5 released?;-)
All in all, a mess.
As such, the FreeBSD client assumes that the cookies are no longer valid when
it sees the modify time on the directory change (guess what happens every time
an entry is unlink'd from the directory).
Unless not only the FreeBSD servers but most/all other servers (a lot of old
BSD servers, and I believe others, are broken) are fixed, the client can't really
depend on the cookie_verifier to determine if directory offset cookies are still valid.
(Can you now see why this has never been fixed?)

2 - Have readdir(3) do what fts(3) does. Read the entire directory into the
    user address space on the first readdir() after opendir() and then
    subsequent readdir() calls just return directory entries from memory
    (avoiding further getdirentries(2) calls that don't work correctly because
     of potentially stale directory offset cookies).
    --> This one is probably straightforward, but may eat a lot of address space
        for apps that opendir() and readdir() a lot of large directories.
This might be fine, or it might break a bunch of apps that run out of address
space, doing more harm than the broken case of removing entries in a readdir()
loop (which can easily be coded around).

If someone else knows of a better way to fix this (maybe what Linux or Solaris does),
please post, because I don't think either 1 or 2 above is a good plan.

rick
ps: This is my recollection of the problem, but I haven't worked on it in
    quite a while.


