Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 21 Dec 2016 00:12:24 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Don Lewis <truckman@FreeBSD.org>
Cc:        "cperciva@tarsnap.com" <cperciva@tarsnap.com>, "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject:   Re: ESTALE after cwd deleted by same NFS client
Message-ID:  <YTXPR01MB0189BA6B72E77E1C1D16CE83DD900@YTXPR01MB0189.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <201612200715.uBK7FAlL002072@gw.catspoiler.org>
References:  <YTXPR01MB0189B6B6869327FD70210018DD910@YTXPR01MB0189.CANPRD01.PROD.OUTLOOK.COM>, <201612200715.uBK7FAlL002072@gw.catspoiler.org>

next in thread | previous in thread | raw e-mail | index | archive | help
Don Lewis wrote:

On 19 Dec, Rick Macklem wrote:
> Colin Percival wrote:
>
>>On 12/16/16 12:14, Colin Percival wrote:
>>> making this change in nfs_lookup
>>>> --- sys/fs/nfsclient/nfs_clvnops.c      (revision 310132)
>>>> +++ sys/fs/nfsclient/nfs_clvnops.c      (working copy)
>>>> @@ -1144,7 +1144,7 @@
>>>>                         *vpp =3D NULLVP;
>>>>                 }
>>>>
>>>> -               if (error !=3D ENOENT) {
>>>> +               if (error !=3D ENOENT && error !=3D ESTALE) {
>>>>                         if (NFS_ISV4(dvp))
>>>>                                 error =3D nfscl_maperr(td, error, (uid=
_t)0,
>>>>                                     (gid_t)0);
>>>
>>> fixes the case I described above (for some definition of "fixes" -- I'm=
 not
>>> sure if this is the correct way of handling ESTALE here?) but I'm still=
 seeing
>>> ESTALEs from buildworld's cleandir so I think there must be some other =
place
>>> where something odd is happening.
>>
>>Further information: In addition to the "lookup relative to a directory w=
hich
>>has been deleted out from underneath us" case which causes ESTALE to land=
 in
>>nfs_lookup, the cleandir step of buildworld results in ESTALE being retur=
ned
>>by nfsrpc_getattr into nfs_getattr (landing ultimately in getcwd), and ES=
TALE
>>being returned by nfsrpc_accessrpc into nfs34_access_otw (landing ultimat=
ely
>>in stat and lstat).
>>
>>In UFS there are checks for effnlink =3D=3D 0 which result in e.g. ufs_lo=
okup
>>returning ENOENT; would it make sense to add NREMOVED to struct nfsnode.n=
_flag
>>and check this in the appropriate nfs_* calls?
> To be honest, I can't think of a reason why userland would ever want to s=
ee ESTALE?
> The function you see above "nfscl_maperr()" could easily map all ESTALEs =
to ENOENTs?
> - The question is: "Would returning ENOENT for stat(2) and access(2) actu=
ally make the
>    buildworld happy?
>    if Yes
>           - then mapping ESTALE->ENOENT makes sense for most/all VOP_xxx(=
) calls.
>    else
>            - I don't see any point in returning a different error and the=
re might be some
>              code out there somewhere that depends on seeing ESTALE (I do=
ubt it, but???).
>
> The real problem here is that the directory has a reference count on it w=
hen it is
> rmdir'd. POSIX file systems keep the data until the reference count goes =
to 0, but
> NFS isn't POSIX.
> --> The cheat for regular files is "sillyrename". This could be done for =
directories,
>       but there are multiple comments in the code (not put there by me) t=
hat say
>       "no sillyrename for directories".
>      #1 Does this imply something breaks when you do sillyrename for dirs=
?
>      OR
>       #2  Does it mean no one has bothered to implement it?
> Since implementing it would have been pretty easy, I have to suspect #1, =
which
> means I would be reluctant to do it, at least by default.
> --> Maybe I'll send you a patch that does sillyrename for dirs which you =
can test.
>       If it fixes buildworld, then it could be considered for head as a n=
on-default option?
>
>Sillyrename makes sense for files because you can continue to read and
>write to a file after it is unlinked, until it is closed and it totally
>vanishes.  It doesn't make sense for directories since you can't rmdir()
>a directory until it is empty.  Once it is rmdir()ed, then you can't
>create any new entries in the directory because that would mean that you
>are creating a disconnnected subtree in the filesystem.
Good points. I just tried this ("rm -r" while another process's home dir is=
 in the subtree)
and the process can still "ls" (gets no entries), but can't "cd .." because=
 that doesn't
exist and no process can "cd" into the subtree.
--> I can't think of an easy way to make this happen for the NFS client.

My hunch is that the failure in "buildworld" will just have to be a "sorry,=
 but an NFS
client isn't POSIX compliant, so this doesn't work" case.
Fyi, the only way to safely delete all entries in an NFS mounted dir is:
     opendir()
     while (readdir() !=3D EOF) {
          unlink()  the first dir entry acquired by the readdir()
          closedir()
          opendir()
     }
So that it is always doing a readdir()/unlink() of the first entry in the d=
irectory
until the directory is empty.
This has always been the case, so I am surprised that "rm -r subdir" usuall=
y works.;-)
- This is because there is no guarantee that directory offset cookies won't=
 change when
   an entry in the directory is removed.
   - It happens that UFS like file systems did have this property for a lon=
g time.
     (This changed for FreeBSD when r252438 was committed to head, which sk=
ipped
       over the d_ino =3D=3D 0 entries when copying out directories. Howeve=
r, UFS still
       retains the property that the directory offset is monotonically incr=
easing and
       the server can skip ahead to the next entry safely.)
- I have no idea whether or not ZFS directory offsets change when an entry =
is removed,
  but I do know that ZFS directory offsets are not monotonically increasing=
 values.

The handling of directories has always been a weak spot for NFS, rick








Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?YTXPR01MB0189BA6B72E77E1C1D16CE83DD900>