Date: Tue, 20 Dec 2016 19:25:31 -0800 (PST) From: Don Lewis <truckman@FreeBSD.org> To: rmacklem@uoguelph.ca Cc: cperciva@tarsnap.com, freebsd-fs@freebsd.org Subject: Re: ESTALE after cwd deleted by same NFS client Message-ID: <201612210325.uBL3PVtg006345@gw.catspoiler.org> In-Reply-To: <YTXPR01MB0189BA6B72E77E1C1D16CE83DD900@YTXPR01MB0189.CANPRD01.PROD.OUTLOOK.COM>
next in thread | previous in thread | raw e-mail | index | archive | help
On 21 Dec, Rick Macklem wrote: > Don Lewis wrote: > > On 19 Dec, Rick Macklem wrote: >> Colin Percival wrote: >> >>>On 12/16/16 12:14, Colin Percival wrote: >>>> making this change in nfs_lookup >>>>> --- sys/fs/nfsclient/nfs_clvnops.c (revision 310132) >>>>> +++ sys/fs/nfsclient/nfs_clvnops.c (working copy) >>>>> @@ -1144,7 +1144,7 @@ >>>>> *vpp = NULLVP; >>>>> } >>>>> >>>>> - if (error != ENOENT) { >>>>> + if (error != ENOENT && error != ESTALE) { >>>>> if (NFS_ISV4(dvp)) >>>>> error = nfscl_maperr(td, error, (uid_t)0, >>>>> (gid_t)0); >>>> >>>> fixes the case I described above (for some definition of "fixes" -- I'm not >>>> sure if this is the correct way of handling ESTALE here?) but I'm still seeing >>>> ESTALEs from buildworld's cleandir so I think there must be some other place >>>> where something odd is happening. >>> >>>Further information: In addition to the "lookup relative to a directory which >>>has been deleted out from underneath us" case which causes ESTALE to land in >>>nfs_lookup, the cleandir step of buildworld results in ESTALE being returned >>>by nfsrpc_getattr into nfs_getattr (landing ultimately in getcwd), and ESTALE >>>being returned by nfsrpc_accessrpc into nfs34_access_otw (landing ultimately >>>in stat and lstat). >>> >>>In UFS there are checks for effnlink == 0 which result in e.g. ufs_lookup >>>returning ENOENT; would it make sense to add NREMOVED to struct nfsnode.n_flag >>>and check this in the appropriate nfs_* calls? >> To be honest, I can't think of a reason why userland would ever want to see ESTALE? >> The function you see above "nfscl_maperr()" could easily map all ESTALEs to ENOENTs? >> - The question is: "Would returning ENOENT for stat(2) and access(2) actually make the >> buildworld happy? >> if Yes >> - then mapping ESTALE->ENOENT makes sense for most/all VOP_xxx() calls. >> else >> - I don't see any point in returning a different error and there might be some >> code out there somewhere that depends on seeing ESTALE (I doubt it, but???). >> >> The real problem here is that the directory has a reference count on it when it is >> rmdir'd. POSIX file systems keep the data until the reference count goes to 0, but >> NFS isn't POSIX. >> --> The cheat for regular files is "sillyrename". This could be done for directories, >> but there are multiple comments in the code (not put there by me) that say >> "no sillyrename for directories". >> #1 Does this imply something breaks when you do sillyrename for dirs? >> OR >> #2 Does it mean no one has bothered to implement it? >> Since implementing it would have been pretty easy, I have to suspect #1, which >> means I would be reluctant to do it, at least by default. >> --> Maybe I'll send you a patch that does sillyrename for dirs which you can test. >> If it fixes buildworld, then it could be considered for head as a non-default option? >> >>Sillyrename makes sense for files because you can continue to read and >>write to a file after it is unlinked, until it is closed and it totally >>vanishes. It doesn't make sense for directories since you can't rmdir() >>a directory until it is empty. Once it is rmdir()ed, then you can't >>create any new entries in the directory because that would mean that you >>are creating a disconnnected subtree in the filesystem. > Good points. I just tried this ("rm -r" while another process's home dir is in the subtree) > and the process can still "ls" (gets no entries), but can't "cd .." because that doesn't > exist and no process can "cd" into the subtree. > --> I can't think of an easy way to make this happen for the NFS client. It sort of seems like this should be handled at the vfs level. Once rmdir() succeeds, there should be no calls to the underlying fs code. Maybe add a deleted flag to the vnode ... Dunno how ufs and zfs handle this, but the right thing happens: %mkdir /tmp/barf1 %cd /tmp/barf1 %rmdir /tmp/barf1 %touch barf2 touch: barf2: No such file or directory %ls
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201612210325.uBL3PVtg006345>