Date: Wed, 21 Dec 2016 00:12:24 +0000 From: Rick Macklem <rmacklem@uoguelph.ca> To: Don Lewis <truckman@FreeBSD.org> Cc: "cperciva@tarsnap.com" <cperciva@tarsnap.com>, "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org> Subject: Re: ESTALE after cwd deleted by same NFS client Message-ID: <YTXPR01MB0189BA6B72E77E1C1D16CE83DD900@YTXPR01MB0189.CANPRD01.PROD.OUTLOOK.COM> In-Reply-To: <201612200715.uBK7FAlL002072@gw.catspoiler.org> References: <YTXPR01MB0189B6B6869327FD70210018DD910@YTXPR01MB0189.CANPRD01.PROD.OUTLOOK.COM>, <201612200715.uBK7FAlL002072@gw.catspoiler.org>
next in thread | previous in thread | raw e-mail | index | archive | help
Don Lewis wrote: On 19 Dec, Rick Macklem wrote: > Colin Percival wrote: > >>On 12/16/16 12:14, Colin Percival wrote: >>> making this change in nfs_lookup >>>> --- sys/fs/nfsclient/nfs_clvnops.c (revision 310132) >>>> +++ sys/fs/nfsclient/nfs_clvnops.c (working copy) >>>> @@ -1144,7 +1144,7 @@ >>>> *vpp =3D NULLVP; >>>> } >>>> >>>> - if (error !=3D ENOENT) { >>>> + if (error !=3D ENOENT && error !=3D ESTALE) { >>>> if (NFS_ISV4(dvp)) >>>> error =3D nfscl_maperr(td, error, (uid= _t)0, >>>> (gid_t)0); >>> >>> fixes the case I described above (for some definition of "fixes" -- I'm= not >>> sure if this is the correct way of handling ESTALE here?) but I'm still= seeing >>> ESTALEs from buildworld's cleandir so I think there must be some other = place >>> where something odd is happening. >> >>Further information: In addition to the "lookup relative to a directory w= hich >>has been deleted out from underneath us" case which causes ESTALE to land= in >>nfs_lookup, the cleandir step of buildworld results in ESTALE being retur= ned >>by nfsrpc_getattr into nfs_getattr (landing ultimately in getcwd), and ES= TALE >>being returned by nfsrpc_accessrpc into nfs34_access_otw (landing ultimat= ely >>in stat and lstat). >> >>In UFS there are checks for effnlink =3D=3D 0 which result in e.g. ufs_lo= okup >>returning ENOENT; would it make sense to add NREMOVED to struct nfsnode.n= _flag >>and check this in the appropriate nfs_* calls? > To be honest, I can't think of a reason why userland would ever want to s= ee ESTALE? > The function you see above "nfscl_maperr()" could easily map all ESTALEs = to ENOENTs? > - The question is: "Would returning ENOENT for stat(2) and access(2) actu= ally make the > buildworld happy? > if Yes > - then mapping ESTALE->ENOENT makes sense for most/all VOP_xxx(= ) calls. > else > - I don't see any point in returning a different error and the= re might be some > code out there somewhere that depends on seeing ESTALE (I do= ubt it, but???). > > The real problem here is that the directory has a reference count on it w= hen it is > rmdir'd. POSIX file systems keep the data until the reference count goes = to 0, but > NFS isn't POSIX. > --> The cheat for regular files is "sillyrename". This could be done for = directories, > but there are multiple comments in the code (not put there by me) t= hat say > "no sillyrename for directories". > #1 Does this imply something breaks when you do sillyrename for dirs= ? > OR > #2 Does it mean no one has bothered to implement it? > Since implementing it would have been pretty easy, I have to suspect #1, = which > means I would be reluctant to do it, at least by default. > --> Maybe I'll send you a patch that does sillyrename for dirs which you = can test. > If it fixes buildworld, then it could be considered for head as a n= on-default option? > >Sillyrename makes sense for files because you can continue to read and >write to a file after it is unlinked, until it is closed and it totally >vanishes. It doesn't make sense for directories since you can't rmdir() >a directory until it is empty. Once it is rmdir()ed, then you can't >create any new entries in the directory because that would mean that you >are creating a disconnnected subtree in the filesystem. Good points. I just tried this ("rm -r" while another process's home dir is= in the subtree) and the process can still "ls" (gets no entries), but can't "cd .." because= that doesn't exist and no process can "cd" into the subtree. --> I can't think of an easy way to make this happen for the NFS client. My hunch is that the failure in "buildworld" will just have to be a "sorry,= but an NFS client isn't POSIX compliant, so this doesn't work" case. Fyi, the only way to safely delete all entries in an NFS mounted dir is: opendir() while (readdir() !=3D EOF) { unlink() the first dir entry acquired by the readdir() closedir() opendir() } So that it is always doing a readdir()/unlink() of the first entry in the d= irectory until the directory is empty. This has always been the case, so I am surprised that "rm -r subdir" usuall= y works.;-) - This is because there is no guarantee that directory offset cookies won't= change when an entry in the directory is removed. - It happens that UFS like file systems did have this property for a lon= g time. (This changed for FreeBSD when r252438 was committed to head, which sk= ipped over the d_ino =3D=3D 0 entries when copying out directories. Howeve= r, UFS still retains the property that the directory offset is monotonically incr= easing and the server can skip ahead to the next entry safely.) - I have no idea whether or not ZFS directory offsets change when an entry = is removed, but I do know that ZFS directory offsets are not monotonically increasing= values. The handling of directories has always been a weak spot for NFS, rick
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?YTXPR01MB0189BA6B72E77E1C1D16CE83DD900>