From owner-freebsd-fs@freebsd.org Wed Dec 21 00:28:10 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 616A8C8ABB4 for ; Wed, 21 Dec 2016 00:28:10 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from NAM03-BY2-obe.outbound.protection.outlook.com (mail-by2nam03on0087.outbound.protection.outlook.com [104.47.42.87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (Client CN "mail.protection.outlook.com", Issuer "Microsoft IT SSL SHA2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 1A62B1764; Wed, 21 Dec 2016 00:28:09 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from YTXPR01MB0189.CANPRD01.PROD.OUTLOOK.COM (10.165.218.133) by YTXPR01MB0192.CANPRD01.PROD.OUTLOOK.COM (10.165.218.136) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P384) id 15.1.789.14; Wed, 21 Dec 2016 00:12:24 +0000 Received: from YTXPR01MB0189.CANPRD01.PROD.OUTLOOK.COM ([10.165.218.133]) by YTXPR01MB0189.CANPRD01.PROD.OUTLOOK.COM ([10.165.218.133]) with mapi id 15.01.0789.018; Wed, 21 Dec 2016 00:12:24 +0000 From: Rick Macklem To: Don Lewis CC: "cperciva@tarsnap.com" , "freebsd-fs@freebsd.org" Subject: Re: ESTALE after cwd deleted by same NFS client Thread-Topic: ESTALE after cwd deleted by same NFS client Thread-Index: AQHSVANFVKFixagVOUGZ8mLDXQwbl6EDzQuAgAC4aM2AAVLGrIABN7mAgABdEGCAAjtv0oAFKIhagAEJ9wSAAJ6iAIABFdOM Date: Wed, 21 Dec 2016 00:12:24 +0000 Message-ID: References: , <201612200715.uBK7FAlL002072@gw.catspoiler.org> In-Reply-To: <201612200715.uBK7FAlL002072@gw.catspoiler.org> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: spf=none (sender IP is ) smtp.mailfrom=rmacklem@uoguelph.ca; x-ms-office365-filtering-correlation-id: 05ef0d6a-fd79-4a86-6336-08d429360a62 x-microsoft-antispam: UriScan:;BCL:0;PCL:0;RULEID:(22001);SRVR:YTXPR01MB0192; x-microsoft-exchange-diagnostics: 1; YTXPR01MB0192; 7:fTWlbPi1CbaetINBOMe77y1P5qzCEnJGeSXSmtdAIlcAKRcwLWxFWsQJeBrbf2pkUTslZQPz/bWLwDuV0c1rV5Y+2OrMqYI3rxk5OK3ixQmXQuENXmmvaKqxqvYkrwJOaEZNFYLFltupv6ChOCCd50Wi99/9ApIJaWXKBtbwt5XwekbCQS13a3LG72olwiEl8VqEheJBnOm+3w+JPyDo8DRLWav1HTYzu11vm3SEi5nrQsQZHmDafhE/8z9SbDuqXHgsLACr9bAVtYB2uXFfE1FG8VU+njNyyeWkysm3AtUZqn/RmQO+oZ9Al4MBUKnOsqSVTXwWfQxByrpP4KDdkHbzvS1Z568Ez4OipEdKUX4wuJyaHMvdiTgQL87UuUs1v1+93zouOVM0Gizcr6SieDVBbNrgUFf14XLdSeddnsOMEfX73P/izLfx893tHG4g4cUOjesJVYsX0WWdP3NTRQ== x-microsoft-antispam-prvs: x-exchange-antispam-report-test: UriScan:(158342451672863); x-exchange-antispam-report-cfa-test: BCL:0; PCL:0; RULEID:(6040375)(2401047)(5005006)(8121501046)(10201501046)(3002001)(6041248)(20161123564025)(20161123562025)(20161123555025)(20161123560025)(6072148); SRVR:YTXPR01MB0192; BCL:0; PCL:0; RULEID:; SRVR:YTXPR01MB0192; x-forefront-prvs: 01630974C0 x-forefront-antispam-report: SFV:NSPM; SFS:(10009020)(6009001)(7916002)(39450400003)(189002)(24454002)(199003)(105586002)(106116001)(97736004)(74482002)(189998001)(122556002)(54356999)(7696004)(110136003)(2950100002)(33656002)(50986999)(76176999)(5660300001)(86362001)(101416001)(8676002)(74316002)(6916009)(92566002)(8936002)(2900100001)(4326007)(305945005)(106356001)(229853002)(38730400001)(9686002)(102836003)(81156014)(3280700002)(6506006)(3660700001)(2906002)(68736007)(77096006)(6436002)(81166006)(21314002); DIR:OUT; SFP:1101; SCL:1; SRVR:YTXPR01MB0192; H:YTXPR01MB0189.CANPRD01.PROD.OUTLOOK.COM; FPR:; SPF:None; PTR:InfoNoRecords; MX:1; A:1; LANG:en; received-spf: None (protection.outlook.com: uoguelph.ca does not designate permitted sender hosts) spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: uoguelph.ca X-MS-Exchange-CrossTenant-originalarrivaltime: 21 Dec 2016 00:12:24.2993 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: be62a12b-2cad-49a1-a5fa-85f4f3156a7d X-MS-Exchange-Transport-CrossTenantHeadersStamped: YTXPR01MB0192 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 21 Dec 2016 00:28:10 -0000 Don Lewis wrote: On 19 Dec, Rick Macklem wrote: > Colin Percival wrote: > >>On 12/16/16 12:14, Colin Percival wrote: >>> making this change in nfs_lookup >>>> --- sys/fs/nfsclient/nfs_clvnops.c (revision 310132) >>>> +++ sys/fs/nfsclient/nfs_clvnops.c (working copy) >>>> @@ -1144,7 +1144,7 @@ >>>> *vpp =3D NULLVP; >>>> } >>>> >>>> - if (error !=3D ENOENT) { >>>> + if (error !=3D ENOENT && error !=3D ESTALE) { >>>> if (NFS_ISV4(dvp)) >>>> error =3D nfscl_maperr(td, error, (uid= _t)0, >>>> (gid_t)0); >>> >>> fixes the case I described above (for some definition of "fixes" -- I'm= not >>> sure if this is the correct way of handling ESTALE here?) but I'm still= seeing >>> ESTALEs from buildworld's cleandir so I think there must be some other = place >>> where something odd is happening. >> >>Further information: In addition to the "lookup relative to a directory w= hich >>has been deleted out from underneath us" case which causes ESTALE to land= in >>nfs_lookup, the cleandir step of buildworld results in ESTALE being retur= ned >>by nfsrpc_getattr into nfs_getattr (landing ultimately in getcwd), and ES= TALE >>being returned by nfsrpc_accessrpc into nfs34_access_otw (landing ultimat= ely >>in stat and lstat). >> >>In UFS there are checks for effnlink =3D=3D 0 which result in e.g. ufs_lo= okup >>returning ENOENT; would it make sense to add NREMOVED to struct nfsnode.n= _flag >>and check this in the appropriate nfs_* calls? > To be honest, I can't think of a reason why userland would ever want to s= ee ESTALE? > The function you see above "nfscl_maperr()" could easily map all ESTALEs = to ENOENTs? > - The question is: "Would returning ENOENT for stat(2) and access(2) actu= ally make the > buildworld happy? > if Yes > - then mapping ESTALE->ENOENT makes sense for most/all VOP_xxx(= ) calls. > else > - I don't see any point in returning a different error and the= re might be some > code out there somewhere that depends on seeing ESTALE (I do= ubt it, but???). > > The real problem here is that the directory has a reference count on it w= hen it is > rmdir'd. POSIX file systems keep the data until the reference count goes = to 0, but > NFS isn't POSIX. > --> The cheat for regular files is "sillyrename". This could be done for = directories, > but there are multiple comments in the code (not put there by me) t= hat say > "no sillyrename for directories". > #1 Does this imply something breaks when you do sillyrename for dirs= ? > OR > #2 Does it mean no one has bothered to implement it? > Since implementing it would have been pretty easy, I have to suspect #1, = which > means I would be reluctant to do it, at least by default. > --> Maybe I'll send you a patch that does sillyrename for dirs which you = can test. > If it fixes buildworld, then it could be considered for head as a n= on-default option? > >Sillyrename makes sense for files because you can continue to read and >write to a file after it is unlinked, until it is closed and it totally >vanishes. It doesn't make sense for directories since you can't rmdir() >a directory until it is empty. Once it is rmdir()ed, then you can't >create any new entries in the directory because that would mean that you >are creating a disconnnected subtree in the filesystem. Good points. I just tried this ("rm -r" while another process's home dir is= in the subtree) and the process can still "ls" (gets no entries), but can't "cd .." because= that doesn't exist and no process can "cd" into the subtree. --> I can't think of an easy way to make this happen for the NFS client. My hunch is that the failure in "buildworld" will just have to be a "sorry,= but an NFS client isn't POSIX compliant, so this doesn't work" case. Fyi, the only way to safely delete all entries in an NFS mounted dir is: opendir() while (readdir() !=3D EOF) { unlink() the first dir entry acquired by the readdir() closedir() opendir() } So that it is always doing a readdir()/unlink() of the first entry in the d= irectory until the directory is empty. This has always been the case, so I am surprised that "rm -r subdir" usuall= y works.;-) - This is because there is no guarantee that directory offset cookies won't= change when an entry in the directory is removed. - It happens that UFS like file systems did have this property for a lon= g time. (This changed for FreeBSD when r252438 was committed to head, which sk= ipped over the d_ino =3D=3D 0 entries when copying out directories. Howeve= r, UFS still retains the property that the directory offset is monotonically incr= easing and the server can skip ahead to the next entry safely.) - I have no idea whether or not ZFS directory offsets change when an entry = is removed, but I do know that ZFS directory offsets are not monotonically increasing= values. The handling of directories has always been a weak spot for NFS, rick