Date: Fri, 18 Oct 1996 14:52:47 +0900 (JST)
From: Michael Hancock
To: Karl Denninger
cc: freebsd-hackers@freebsd.org
Subject: NFS node: disappearing directory
In-Reply-To: <199610171810.NAA00720@Jupiter.Mcs.Net>

On Thu, 17 Oct 1996, Karl Denninger wrote:

> Background: Server is a BSDI 2.0 machine
>             Client is a FreeBSD 2.2-CURRENT system
>
> Symptom: Randomly, "getcwd()" fails.
>
> Analysis thus far:
> The search up-directory in "getcwd()" walks up the directory
> tree from "." to the root (determined by the device numbers
> and inodes for root when it gets there) looking at each path
> component and inserting it in the returned string.
>
> Thus, what happens in getcwd() is this:
>
>     Save our inode number
>     Open ".."
>     Read through it, looking for the inode number.
>     Save the path component
>
>     Iterate until you reach "/"
>
>     Return the path to the user.
>
> Now, some problems we've found.
>
> 1) When it fails, the up-movement works but the inode
> number is NOT seen in the FIRST component when the directory
> read is performed.  (ie: the directory is
> "/user/contrib/swilson", the first component is 'swilson'
> and that is not found at the first "step-up".)

Is this remote /user mounted on local /user?  This might not be
relevant, but it helps in understanding the execution path.

> 2) A bug in libc was found where the path was not being
> null-terminated, which led to comparisons looking for really
> bizarre names (ie: 1024-byte random strings).  We thought
> this might be the cause of the problem, but it wasn't.  I
> have sent in a commit (already accepted) to fix this.
>
> 3) Two consecutive "mv"s (rename to a different name, then
> back) clear the problem on that given directory - but it
> does eventually come back.
>
> 4) The problem is *random* and comes and goes for a given
> directory.  That it exists one minute does not imply that it
> will 2 minutes later.
>
> 5) Some people have reported that if they actually do an "ls"
> of the directory up one level, the affected paths are now
> showing up.  I've not been able to nail this, but it's
> consistent with the failure noted in (1) above.
>
> Current speculation is that this is a vnode cache handling problem of some
> kind, where the vnode for the desired directory is being "flushed" but
> never reloaded into the cache.  We're still investigating and searching for
> the root cause.

But 3) says it does get reloaded.

> Note that this appears to happen on directories with LARGE numbers of
> subdirectory entries -- and not on ones with small numbers of directories.
> I've never seen it occur, for example, on MY home directory -- but I'm on a
> disk pack with maybe 20 directories at the same level that I'm on.
>
> The places where it happens frequently have perhaps 3,000 - 4,000 directories
> at the same level, which is common on our big user disks.
>
> That's what we know right now.
>
> BTW, the heuristic in getcwd() needs some work, but I'm not sure how to
> accomplish it as of yet.  The reason is that we'd REALLY like to be able to
> protect the directories involved from a listing -- that is, make them mode
> 711.  However, doing this causes logins to fail and all the shells to bitch
> loudly.
>
> An example:
>
>     /                       - 755
>     /user                   - 755
>     /user/contrib           - 711
>     /user/contrib/who-am-i  - 700
>
> The user is "who-am-i", and in that directory.
>
> getcwd() will return an error in this environment, as when it tries to READ
> /user/contrib to find the inode match for the "who-am-i" component it is
> unable to open that directory for this purpose.
>
> I'm doing a brain-search on ways to make it possible to protect things in
> this fashion and still have the getcwd() call succeed, but I don't know if
> it's even possible.  The above permissions work under SysV.