Date: Thu, 17 Oct 1996 13:10:19 -0500 (CDT)
From: Karl Denninger <karl@Mcs.Net>
To: michaelh@cet.co.jp (Michael Hancock)
Cc: dfr@render.com, freebsd-hackers@FreeBSD.org
Subject: Re: Recent hacking (was Re: FreeBSD 2.2.x release question)
Message-ID: <199610171810.NAA00720@Jupiter.Mcs.Net>
In-Reply-To: <Pine.SV4.3.93.961018013813.25359A-100000@parkplace.cet.co.jp> from "Michael Hancock" at Oct 18, 96 01:45:07 am

> On Thu, 17 Oct 1996, Doug Rabson wrote:
>
> > It would be extremely helpful if those of you who have NFS problems to
> > actually get your hands into the code and figure out what is happening in
> > your environment. I know this is not always possible but certainly some
> > of you can. A good example here is Hidetoshi Shimokawa who had a problem
> > with write performance, got into the code and found a solution. This has
> > two benefits: better performance for NFS and one more person who
> > understands (some of) how this monster works.
>
> The debugging process of that thread and the SYN stuff was pretty
> enjoyable. I was deleting through most of the other stuff and there was a
> lot of stuff to delete recently.
>
> Unfortunately, I deleted the early background of the "Disappearing
> Directory Problem". Which I didn't notice was interesting until the tail
> end of the thread.
>
> Karl, would you mind recapping where you are with the "getcwd" thing?
>
> Regards,
>
>
> Mike Hancock

Sure.

Background:
    Server is a BSDI 2.0 machine
    Client is a FreeBSD 2.2-CURRENT system

Symptom:
    Randomly, "getcwd()" fails.

Analysis thus far:
    The search up-directory in "getcwd()" walks up the directory tree
    from "." to the root (determined by the device numbers and inodes
    for root when it gets there) looking at each path component and
    inserting it in the returned string.

    Thus, what happens in getcwd() is this:

        Save our inode number
        Open ".."
        Read through it, looking for the inode number.
        Save the path component
        Iterate until you reach "/"
        Return the path to the user.

Now, some problems we've found.

1) When it fails, the up-movement works but the inode number is NOT
   seen in the FIRST component when the directory read is performed.
   (ie: the directory is "/user/contrib/swilson", the first component
   is 'swilson' and that is not found at the first "step-up".)

2) A bug in libc was found where the path was not being
   null-terminated, which led to comparisons looking for really
   bizarre names (ie: 1024-byte random strings). We thought this
   might be the cause of the problem, but it wasn't. I have sent in a
   commit (already accepted) to fix this.

3) Two consecutive "mv"s (rename to a different name, then back) clear
   the problem on that given directory - but it does eventually come
   back.

4) The problem is *random* and comes and goes for a given directory.
   That it exists one minute does not imply that it will 2 minutes
   later.

5) Some people have reported that if they actually do a "ls" of the
   directory up one level, the affected paths are now showing up.
   I've not been able to nail this, but it's consistent with the
   failure noted in (1) above.

Current speculation is that this is a vnode cache handling problem of
some kind, where the vnode for the desired directory is being "flushed"
but never reloaded into the cache. We're still investigating and
searching for the root cause.

Note that this appears to happen on directories with LARGE numbers of
subdirectory entries -- and not on ones with small numbers of
directories. I've never seen it occur, for example, on MY home
directory -- but I'm on a disk pack with maybe 20 directories at the
same level that I'm on. The places where it happens frequently have
perhaps 3,000 - 4,000 directories at the same level, which is common on
our big user disks.

That's what we know right now.

BTW, the heuristic in getcwd() needs some work, but I'm not sure how to
accomplish it as of yet. The reason is that we'd REALLY like to be able
to protect the directories involved from a listing -- that is, make
them mode 711. However, doing this causes logins to fail and all the
shells to bitch loudly.

An example:

    /                        - 755
    /user                    - 755
    /user/contrib            - 711
    /user/contrib/who-am-i   - 700

The user is "who-am-i", and in that directory.

getcwd() will return an error in this environment: when it tries to
READ /user/contrib to find the inode match for the "who-am-i"
component, it is unable to open that directory for this purpose.

I'm doing a brain-search on ways to make it possible to protect things
in this fashion and still have the getcwd() call succeed, but I don't
know if it's even possible.

--
Karl Denninger (karl@MCS.Net)| MCSNet - The Finest Internet Connectivity
http://www.mcs.net/~karl     | T1 from $600 monthly; speeds to DS-3 available
                             | 23 Chicagoland Prefixes, 13 ISDN, much more
Voice: [+1 312 803-MCS1 x219]| Email to "info@mcs.net" WWW: http://www.mcs.net/
Fax:   [+1 312 248-9865]     | Home of Chicago's only FULL Clarinet feed!
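
For readers who want to see the walk-up search from the message in concrete form, here is a minimal sketch in Python. It is not the libc source. The function name getcwd_walk is hypothetical, and the sketch matches entries by calling lstat() on each name in the parent, whereas the message describes comparing the inode numbers returned by the directory read itself. The listing of the parent needs read permission on that directory, so a mode 711 parent such as /user/contrib in the example should make os.listdir() raise PermissionError for a non-root user, which corresponds to the failure described above.

```python
import os


def getcwd_walk(start="."):
    """Rebuild the absolute path of `start` by walking up through '..'.

    Hypothetical helper for illustration only; not the libc getcwd().
    """
    # The root is recognised by its (device, inode) pair.
    root = os.stat("/")
    root_id = (root.st_dev, root.st_ino)

    here = start
    st = os.stat(here)
    components = []

    while (st.st_dev, st.st_ino) != root_id:
        # Save our device and inode number.
        target = (st.st_dev, st.st_ino)
        parent = os.path.join(here, "..")

        # Read through "..", looking for the entry that is us.
        # This listing requires read permission on the parent directory.
        name = None
        for entry in os.listdir(parent):
            try:
                cand = os.lstat(os.path.join(parent, entry))
            except OSError:
                continue  # entry vanished or cannot be examined
            if (cand.st_dev, cand.st_ino) == target:
                name = entry
                break

        if name is None:
            # The symptom from item (1): our inode is not seen in the parent.
            raise FileNotFoundError(
                "no entry in %r matches inode %d" % (parent, st.st_ino)
            )

        # Save the path component and step up one level.
        components.append(name)
        here = parent
        st = os.stat(here)

    return "/" + "/".join(reversed(components))


if __name__ == "__main__":
    print(getcwd_walk())
```

Run from any directory, the script prints the path it reconstructed. If the parent listing omits the entry for the current directory, the sketch raises FileNotFoundError at the first step up, which is the case reported in item (1).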