Date:      Thu, 17 Oct 1996 13:10:19 -0500 (CDT)
From:      Karl Denninger  <karl@Mcs.Net>
To:        michaelh@cet.co.jp (Michael Hancock)
Cc:        dfr@render.com, freebsd-hackers@FreeBSD.org
Subject:   Re: Recent hacking (was Re: FreeBSD 2.2.x release question)
Message-ID:  <199610171810.NAA00720@Jupiter.Mcs.Net>
In-Reply-To: <Pine.SV4.3.93.961018013813.25359A-100000@parkplace.cet.co.jp> from "Michael Hancock" at Oct 18, 96 01:45:07 am

> On Thu, 17 Oct 1996, Doug Rabson wrote:
> 
> > It would be extremely helpful if those of you who have NFS problems to
> > actually get your hands into the code and figure out what is happening in
> > your environment.  I know this is not always possible but certainly some
> > of you can.  A good example here is Hidetoshi Shimokawa who had a problem
> > with write performance, got into the code and found a solution.  This has
> > two benefits:  better performance for NFS and one more person who
> > understands (some of) how this monster works.
> 
> The debugging process of that thread and the SYN stuff was pretty
> enjoyable.  I was deleting through most of the other stuff and there was a
> lot of stuff to delete recently.
> 
> Unfortunately, I deleted the early background of the "Disappearing
> Directory Problem".  Which I didn't notice was interesting until the tail
> end of the thread.
> 
> Karl, would you mind recapping where you are with the "getcwd" thing?
> 
> Regards,
> 
> 
> Mike Hancock

Sure.

Background:	Server is a BSDI 2.0 machine
		Client is a FreeBSD 2.2-CURRENT system

Symptom:	Randomly, "getcwd()" fails.

Analysis thus far:
		The up-directory search in "getcwd()" walks up the directory
		tree from "." to the root (recognized by the device and inode
		numbers for root when it gets there), looking up each path
		component and inserting it in the returned string.

		Thus, what happens in getcwd() is this:

			Save our inode number
			Open ".."
			Read through it, looking for the inode number.
			Save the path component

			Iterate until you reach "/"

			Return the path to the user.
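
A rough sketch of the loop above, in C. This is not the libc source; the
function name walk_up_getcwd and the fixed PATH_MAX buffers are made up for
illustration, and it matches each entry with lstat() on device and inode
numbers, which is only one way the comparison could be done.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not the libc routine.  Returns 0 on success,
 * -1 with errno set on failure. */
int walk_up_getcwd(char *out, size_t outlen)
{
    char up[PATH_MAX] = ".";     /* ".", then "./..", "./../..", ... */
    char result[PATH_MAX] = "";  /* path assembled from right to left */
    struct stat cur, root, st;

    assert(out != NULL);
    /* Save our device/inode numbers, and those of the root. */
    if (stat("/", &root) == -1 || stat(up, &cur) == -1)
        return -1;

    /* Iterate until we reach "/". */
    while (cur.st_dev != root.st_dev || cur.st_ino != root.st_ino) {
        char entry[PATH_MAX], tmp[PATH_MAX];
        struct dirent *de;
        DIR *dp;
        int n;

        if (strlen(up) + 4 > sizeof(up)) {
            errno = ENAMETOOLONG;
            return -1;
        }
        strcat(up, "/..");

        /* Open "..": this needs READ permission on the parent. */
        if ((dp = opendir(up)) == NULL)
            return -1;

        /* Read through it, looking for the saved inode number. */
        while ((de = readdir(dp)) != NULL) {
            if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
                continue;
            n = snprintf(entry, sizeof(entry), "%s/%s", up, de->d_name);
            if (n < 0 || (size_t)n >= sizeof(entry))
                continue;
            if (lstat(entry, &st) == 0 &&
                st.st_dev == cur.st_dev && st.st_ino == cur.st_ino)
                break;
        }
        if (de == NULL) {        /* the failure described in this thread */
            closedir(dp);
            errno = ENOENT;
            return -1;
        }

        /* Save the path component by prepending it to the result. */
        n = snprintf(tmp, sizeof(tmp), "/%s%s", de->d_name, result);
        closedir(dp);
        if (n < 0 || (size_t)n >= sizeof(tmp)) {
            errno = ENAMETOOLONG;
            return -1;
        }
        strcpy(result, tmp);

        /* Step up: the parent becomes the directory to look for. */
        if (stat(up, &cur) == -1)
            return -1;
    }

    if (result[0] == '\0')
        strcpy(result, "/");
    if (strlen(result) + 1 > outlen) {
        errno = ERANGE;
        return -1;
    }
    /* Return the path to the user. */
    strcpy(out, result);
    return 0;
}
```

Calling walk_up_getcwd(buf, sizeof(buf)) should give the same string as
getcwd() when every parent directory is readable.  The ENOENT return is
where a lookup that misses the entry in ".." would surface.
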

		Now, some problems we've found.

		1) When it fails, the up-movement works but the inode
		number is NOT seen in the FIRST component when the directory
		read is performed.  (ie: if the directory is
		"/user/contrib/swilson", the first component is 'swilson'
		and that is not found at the first "step-up".)

		2) A bug in libc was found where the path was not being
		null-terminated, which led to comparisons looking for really
		bizarre names (ie: 1024-byte random strings).   We thought
		this might be the cause of the problem, but it wasn't.  I
		have sent in a commit (already accepted) to fix this.

		3) Two consecutive "mv"s (rename to a different name, then
		back) clear the problem on that given directory - but it
		does eventually come back.

		4) The problem is *random* and comes and goes for a given
		directory.  That it exists one minute does not imply that it
		will 2 minutes later.

		5) Some people have reported that if they actually do an "ls"
		of the directory up one level, the affected paths then
		show up.  I've not been able to nail this, but it's
		consistent with the failure noted in (1) above.

Current speculation is that this is a vnode cache handling problem of some 
kind, where the vnode for the desired directory is being "flushed" but 
never reloaded into the cache.  We're still investigating and searching for
the root cause.

Note that this appears to happen on directories with LARGE numbers of 
subdirectory entries -- and not on ones with small numbers of directories.
I've never seen it occur, for example, on MY home directory -- but I'm on a
disk pack with maybe 20 directories at the same level that I'm on.  

The places where it happens frequently have perhaps 3,000 - 4,000 directories
at the same level, which is common on our big user disks.

That's what we know right now.

BTW, the heuristic in getcwd() needs some work, but I'm not sure how to
accomplish it as of yet.  The reason is that we'd REALLY like to be able to 
protect the directories involved from a listing -- that is, make them mode 
711.  However, doing this causes logins to fail and all the shells to bitch
loudly.

An example:

/			- 755
/user			- 755
/user/contrib		- 711
/user/contrib/who-am-i	- 700

The user is "who-am-i", and in that directory.

getcwd() will return an error in this environment, as when it tries to READ
/user/contrib to find the inode match for the "who-am-i" component it is
unable to open that directory for this purpose.
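
The distinction that bites here is between the read bit (needed to list a
directory) and the search bit (needed only to pass through it).  A small
sketch of that check in C, assuming the user falls in the "other" class for
/user/contrib; the names enum who, can_list and can_search are invented for
illustration and ignore root, ACLs and the like:

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical: which permission class the caller falls into. */
enum who { WHO_OWNER, WHO_GROUP, WHO_OTHER };

/* Listing a directory (opendir/readdir) needs the READ bit. */
bool can_list(mode_t mode, enum who w)
{
    switch (w) {
    case WHO_OWNER: return (mode & S_IRUSR) != 0;
    case WHO_GROUP: return (mode & S_IRGRP) != 0;
    default:        return (mode & S_IROTH) != 0;
    }
}

/* Passing through a directory (path lookup, chdir) needs the SEARCH bit. */
bool can_search(mode_t mode, enum who w)
{
    switch (w) {
    case WHO_OWNER: return (mode & S_IXUSR) != 0;
    case WHO_GROUP: return (mode & S_IXGRP) != 0;
    default:        return (mode & S_IXOTH) != 0;
    }
}

/* The layout from the example, seen by a user who is "other" on
 * /user/contrib (an assumption about ownership). */
void check_example(void)
{
    assert(can_search(0711, WHO_OTHER));  /* can reach the home directory */
    assert(!can_list(0711, WHO_OTHER));   /* but cannot read /user/contrib */
}
```

So the user can get into /user/contrib/who-am-i with no trouble, but a
getcwd() that has to read /user/contrib to find the name cannot.
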

I'm doing a brain-search on ways to make it possible to protect things in
this fashion and still have the getcwd() call succeed, but I don't know if
it's even possible.

--
Karl Denninger (karl@MCS.Net)| MCSNet - The Finest Internet Connectivity
http://www.mcs.net/~karl     | T1 from $600 monthly; speeds to DS-3 available
			     | 23 Chicagoland Prefixes, 13 ISDN, much more
Voice: [+1 312 803-MCS1 x219]| Email to "info@mcs.net" WWW: http://www.mcs.net/
Fax:   [+1 312 248-9865]     | Home of Chicago's only FULL Clarinet feed!


