From owner-freebsd-hackers  Sat May 26 13:07:12 2001
To: Andrew Reilly <areilly@bigpond.net.au>
Cc: gjb@gbch.net, jandrese@mitre.org, float@firedrake.org,
	hackers@FreeBSD.ORG
Subject: Re: technical comparison
References: <20010525044848.08CAC37B422@hub.freebsd.org>
From: Nat Lanza <magus@cs.cmu.edu>
Date: 26 May 2001 16:06:59 -0400
In-Reply-To: <20010525044848.08CAC37B422@hub.freebsd.org>
Message-ID: <khitinx4bg.fsf@dzerzhinsky.rem.cs.cmu.edu>
Lines: 36
User-Agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.1 (Cuyahoga Valley)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

Andrew Reilly <areilly@bigpond.net.au> writes:

> Where in open(2) does it specify a limit on the number of files
> permissible in a directory?  The closest that it comes, that I can
> see is:

Well, read(2) doesn't tell you not to do your I/O one character at a
time, but that doesn't mean it's a good idea. The point here is not
interface definitions, it's efficiency. Nobody's saying you shouldn't
be _allowed_ to put thousands and thousands of files in a directory if
you like. They're just saying that you shouldn't expect it to be fast.
Similarly, you can read data one byte at a time if you like, but you
shouldn't expect that to be fast either.
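
To make the read(2) comparison concrete, here is a minimal sketch in
Python (os.read() is a thin wrapper over read(2)). The helper names
read_all and make_sample are made up for illustration, and the chunk
sizes are arbitrary. It only counts calls; it is not a benchmark.

```python
import os
import tempfile

def read_all(path, chunk_size):
    """Read a whole file via os.read(), asking for chunk_size bytes per call.

    Returns (data, number_of_read_calls). Hypothetical helper, for
    illustration only.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        calls = 0
        while True:
            buf = os.read(fd, chunk_size)
            calls += 1
            if not buf:  # an empty result means end of file
                break
            chunks.append(buf)
        return b"".join(chunks), calls
    finally:
        os.close(fd)

def make_sample(size):
    """Create a temporary file holding `size` bytes and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"x" * size)
    return path

if __name__ == "__main__":
    path = make_sample(10000)
    try:
        _, byte_calls = read_all(path, 1)
        _, block_calls = read_all(path, 4096)
        print("1-byte reads:", byte_calls, "calls")
        print("4096-byte reads:", block_calls, "calls")
    finally:
        os.remove(path)
```

Both variants return the same data and both are perfectly legal. On a
10000-byte file the one-byte version makes 10001 calls (one per byte
plus the final empty read), while the block version needs only a
handful.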

Pointing to manpages and saying you weren't warned that a particular
approach is slow is a really weak defense. Do you expect cliffs to
have little "If you drive off this cliff, you will die" warning signs
on them?

If a documented part of the API simply did not work, then you'd have a
point. Instead, what we have is a case where a method of storing files
that most people reasonably expect to be slow is in fact slow.

The folks who've pointed out the /a/a/aardvark solution are right --
directory hashing is a well-known solution to this problem. It isn't
a hack at all. No matter what method you use for storing directories,
larger directories are going to be slower to use than smaller ones,
and hashing filenames into subdirectories keeps each directory small.
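
As a rough sketch of the /a/a/aardvark idea, in Python: the function
names hashed_path and store, the depth of two levels, and the "_"
padding for short or odd names are all my own assumptions, not
anything a particular filesystem or tool prescribes. Using leading
characters mirrors the aardvark example; a prefix taken from a digest
of the name would spread files more evenly.

```python
import os
import tempfile

def hashed_path(root, name, depth=2):
    """Map a flat filename to a nested path, e.g. aardvark -> root/a/a/aardvark.

    The first `depth` characters of the name become directory levels.
    Short names are padded with "_", and non-alphanumeric characters
    (such as ".") are replaced with "_" so they cannot act as path
    components. Hypothetical helper, for illustration only.
    """
    prefix = name[:depth].lower().ljust(depth, "_")
    levels = [c if c.isalnum() else "_" for c in prefix]
    return os.path.join(root, *levels, name)

def store(root, name, data):
    """Write `data` (bytes) under the hashed location and return the path."""
    path = hashed_path(root, name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(store(root, "aardvark", b"hello"))
```

With a layout like this no single directory has to hold every file,
at the cost of a couple of extra path components per lookup.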


--nat

-- 
nat lanza ----------------------------------- there are no whole truths;
magus@cs.cmu.edu ---------------------------- all truths are half-truths
http://www.cs.cmu.edu/~magus/ ---------------  -- alfred north whitehead
