From owner-freebsd-hackers  Thu Jun 22  9:28:27 2000
Delivered-To: freebsd-hackers@freebsd.org
Received: from mail.bastard.co.uk (node16292.a2000.nl [24.132.98.146])
	by hub.freebsd.org (Postfix) with ESMTP id A1F9937BE71
	for <hackers@freebsd.org>; Thu, 22 Jun 2000 09:28:21 -0700 (PDT)
	(envelope-from adrian@bastard.co.uk)
Received: from adrian by mail.bastard.co.uk with local (Exim 3.14 #1)
	id 1359oV-0008oH-00; Thu, 22 Jun 2000 18:26:47 +0200
Date: Thu, 22 Jun 2000 18:26:47 +0200
From: Adrian Chadd <adrian@freebsd.org>
To: Don Lewis <Don.Lewis@tsc.tdk.com>
Cc: Daniel O'Connor <doconnor@gsoft.com.au>,
	Luigi Rizzo <luigi@info.iet.unipi.it>, hackers@FreeBSD.ORG,
	"Nicole Harrington." <nicole@unixgirl.com>
Subject: Re: How many files can I put in one diretory?
Message-ID: <20000622182647.Q29036@zoe.bastard.co.uk>
References: <XFMail.000622171146.doconnor@gsoft.com.au> <200006220750.AAA07430@salsa.gv.tsc.tdk.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2i
In-Reply-To: <200006220750.AAA07430@salsa.gv.tsc.tdk.com>; from Don.Lewis@tsc.tdk.com on Thu, Jun 22, 2000 at 12:50:09AM -0700
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Thu, Jun 22, 2000, Don Lewis wrote:
> On Jun 22,  5:11pm, "Daniel O'Connor" wrote:
> } Subject: Re: How many files can I put in one diretory?
> } 
> } On 22-Jun-00 Luigi Rizzo wrote:
> } >  that sounds insane! Because a name is a name, why don't they call
> } >  those files xx/yy/zz/tt.html and the like, to get down to a more
> } >  reasonable # of files per directory.
> } >  
> } >  Or use a single file and a cgi which extracts things from the right place.
> } >  In such a context, i assume that the best place to do the name lookup
> } >  is in the app, not in the kernel.
> } 
> } Yeah.. This is why databases were invented :)
> } 
> } FYI 40000 in a directory really makes directory listings slow.. 2 million would
> } suck :)
> 
> Only if directory lookups use a sequential search.  Not all filesystem
> implementations sequentially scan directory entries.  Some use btrees or
> other ways of quickly finding the desired directory entry.  Even so,
> you probably still would want to avoid doing an "ls" or an "echo *" ;-)
> 
> I'd recommend looking at how squid stores its disk cache.  It has a
> very similar performance problem to solve.

Squid uses a 2-level directory hierarchy with a simple mapping
directory<->filename. Since each disk object in squid has a swap file
number, translating between directory name and swap file number is
only two MOD (%) operations away.
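
A minimal sketch of that kind of mapping, for illustration only. The
directory counts (16 first-level, 256 second-level), the 8-digit hex
file name, and the function name swap_path are assumptions made for
this example, not values taken from this message; a real squid
configuration sets its own counts.

```python
# Hypothetical directory counts; a real cache_dir setup chooses its own.
L1 = 16    # number of first-level directories (assumed)
L2 = 256   # number of second-level directories per first-level one (assumed)


def swap_path(root: str, filn: int) -> str:
    """Map a swap file number to root/<level1>/<level2>/<file>.

    Two integer divisions pick out the directory indices and the two
    MOD (%) operations wrap them into the available directory counts.
    """
    d1 = (filn // L2 // L2) % L1   # first-level directory index
    d2 = (filn // L2) % L2         # second-level directory index
    # Zero-padded upper-case hex; the 8-digit file name width is assumed.
    return "%s/%02X/%02X/%08X" % (root, d1, d2, filn)


if __name__ == "__main__":
    print(swap_path("/cache1", 0x102))
```

With these assumed counts, swap file number 0x102 lands in 00/01/,
the same directories as the example path used further down; only the
width of the file name differs.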

The main trouble with squid's UFS layout isn't the ls time but the
access time. To open a file, the kernel has to look up each path
component in turn (/cache1/00/01/000102 needs a lookup for /cache1,
00/, 01/, and then the file 000102), which takes time. Each of those
lookups is a linear search inside the directory unless the namecache
already holds the entry. And if you have 2 million files (that's a
standard squid box these days), even a small (10%) frequently used
subset of them is 200,000 files thrashing your namecache.
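
To put rough numbers on that, here is a back-of-the-envelope sketch.
The 2 million files, the 10% hot set, and the example path come from
the text above; the 16 x 256 directory layout is an assumption made
only for this calculation, so treat the per-directory figures as
illustrative.

```python
# Figures taken from the message.
total_files = 2_000_000
hot_percent = 10
example_path = "/cache1/00/01/000102"

# Assumed layout (not from the message): 16 first-level directories,
# each holding 256 second-level directories.
l1, l2 = 16, 256

# Each path component costs one directory lookup.
components = example_path.strip("/").split("/")
lookups_per_open = len(components)

leaf_dirs = l1 * l2
files_per_leaf = total_files / leaf_dirs

# A successful linear search scans about half the entries on average.
avg_entries_scanned = files_per_leaf / 2

# Size of the frequently used subset competing for namecache entries.
hot_files = total_files * hot_percent // 100

if __name__ == "__main__":
    print("lookups per open:      ", lookups_per_open)
    print("leaf directories:      ", leaf_dirs)
    print("files per leaf dir:    ", round(files_per_leaf))
    print("avg entries scanned:   ", round(avg_entries_scanned))
    print("hot files in namecache:", hot_files)
```

Under these assumptions each leaf directory holds a few hundred
files, so any lookup that misses the namecache walks a couple of
hundred directory entries on average before it finds the file.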

It's not pretty, which is why I'm working on alternatives. :-)

One of the alternatives which Robert has mentioned is IFS. It is designed
for applications like the above. Instead of requiring a filename, you
simply index each file in FFS using the inode number. It is not committed
right now, but I'm hoping it will provide a solution to problems like
this.

You can find the IFS code at http://www.freebsd.org/~adrian/ .


Adrian

-- 
Adrian Chadd			Build a man a fire, and he's warm for the
<adrian@FreeBSD.org>		rest of the evening. Set a man on fire and
				he's warm for the rest of his life.

