From owner-freebsd-hackers Thu Jun 22 9:28:27 2000
Delivered-To: freebsd-hackers@freebsd.org
Received: from mail.bastard.co.uk (node16292.a2000.nl [24.132.98.146])
        by hub.freebsd.org (Postfix) with ESMTP id A1F9937BE71
        for ; Thu, 22 Jun 2000 09:28:21 -0700 (PDT)
        (envelope-from adrian@bastard.co.uk)
Received: from adrian by mail.bastard.co.uk with local (Exim 3.14 #1)
        id 1359oV-0008oH-00; Thu, 22 Jun 2000 18:26:47 +0200
Date: Thu, 22 Jun 2000 18:26:47 +0200
From: Adrian Chadd
To: Don Lewis
Cc: Daniel O'Connor, Luigi Rizzo, hackers@FreeBSD.ORG, "Nicole Harrington."
Subject: Re: How many files can I put in one diretory?
Message-ID: <20000622182647.Q29036@zoe.bastard.co.uk>
References: <200006220750.AAA07430@salsa.gv.tsc.tdk.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2i
In-Reply-To: <200006220750.AAA07430@salsa.gv.tsc.tdk.com>; from Don.Lewis@tsc.tdk.com on Thu, Jun 22, 2000 at 12:50:09AM -0700
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Thu, Jun 22, 2000, Don Lewis wrote:
> On Jun 22, 5:11pm, "Daniel O'Connor" wrote:
> } Subject: Re: How many files can I put in one diretory?
> }
> } On 22-Jun-00 Luigi Rizzo wrote:
> } > that sounds insane! Because a name is a name, why dont they call
> } > those files xx/yy/zz/tt.html and the like, to get down to a more
> } > reasonable # of files per directory.
> } >
> } > Or use a single file and a cgi which extracts things from the right place.
> } > In such a context, i assume that the best place to do the name lookup
> } > is in the app, not in the kernel.
> }
> } Yeah.. This is why databases where invented :)
> }
> } FYI 40000 in a directory really makes directory listings slow.. 2 million would
> } suck :)
>
> Only if directory lookups use a sequential search. Not all filesystem
> implementations sequentially scan directory entries. Some use btrees or
> other ways of quickly finding the desired directory entry.
> Even so, you probably still would want to avoid doing an "ls" or an
> "echo *" ;-)
>
> I'd recommend looking at how squid stores its disk cache. It has a
> very similar performance problem to solve.

Squid uses a 2-level directory hierarchy with a simple mapping
directory<->filename. Since each disk object in squid has a swap file
number, translating between directory name and swap file number is only
two MOD (%) operations away.

The main trouble with squid's UFS layout isn't in the ls time, but in the
access time. When looking at a file, you have to do a lookup for each
component of the path (/cache1/00/01/000102 would need a lookup for
/cache1, 00/, 01/, and then the file 000102), which takes time. Within
each directory you have to do a linear search to find the entry you're
after. And if you have 2 million files (that's a standard squid box these
days), even a small (10%) frequently used subset of them means 200,000
files thrashing your namecache.

It's not pretty, which is why I'm working on alternatives. :-)

One of the alternatives, which Robert has mentioned, is IFS. It is
designed for applications like the above. Instead of requiring a
filename, you simply index each file in FFS using the inode number. It is
not committed right now, but I'm hoping it will provide a solution to
problems like this.

You can find the IFS code at http://www.freebsd.org/~adrian/ .

Adrian

-- 
Adrian Chadd
Build a man a fire, and he's warm for the rest of the evening.
Set a man on fire and he's warm for the rest of his life.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
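
For anyone who wants to see the squid-style mapping spelled out, here is a
rough sketch of how a swap file number can be turned into a path with two
MOD operations. It is only an illustration: the function name swap_path,
the directory counts (16 first-level, 256 second-level) and the exact
format string are assumptions for this example, not copied from the squid
source, so check the real code before relying on the details.

```python
# Sketch of a 2-level "swap file number -> path" mapping.
# L1_DIRS, L2_DIRS and swap_path() are assumed names/values for
# illustration; they are not taken from the message above.
L1_DIRS = 16    # assumed number of first-level directories
L2_DIRS = 256   # assumed number of second-level directories


def swap_path(cache_root, fileno):
    """Return the on-disk path for swap file number `fileno`."""
    # Second-level directory: which group of L2_DIRS files this is.
    l2 = (fileno // L2_DIRS) % L2_DIRS
    # First-level directory: which group of second-level directories.
    l1 = (fileno // L2_DIRS // L2_DIRS) % L1_DIRS
    return "%s/%02X/%02X/%08X" % (cache_root, l1, l2, fileno)


if __name__ == "__main__":
    # Swap file number 0x102 lands in 00/01, as in the example path above.
    print(swap_path("/cache1", 0x102))
```

The path is computed from the number alone, so the application never has
to list a directory to find an object. The kernel still resolves every
path component on open, though, which is the access-time cost described
above.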