From: Juergen Nickelsen
Organization: Tellique Kommunikationstechnik GmbH
Date: Tue, 18 Aug 1998 23:32:08 +0200
To: questions@FreeBSD.ORG
Subject: Re: Free BSD file system
Message-ID: <35D9F2D8.5E638264@tellique.de>
References: <35D9C4E6.2897@echidna.com> <19980818141707.54820@homenet>

Aaron Jeremias Luz wrote:

> > I have a situation that involves storing the better part of a
> > million small (700 bytes to 1.9 kbytes) files (don't ask!). From a
> > filesystem efficiency point of view, what is a practical maximum
> > number of files per directory? How many directories can you have
> > under one directory?
>
> The filesystem should remain efficient. However, applications which
> read the directory may be overwhelmed. For example, ls sorts the
> names of the files in a directory before outputting them. Running ls
> on a directory with many thousands of files in it could take a
> while.

While you can have as many files and directories in a directory as you
want (provided you have enough space and inodes(*) on the partition),
having too many entries in a directory slows down all directory
operations.
This affects not only programs that sort the entries, but also the
kernel. Every time a file in the directory is accessed, the file system
has to make a linear search through the directory to find the file's
entry, and every time a file is created, the directory is searched for
a free entry (or is extended if none is found). While the directory
itself will be held in the buffer cache to avoid frequent disk access,
the search through the directory structure itself is expensive if you
have a million entries.

Look at what apache does in the proxy cache: in the cache directory it
creates a lot of subdirectories, each with a name one character long,
and in each of these again subdirectories of the same kind. This
hierarchy is by default three levels deep. For file and directory names
apache uses the 64 characters [a-zA-Z0-9@_].

If you have three levels of directories, you have 64**3 = 2**18
directories at level 3, and if you put up to 64 files into each of
these directories, you have 2**24, or about 16 million, files. To
access one of these files, you need to search four directories, each
64 entries long (in the worst case), which is a search over 256
entries, significantly less than a search through a directory of a
million entries. You have to open three more directories for the
search, but that will easily pay off.

(*) The number of inodes (which hold the information about a file; a
directory entry is a reference to the inode) is definitely an issue if
we talk about a million files. Today I made a file system of 8.5 GB on
a new disk (under Solaris, though), and newfs created 1048060 inodes,
barely enough for your case. You can change that number with an option
to newfs (look for "number of bytes per inode" in newfs(8)) to have
enough inodes for your files and directories.

Greetings, Juergen.

-- 
Juergen Nickelsen
Tellique Kommunikationstechnik GmbH
Gustav-Meyer-Allee 25, 13355 Berlin, Germany
Tel. +49 30 46307-552 / Fax +49 30 46307-579
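
For illustration, here is a minimal Python sketch of the kind of
three-level layout described in the message. It is not Apache's actual
code: the use of SHA-256 to pick the directory names, the function names
hashed_path and store, and the example root directory are all
assumptions made for this sketch. Only the 64-character alphabet and the
depth of three levels come from the message.

```python
import hashlib
import os

# The 64-character alphabet named in the message: a-z, A-Z, 0-9, '@', '_'.
ALPHABET = (
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789@_"
)
LEVELS = 3  # depth of the directory hierarchy, as in the message


def hashed_path(root, name, levels=LEVELS):
    """Return root/x/y/z/name, where x, y, z are derived from a hash of name."""
    # Assumption: SHA-256 is used here only to spread names evenly across
    # directories; this is not claimed to be what Apache does.
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # Take the low 6 bits (one of 64 values) of each of the first bytes.
    parts = [ALPHABET[byte & 0x3F] for byte in digest[:levels]]
    return os.path.join(root, *parts, name)


def store(root, name, data):
    """Write data to its hashed location, creating directories as needed."""
    path = hashed_path(root, name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path


if __name__ == "__main__":
    # Capacity arithmetic from the message.
    print(len(ALPHABET) ** LEVELS)        # number of directories at level 3
    print(len(ALPHABET) ** (LEVELS + 1))  # files at 64 per leaf directory
    # Hypothetical root directory and file name, for illustration only.
    print(hashed_path("/var/spool/small", "record-000123.dat"))
```

Because the path is computed from the file name alone, a lookup never
has to scan a large directory: each of the directories on the way
holds at most 64 subdirectories, and the leaf directories stay small as
long as the hash spreads the names evenly.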