From: Terry Lambert
Message-Id: <200008010033.RAA15725@usr07.primenet.com>
Subject: Re: FFS performance for large directories?
To: zzhang@cs.binghamton.edu (Zhihui Zhang)
Date: Tue, 1 Aug 2000 00:32:16 +0000 (GMT)
Cc: stevec@nbci.com (Steve Carlson), freebsd-fs@FreeBSD.ORG
In-Reply-To: from "Zhihui Zhang" at Jul 31, 2000 03:44:57 PM

> A third thing is that FFS performs poor accessing /usr/ports. This has
> something to do with how FFS layout directory inode (not file inode). The
> book 4.4 BSD design and implementation explains this well. If fact, read
> that book carefully, you can have better idea than you can get from a
> mailing list. Good luck!

This is because the tarball is packed up in the wrong order; change
the packing order (breadth-first vs. depth-first), and the "ports
problem" goes away.

I have done this with the -T option to tar, and it works fine, so
long as you have an accurate file list. This ensures that there is
no cache-busting on the dearchive, which is the source of the problem.

In general, your performance will vary most based on your file name
length.
A given directory entry block is 512 bytes, and a given directory
entry will take 8 bytes, plus however many bytes are in the name,
rounded up to a 4-byte boundary. So for an average 8 character file
name length, this works out to 32 entries per block, at 16 bytes per
entry.

When searching directories, the getdirentries(2) system call is used.
This is normally called with a buffer capable of handling a full
directory entry block worth of entries in a single system call, so
with a small average file name length (e.g. as with a mail queue
directory), this is fairly inexpensive.

NB: Mail queue directories pay heavy penalties for the MTA scanning
the queue contents, much more so than the overhead of traversing
the directory, so the consideration of what you _do_ with the
information once you know what files are in the directory is
of great importance.

Normally, this is not an issue for most applications, and only
specific applications have issues with lots of entries (e.g.
those like MTAs that must _do_ something with the data, and those
which are creating new files, since you must traverse the entire
directory to create a new file, whereas a successful lookup will on
average only traverse 50% of the directory -- hence negative caches
are generally twice as valuable as positive caches).

If you would provide more information about why you are asking
the question (i.e. what is your intended application?), we could
probably give you advice on how to implement it in such a way as to
ensure the directory depth issue does not impact you.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
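
For reference, the directory entry arithmetic above can be sketched as follows. This is a simplified model that takes the figures from the post at face value (512-byte blocks, an 8-byte fixed part, name padded to a multiple of 4 bytes); the constant and function names are chosen for illustration, and the real on-disk format may reserve additional space per entry (for example for a terminating NUL), which is ignored here. The scan-cost function assumes a plain linear search with no caching.

```python
DIRBLKSIZ = 512    # size of one directory entry block, per the post
ENTRY_HEADER = 8   # fixed bytes per entry before the name, per the post


def entry_size(name_length):
    """Bytes used by one entry: header plus name rounded up to 4 bytes."""
    padded_name = (name_length + 3) & ~3
    return ENTRY_HEADER + padded_name


def entries_per_block(name_length):
    """How many entries of this name length fit in one directory block."""
    return DIRBLKSIZ // entry_size(name_length)


def average_entries_scanned(n_entries, found):
    """Entries examined by a linear scan of a directory with n_entries.

    A name that exists is found, on average, about halfway through;
    a miss (or a create, which must rule out a duplicate) scans them all.
    """
    if found:
        return (n_entries + 1) / 2
    return n_entries


if __name__ == "__main__":
    for length in (8, 9, 16, 32):
        print(length, entry_size(length), entries_per_block(length))
```

With an 8 character name this reproduces the 16 bytes per entry and 32 entries per block quoted above; the miss case costing about twice the hit case is the reason given for negative caches being worth roughly twice as much as positive ones.
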