Date: Thu, 22 Apr 2004 16:31:39 -0400 (EDT)
From: Robert Watson <rwatson@freebsd.org>
To: Eric Anderson <anderson@centtech.com>
Cc: freebsd-current@freebsd.org
Subject: Re: Directories with 2million files
Message-ID: <Pine.NEB.3.96L.1040422135116.22456B-100000@fledge.watson.org>
In-Reply-To: <40867A5D.9010600@centtech.com>
On Wed, 21 Apr 2004, Eric Anderson wrote:

> First, let me say that I am impressed (but not shocked) - FreeBSD
> quietly handled my building of a directory with 2055476 files in it.
> I'm not sure if there is a limit to this number, but at least we know it
> works to 2million.  I'm running 5.2.1-RELEASE.
>
> However, several tools seem to choke on that many files - mainly ls and
> du.  Find works just fine.  Here's what my directory looks like (from
> the parent):

Directories with millions of entries turn up surprisingly frequently,
actually.  While FreeBSD handles them quite well, they're something that
applications are rarely optimized for:

cyrus# /usr/bin/time \ls -f | wc
        1.86 real         1.20 user         0.34 sys
  338806  338806 2599362
cyrus# /usr/bin/time \ls | wc
        6.48 real         4.39 user         0.28 sys
  338807  338807 2599370

> I'd work on some patches, but I'm not worth much when it comes to C/C++.

Unfortunately, a lot of this comes from the desire to have programs behave
nicely in ways that scale well only to a limited extent, i.e., sorting and
sizing output.  Any algorithm that requires all elements of a large array
to be in memory, such as a sort, is inevitably going to hurt.  And with
text applications designed to run in command pipelines, to POSIX specs,
etc., there isn't a whole lot of room to generate warnings like:

cyrus# ls
ls: Holy cow, you have a lot of files.  You might want to disable sorting.
...

> If someone has some patches, or code to try, let me know - I'd be more
> than willing to test, possibly even give out an account on the machine.

Efficiency improvements are generally welcome, as long as they're correct
and don't overly complicate the implementation.

For what it's worth, I've noticed a lot of tools are getting better about
handling large numbers of (whatevers).
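To make the sorting point concrete, here is a minimal C sketch of the two access patterns, assuming nothing beyond POSIX opendir()/readdir().  The helpers count_entries_streaming() and count_entries_sorted() are hypothetical names for illustration; this is not how FreeBSD's ls(1) is written (it walks directories with fts(3) and has many more options), it just contrasts "handle each entry as it arrives" with "collect everything, then sort".

```c
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Copy a name; written out by hand so the sketch does not depend on
 * strdup() being declared under strict compiler flags. */
static char *copy_name(const char *s)
{
    size_t len = strlen(s) + 1;
    char *p = malloc(len);
    if (p != NULL)
        memcpy(p, s, len);
    return p;
}

static void free_names(char **names, size_t n)
{
    for (size_t i = 0; i < n; i++)
        free(names[i]);
    free(names);
}

static int cmp_names(const void *a, const void *b)
{
    /* Plain byte order; a real ls would use locale-aware ordering. */
    return strcmp(*(char *const *)a, *(char *const *)b);
}

/* Like "ls -f" in spirit: handle each entry as readdir() returns it,
 * in directory order, holding only the current entry. */
long count_entries_streaming(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;
    long n = 0;
    while (readdir(dir) != NULL)
        n++;                      /* a lister would print d_name here */
    closedir(dir);
    return n;
}

/* Like plain "ls" in spirit: every name is copied into memory and
 * sorted before the first line of output can be produced. */
long count_entries_sorted(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;
    size_t cap = 1024, n = 0;
    char **names = malloc(cap * sizeof *names);
    if (names == NULL) {
        closedir(dir);
        return -1;
    }
    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        if (n == cap) {           /* grow the array geometrically */
            char **tmp = realloc(names, 2 * cap * sizeof *names);
            if (tmp == NULL)
                goto fail;
            names = tmp;
            cap *= 2;
        }
        if ((names[n] = copy_name(de->d_name)) == NULL)
            goto fail;
        n++;
    }
    closedir(dir);
    qsort(names, n, sizeof *names, cmp_names);
    /* ...a lister would print names[0..n-1] here... */
    free_names(names, n);
    return (long)n;

fail:
    free_names(names, n);
    closedir(dir);
    return -1;
}
```

Both functions return the same count for an unchanging directory.  The difference is that the sorted variant keeps a copy of every name and does O(n log n) comparisons before it could print anything, while the streaming variant uses constant memory no matter how large the directory is.  That is one plausible contributor to the gap between the two ls timings above.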
For example, when I pointed Mozilla at an IMAP mail folder with 100,000
messages in it, it would reread the mailbox index every 60 seconds if the
mailbox had changed.  If one message arrives per minute and reading the
index takes more than 59 seconds, which over a WAN it would, Mozilla never
stops rereading the index.  Recent versions are *much* smarter, and appear
in many cases to scale to millions of messages, which is what I keep in my
large directories :-).

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Senior Research Scientist, McAfee Research