Date: Tue, 7 Jun 2005 17:57:02 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
To: Eric Anderson <anderson@centtech.com>
Cc: Randy Bush <randy@psg.com>, freebsd-fs@freebsd.org, FreeBSD Current <freebsd-current@freebsd.org>, Julian Elischer <julian@elischer.org>
Subject: Re: you are in an fs with millions of small files
Message-ID: <20050607175242.D61131@fledge.watson.org>
In-Reply-To: <42A59367.6060307@centtech.com>
References: <17059.7150.269428.448187@roam.psg.com> <42A4D5D0.9040500@elischer.org> <42A59367.6060307@centtech.com>

On Tue, 7 Jun 2005, Eric Anderson wrote:

> Julian Elischer wrote:
>> what. all in one directory?
>>
>> I've only had up to 500,000 files in one directory on FreeBSD.
>
> The only problems I've had with a directory with millions of files is
> things like ls -al will attempt to sort the list, but the list doesn't
> fit into memory.  Access to the files is of course very snappy.

Ditto.  I regularly use directories with tens and hundreds of thousands
of entries as a result of manipulating very large folders with the Cyrus
server.  I run into the following two classes of problems:

- Some applications behave poorly with large trees.  ls(1) is the classic
  example -- sorting 150,000 strings is expensive, and should be avoided.
  It also requires holding all the strings in memory rather than
  continuing the iteration.  fts is bad about this, so many applications
  that use fts suffer from this.  With the sort issue, -f makes a big
  difference.

- Some operations become more expensive -- as directories grow, the cost
  of adding new entries gets more expensive.  You'll notice this fairly
  substantially if you untar a tar file with many entries in the same
  directory -- early on, the cost of inserting a new item is very cheap,
  but it rapidly slows down from thousands of inserts per second to
  hundreds or less.  I notice this if I restore a large Cyrus directory
  from backup.

- UFS_DIRHASH really helps with large directory performance by reducing
  the cost of lookup, but at the cost of memory.  Make sure the box has
  lots of memory.

All this said -- FreeBSD works really well for me with large file counts,
and I rarely hit the edge cases where there is a problem.  Most problems
are with applications, and when you are using more extreme file system
layouts, you typically are using applications customized for that, and so
they do the right things.

Robert N M Watson
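
To illustrate the ls(1) point above: the difference between a sorting
lister and an unsorted one (roughly what -f buys you) can be sketched in
a few lines.  This is only an illustration in Python, not how ls or fts
is implemented; the function names and the "msg" file prefix are made up
for the example.  The first function walks the directory one entry at a
time and keeps nothing, the second has to hold every name in memory
before it can sort.

```python
import os
import tempfile


def make_files(path, n):
    """Create n empty files in path (hypothetical helper for the demo)."""
    for i in range(n):
        open(os.path.join(path, f"msg{i}"), "w").close()


def count_entries_streaming(path):
    """Visit entries in directory order, one at a time, keeping none."""
    count = 0
    with os.scandir(path) as entries:  # lazy iterator, no sorting
        for _entry in entries:
            count += 1
    return count


def sorted_names(path):
    """Materialize every name in memory, then sort the whole list."""
    return sorted(os.listdir(path))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        make_files(d, 1000)
        print(count_entries_streaming(d), len(sorted_names(d)))
```

With a few thousand entries both are quick; the gap only shows up once
the directory is large enough that the list of names and the sort become
the dominant cost.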