Date:      Tue, 7 Jun 2005 17:57:02 +0100 (BST)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Eric Anderson <anderson@centtech.com>
Cc:        Randy Bush <randy@psg.com>, freebsd-fs@freebsd.org, FreeBSD Current <freebsd-current@freebsd.org>, Julian Elischer <julian@elischer.org>
Subject:   Re: you are in an fs with millions of small files
Message-ID:  <20050607175242.D61131@fledge.watson.org>
In-Reply-To: <42A59367.6060307@centtech.com>
References:  <17059.7150.269428.448187@roam.psg.com> <42A4D5D0.9040500@elischer.org> <42A59367.6060307@centtech.com>

On Tue, 7 Jun 2005, Eric Anderson wrote:

> Julian Elischer wrote:
>> what. all in one directory?
>> 
>> I've only had up to 500,000 files in one directory on FreeBSD.
>
> The only problems I've had with a directory with millions of files is 
> things like ls -al will attempt to sort the list, but the list doesn't 
> fit into memory.  Access to the files is of course very snappy.

Ditto.  I regularly use directories with tens and hundreds of thousands of 
entries as a result of manipulating very large folders with the Cyrus 
server.  I run into the following two classes of problems, plus one tip:

- Some applications behave poorly with large trees.  ls(1) is the classic
   example -- sorting 150,000 strings is expensive, and should be avoided.
   It also requires holding all the strings in memory rather than continuing
   the iteration.  fts is bad about this, so many applications that use fts
   suffer from this.  With the sort issue, -f makes a big difference.

- Some operations become more expensive -- as directories grow, adding
   new entries gets more expensive.  You'll notice this fairly
   substantially if you untar a tar file with many entries in the same
   directory -- early on, cost of insert for a new item is very cheap, but
   it rapidly slows down from thousands of inserts per second to hundreds
   or less.  I notice this if I restore a large Cyrus directory from
   backup.

- UFS_DIRHASH really helps with large directory performance by reducing
   the cost of lookup, but at the cost of memory.  Make sure the box has
   lots of memory.
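
A minimal sketch of the first point, in Python, assuming the application
only needs to visit each entry once and does not care about order.  The
function names iter_names and count_entries are made up for illustration;
os.scandir() is the standard library call that walks a directory lazily,
so nothing here sorts the names or collects them all in memory first.

```python
import os
import tempfile


def iter_names(path):
    """Yield entry names one at a time, in directory order, without sorting."""
    # os.scandir() returns a lazy iterator, so the full list of names is
    # never built up in memory the way a sorted listing requires.
    with os.scandir(path) as entries:
        for entry in entries:
            yield entry.name


def count_entries(path):
    """Count entries while holding only one name at a time."""
    return sum(1 for _ in iter_names(path))


if __name__ == "__main__":
    # Hypothetical demo: a small temporary directory stands in for a big one.
    with tempfile.TemporaryDirectory() as d:
        for i in range(1000):
            open(os.path.join(d, f"{i}."), "w").close()
        print(count_entries(d))
```

This is roughly the same trade that ls -f makes: you give up sorted output
and get to process entries as they are read.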

All this said -- FreeBSD works really well for me with large file counts, 
and I rarely hit the edge cases where there is a problem.  Most problems are 
with applications, and when you are using more extreme file system 
layouts, you typically are using applications customized for that, and so 
they do the right things.
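
To see whether a given box is anywhere near the insert slowdown described
above, a rough Python sketch along these lines times file creation in
batches within a single directory.  The function name time_creates, the
file name pattern, and the batch sizes are all made up for illustration,
and the numbers will depend heavily on the file system, its options, and
caching, so treat the output as a relative trend rather than a benchmark.

```python
import os
import tempfile
import time


def time_creates(path, total, batch):
    """Create `total` empty files in `path`; return creates/second per batch."""
    rates = []
    for start in range(0, total, batch):
        end = min(start + batch, total)
        t0 = time.perf_counter()
        for i in range(start, end):
            # O_EXCL makes the create fail rather than reuse an existing name.
            fd = os.open(os.path.join(path, f"f{i:08d}"),
                         os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
            os.close(fd)
        elapsed = time.perf_counter() - t0
        # Guard against a zero reading from a coarse clock on tiny batches.
        rates.append((end - start) / elapsed if elapsed > 0 else float("inf"))
    return rates


if __name__ == "__main__":
    # Hypothetical sizes; raise `total` to approach the counts discussed above.
    with tempfile.TemporaryDirectory() as d:
        for n, rate in enumerate(time_creates(d, total=20000, batch=5000)):
            print(f"batch {n}: {rate:.0f} creates/sec")
```

If the per-batch rate drops steadily as the directory fills, you are
seeing the effect from the second point.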

Robert N M Watson



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20050607175242.D61131>