Date: Fri, 21 Dec 2007 12:16:25 -0800
From: Alfred Perlstein <alfred@freebsd.org>
To: Alexandre Biancalana <biancalana@gmail.com>
Cc: freebsd-performance@freebsd.org
Subject: Re: Bad performance when accessing a lot of small files
Message-ID: <20071221201625.GZ16982@elvis.mu.org>
In-Reply-To: <8e10486b0712191109n3d21b02cyf5183ee0cd01d8ce@mail.gmail.com>
References: <8e10486b0712191109n3d21b02cyf5183ee0cd01d8ce@mail.gmail.com>
* Alexandre Biancalana <biancalana@gmail.com> [071219 11:35] wrote:
> Hi List,
>
> I have a backup server running FreeBSD 7-BETA3. The cpu is CPU:
> Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz, 3GB Ram, 10x 500GB
> SATA, Areca 1231-ML, the filesystem used to backup my other servers
> locally is build on top of ARC-1231, 4TB (32k stripe) zfs filesystem
> with gzip compression.
>
> This machine receive backups from ~30 servers, (of all kinds and
> sizes, databases, fileservers, image servers, webservers, etc) all
> night, write the last day in LTO-3 tapes and store some days older
> days in disk.
>
> The behavior that I'm observing and that want your help is when the
> system is accessing some directory with many small files ( directories
> with ~ 1 million of ~30kb files), the performance is very poor.

There is a lot of very good tuning advice in this thread. One thing
to note, however, is that keeping ~1 million files in a single
directory is a bad idea on just about any filesystem.

A common trick is to hash the directories themselves: use some
computation on each file's name to spread this huge directory across
many smaller ones. If you can figure out a hashing algorithm, that
may help you.

For instance, if you tell sendmail to use "/var/spool/mq*" for its
mail spool and you happen to have 256 directories under "/var/spool/"
named "mq000" through "mq255", it will randomly pick a directory to
dump each file in. This makes the performance a lot better.

For one million files you can probably do a two-level hash; you just
have to figure out a good hashing algorithm.

If you can describe the data, I may be able to help you come up with
a hashing algorithm for it.

-Alfred
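
For illustration, here is a minimal sketch of the two-level hash described above. It is only one way to do it, not something taken from this thread: it assumes file names are unique strings, and it uses the first hex digits of a SHA-256 digest of the name to choose the two directory levels. The function names (hashed_path, store) and the "/backup" root are hypothetical.

```python
import hashlib
from pathlib import Path


def hashed_path(root, filename, levels=2, width=2):
    """Return root/<h1>/<h2>/filename, where h1 and h2 are consecutive
    groups of hex digits taken from a digest of the file name."""
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    # With width=2 each level has 16**2 = 256 possible directory names.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return Path(root).joinpath(*parts, filename)


def store(root, filename, data):
    """Write data (bytes) under the hashed path, creating the
    intermediate directories on demand."""
    target = hashed_path(root, filename)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target


if __name__ == "__main__":
    # The same name always maps to the same location, so lookups need
    # no index: recompute the path from the name.
    print(hashed_path("/backup", "image-000123.jpg"))
```

With two levels of 256 directories each there are 65,536 leaf directories, so one million files work out to roughly 15 files per directory on average if the hash spreads names evenly. Because the path is derived from the name alone, a reader only needs the file name to find the file again.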