Date: Tue, 29 May 2012 01:11:29 -0700
From: Doug Barton <dougb@FreeBSD.org>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-fs@FreeBSD.org
Subject: Re: Millions of small files: best filesystem / best options
Message-ID: <4FC484B1.2080202@FreeBSD.org>
In-Reply-To: <20120529161802.N975@besplex.bde.org>
References: <1490568508.7110.1338224468089.JavaMail.root@zimbra.interconnessioni.it> <4FC457F7.9000800@FreeBSD.org> <20120529161802.N975@besplex.bde.org>
On 5/29/2012 12:35 AM, Bruce Evans wrote:
> On Mon, 28 May 2012, Doug Barton wrote:
>
>> On 5/28/2012 10:01 AM, Alessio Focardi wrote:
>>> So in my case I would have to use -b 4096 -f 512
>>>
>>> It's an improvement, but it's still not ideal: still a big waste
>>> with 200-byte files!
>>
>> Are all of the files exactly 200 bytes? If so that's likely the best
>> you can do.
>
> It is easy to do better by using a file system that supports small
> block sizes. This might be slow, but it reduces the wastage. Possible
> file systems:
> - msdosfs has a minimum block size of 512 and handles caching for this
>   fairly well for a small number of files, but is probably even slower
>   than ffs for a large number of files, especially when directories
>   are involved.
> - ext2fs has a minimum block size of 1024 and handles caching for this
>   fairly poorly.

I wouldn't choose either of those for a mission-critical system. I use
both of them daily, and while they work, their performance and
reliability are not something I'd bet a business on.

>> The good news is that it's a big improvement (I've done similar
>> stuff in the past). You'll also want to tweak the -i (inode) value to
>> ensure that you have sufficient inodes for the number of files you
>> plan to store. The default is not likely to be adequate for your
>> needs.
>
> Big is relative. 4K-blocks with 200-byte files gives a wastage factor
> of 20. Metadata will be 256 bytes for the inode alone with ffs2, only
> 128 bytes with ffs1, and only 32 bytes with msdosfs.

I'm talking about "big" in the sense of how much better it performed.
Changing the file system defaults as I described made a 4-fold decrease
in the load time for a busy BIND server that had to load hundreds of
thousands of tiny zone files. I was actually surprised at how much
better it worked, given that (as you correctly describe) the math is
certainly not 4 times better.

> But I expect using a file system would be so slow for lots of really
> small files that I wouldn't try it. Caching is already poor for
> 4K-files, and a factor of 20 loss won't improve it. If you don't want
> to use a database, maybe you can use tar.[gz] files. These at least
> reduce the wastage (but still waste about twice as much as msdosfs
> with 512-byte blocks), unless they are compressed. I think there are
> ways to treat tar files as file systems and to avoid reading the whole
> file to find files in it (zip format is better for this).

There are some use cases where you have to have a flat file on disk. If
I were the OP I'd be looking for any excuse I could find to get this
stuff into a db too, but I think he said that was out of scope.

Doug

-- 

	This .signature sanitized for your protection
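
To make the tuning discussed in this thread concrete, a newfs invocation
along these lines might look like the sketch below. The device path
/dev/da0p1 and the exact -i value are illustrative assumptions, not
something given in the thread:

	# newfs -b 4096 -f 512 -i 512 /dev/da0p1   # device path is hypothetical

-b 4096 and -f 512 are UFS's minimum block and fragment sizes, as noted
above. -i 512 requests roughly one inode per 512-byte fragment; the
newfs(8) default density of one inode per (4 * frag size) bytes of data
space would run out of inodes long before millions of one-fragment files
were stored. With this layout each 200-byte file costs one 512-byte
fragment plus a 256-byte ffs2 inode, i.e. about (512 + 256) / 200, or
roughly a 3.8x wastage factor, versus the ~20x of full 4K blocks.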