Date:      Tue, 29 May 2012 17:35:18 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Doug Barton <dougb@FreeBSD.org>
Cc:        freebsd-fs@FreeBSD.org
Subject:   Re: Millions of small files: best filesystem / best options
Message-ID:  <20120529161802.N975@besplex.bde.org>
In-Reply-To: <4FC457F7.9000800@FreeBSD.org>
References:  <1490568508.7110.1338224468089.JavaMail.root@zimbra.interconnessioni.it> <4FC457F7.9000800@FreeBSD.org>

On Mon, 28 May 2012, Doug Barton wrote:

> On 5/28/2012 10:01 AM, Alessio Focardi wrote:
>> So in my case I would have to use -b 4096 -f 512
>>
>> It's an improvement, but still is not ideal: still a big waste with 200 bytes files!
>
> Are all of the files exactly 200 bytes? If so that's likely the best you
> can do.

It is easy to do better by using a file system that supports small block
sizes.  This might be slow, but it reduces the wastage.  Possible file
systems:
- msdosfs has a minimum block size of 512 and handles caching for this
   fairly well for a small number of files, but is probably even slower
   than ffs for a large number of files, especially when directories
   are involved.
- ext2fs has a minimum block size of 1024 and handles caching for this
   fairly poorly.
- it is easy to fix ffs to support a minimum block size of 512, by
   reducing its gratuitous limit of MINBSIZE and fixing the few things
   that break.  dumpfs on such a file system then shows:

% magic	19540119 (UFS2)	time	Tue May 29 16:16:20 2012
% superblock location	65536	id	[ 4fc46886 2007c27b ]
% ncg	4	size	1200	blocks	947
% bsize	512	shift	9	mask	0xfffffe00
% fsize	512	shift	9	mask	0xfffffe00
% frag	1	shift	0	fsbtodb	0
% minfree	8%	optim	time	symlinklen 120
% maxbsize 512	maxbpg	64	maxcontig 256	contigsumsize 16
% nbfree	944	ndir	2	nifree	75	nffree	0
% bpg	301	fpg	301	ipg	20
% nindir	64	inopb	2	maxfilesize	136353791
% sbsize	1536	cgsize	512	csaddr	171	cssize	512
% sblkno	144	cblkno	160	iblkno	161	dblkno	171
% cgrotor	0	fmod	0	ronly	0	clean	1
% avgfpdir 64	avgfilesize 16384
% flags	none
% fsmnt	/mnt
% volname		swuid	0

Note that sbsize is now larger than bsize.  Most of the things that
break involve wrong checks that sbsize <= bsize.  sbsize is not
limited by bsize in either direction, since the super block is
accessed in DEV_BSIZE-blocks, not bsize-blocks, and the upper limit
on its size is not the same as the upper limit on bsize.

> The good news is that it's a big improvement (I've done similar
> stuff in the past). You'll also want to tweak the -i (inode) value to
> insure that you have sufficient inodes for the number of files you plan
> to store. The default is not likely to be adequate for your needs.

Big is relative.  4K blocks with 200-byte files give a wastage factor
of roughly 20, and metadata adds to that: the inode alone is 256 bytes
with ffs2, only 128 bytes with ffs1, and only 32 bytes (the directory
entry) with msdosfs.
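Spelled out (a rough sketch that counts only the data allocation unit
plus one piece of per-file metadata, ignoring directory space, indirect
blocks and, for msdosfs, the FAT itself):

#include <stdio.h>

/* Rough per-file space arithmetic for 200-byte files. */
static void
wastage(const char *what, double alloc_unit, double metadata)
{
	double file = 200.0;
	double used = alloc_unit + metadata;

	printf("%-28s %6.0f bytes/file  factor %4.1f\n",
	    what, used, used / file);
}

int
main(void)
{
	wastage("ffs2, 4K frags", 4096, 256);		/* about 21.8 */
	wastage("ffs2, 512-byte frags", 512, 256);	/* about 3.8 */
	wastage("ffs1, 512-byte frags", 512, 128);	/* about 3.2 */
	wastage("msdosfs, 512-byte clusters", 512, 32);	/* about 2.7 */
	return (0);
}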

> ...

But I expect using a file system would be so slow for lots of really
small files that I wouldn't try it.  Caching is already poor for
4K files, and a factor-of-20 loss won't improve it.  If you don't want
to use a database, maybe you can use tar[.gz] files.  These at least
reduce the wastage (but still waste about twice as much as msdosfs with
512-byte blocks), unless they are compressed.  I think there are ways
to treat tar files as file systems and to avoid reading the whole file
to find files in it (zip format is better for this).
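One such way (not mentioned in the original message, so treat it as an
illustration) is libarchive, which is in the FreeBSD base system.  A
minimal sketch that walks the headers of a tar file and reads each
member without extracting anything to disk:

#include <archive.h>
#include <archive_entry.h>
#include <stdint.h>
#include <stdio.h>

int
main(int argc, char **argv)
{
	struct archive *a;
	struct archive_entry *entry;
	char buf[512];
	ssize_t n;

	if (argc != 2) {
		fprintf(stderr, "usage: %s file.tar\n", argv[0]);
		return (1);
	}
	a = archive_read_new();
	archive_read_support_format_tar(a);
	if (archive_read_open_filename(a, argv[1], 10240) != ARCHIVE_OK) {
		fprintf(stderr, "%s\n", archive_error_string(a));
		return (1);
	}
	/* Walk the headers; data is read member by member, on demand. */
	while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
		printf("%s (%jd bytes)\n", archive_entry_pathname(entry),
		    (intmax_t)archive_entry_size(entry));
		while ((n = archive_read_data(a, buf, sizeof(buf))) > 0)
			;	/* consume (or use) the member's data */
	}
	archive_read_free(a);
	return (0);
}

Build with cc -o tarwalk tarwalk.c -larchive.  Tar has no central
index, so this still scans the header blocks sequentially; as noted
above, zip is better if you want random access by name.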

Bruce


