Date:      Tue, 29 May 2012 01:20:12 -0700
From:      Doug Barton <dougb@FreeBSD.org>
To:        Don Lewis <truckman@FreeBSD.org>
Cc:        freebsd-fs@FreeBSD.org
Subject:   Re: Millions of small files: best filesystem / best options
Message-ID:  <4FC486BC.3050808@FreeBSD.org>
In-Reply-To: <201205290806.q4T86K8M007099@gw.catspoiler.org>
References:  <201205290806.q4T86K8M007099@gw.catspoiler.org>

On 5/29/2012 1:06 AM, Don Lewis wrote:
> On 29 May, Bruce Evans wrote:
>> On Mon, 28 May 2012, Doug Barton wrote:
>>
>>> On 5/28/2012 10:01 AM, Alessio Focardi wrote:
>>>> So in my case I would have to use -b 4096 -f 512
>>>>
>>>> It's an improvement, but still is not ideal: still a big waste with 200 bytes files!
>>>
>>> Are all of the files exactly 200 bytes? If so that's likely the best you
>>> can do.
>>
>> It is easy to do better by using a file system that supports small block
>> sizes.  This might be slow, but it reduces the wastage.  Possible file
>> systems:
> 
>> - it is easy to fix ffs to support a minimum block size of 512 (by
>>    reducing its gratuitous limit of MINBSIZE and fixing the few things
>>    that break:
> 
> That shouldn't be necessary, especially if you newfs with the "-o space"
> option to force the fragments for multiple files to be allocated out of
> the same block right from the start instead of waiting to do this once
> the filesystem starts getting full.
> 
> I ran a Usenet server this way for quite a while with fairly good
> results, though the average file size was a bit bigger, about 2K or so.
> I found that if I didn't use "-o space" that space optimization wouldn't
> kick in soon enough and I'd tend to run out of full blocks that would be
> needed for larger files.  The biggest performance problem that I ran
> into was that as the directories shrank and grew, they would tend to get
> badly fragmented, causing lookups to get slow.  This was in the days
> before dirhash ...

Yeah, I started to write something about that, and stopped because I was
afraid I was getting too far in the weeds, and the strategy used
nowadays is so much better by default.

If, as it was in my case, the model is almost exclusively write-once
read-many, using -o space as the default works really well. I don't
recall the OP well enough to know if that's what's happening here or not.
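To put rough numbers on the 200-byte case quoted above, here is a minimal Python sketch. It assumes a simplified model in which a small file's data is rounded up to whole fragments, and it ignores inode and directory-entry overhead. The function names are hypothetical, chosen only for illustration.

```python
import math


def footprint(file_size: int, frag_size: int) -> int:
    """Bytes allocated for a small file stored in fragments.

    Simplified model (an assumption, not the exact FFS accounting):
    data is rounded up to whole fragments; inode and directory-entry
    overhead are ignored.
    """
    return math.ceil(file_size / frag_size) * frag_size


def slack_report(file_size: int, frag_sizes: list[int]) -> dict[int, tuple[int, float]]:
    """Map each fragment size to (bytes allocated, fraction of allocation wasted)."""
    report = {}
    for frag in frag_sizes:
        alloc = footprint(file_size, frag)
        waste = alloc - file_size
        report[frag] = (alloc, waste / alloc)
    return report


if __name__ == "__main__":
    for frag, (alloc, frac) in slack_report(200, [512, 4096]).items():
        print(f"frag={frag:5d}: {alloc} bytes allocated, {frac:.0%} wasted")
```

In this model a 200-byte file takes 512 bytes (about 61% slack) with 512-byte fragments, and 4096 bytes (about 95% slack) with 4096-byte fragments.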

It would be really cool if there were a way to tell the filesystem to do
something in between ... like in this case shove 2 files into a 4k block
so that performance is better, but it won't get too badly fragmented if
the files have to grow.
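The "in between" policy described above is a wish, not an existing FFS feature. The following is a purely hypothetical toy model in Python; the class and method names are invented for illustration. It gives each file a fixed, equal share of a block, so with two files per 4k block each file gets a 2k slot that it can grow into without disturbing its neighbour.

```python
from dataclasses import dataclass, field


@dataclass
class SlotAllocator:
    """Toy model of a hypothetical 'N files per block' placement policy.

    Each block is split into equal slots and every new file gets its own
    slot, so it can grow in place up to the slot size. This is NOT how
    FFS allocates; it only illustrates the space-vs-headroom trade-off.
    """
    block_size: int = 4096
    files_per_block: int = 2
    sizes: dict[str, int] = field(default_factory=dict)

    @property
    def slot_size(self) -> int:
        return self.block_size // self.files_per_block

    def create(self, name: str, size: int) -> None:
        if size > self.slot_size:
            raise ValueError(f"{name}: {size} bytes does not fit a {self.slot_size}-byte slot")
        self.sizes[name] = size

    def grow(self, name: str, new_size: int) -> bool:
        """Return True if the file can grow in place, False if it would need relocating."""
        if name not in self.sizes:
            raise KeyError(name)
        if new_size <= self.slot_size:
            self.sizes[name] = new_size
            return True
        return False

    def blocks_used(self) -> int:
        # Ceiling division: a partially filled block still counts as used.
        return -(-len(self.sizes) // self.files_per_block)
```

With 200-byte files, a 2k slot leaves over 1.8k of headroom per file, at the cost of reserving 2048 bytes per file instead of the 512 bytes a "-f 512" filesystem would allocate.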

Doug

-- 

    This .signature sanitized for your protection


