Date:      Mon, 28 May 2012 09:24:51 -0700
From:      Devin Teske <devin.teske@fisglobal.com>
To:        Alessio Focardi <alessiof@gmail.com>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: Millions of small files: best filesystem / best options
Message-ID:  <922B261C-4AB8-49A9-96CE-16C98B265604@fisglobal.com>
In-Reply-To: <2134924725.5040.1338211317460.JavaMail.root@zimbra.interconnessioni.it>
References:  <2134924725.5040.1338211317460.JavaMail.root@zimbra.interconnessioni.it>


On May 28, 2012, at 6:21 AM, Alessio Focardi wrote:

> Hi,
>
> I'm pretty new to BSD, but I do have some knowledge in Linux.
>
> I'm looking for some advice to efficiently pack millions of small files (200 bytes or less) over a freebsd fs.
>

This is something we've been doing (on FreeBSD) for almost 15 years now (starting with FreeBSD 2.1.5; now 8.1, and soon 8.3). We started with UFS1 and have been evaluating ZFS (we don't think SU+J is ready for production at this scale yet). We haven't used UFS2 yet but have no doubt that it's just as strong as UFS1.


> Those files will be stored in an hierarchical directory structure to limit the number of files for any directory and so (I hope!) speed up file lookups/deletion.
>

FreeBSD handles this wonderfully thanks to all the people that have put in time and effort over the years.
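
For illustration only, here is a minimal sketch of the kind of hierarchical layout described in the quoted text. It is not what we run in production. The helper name shard_path and the two-level, 256-way fan-out are assumptions chosen for the example; it relies only on cksum(1), which is POSIX.

```sh
#!/bin/sh
# shard_path NAME: print a two-level directory path for NAME.
# Hypothetical helper for illustration; not part of FreeBSD.
shard_path() {
    name=$1
    # cksum(1) reads stdin and prints "<crc> <byte count>"; keep the CRC.
    crc=$(printf '%s' "$name" | cksum | cut -d ' ' -f 1)
    # The two low-order bytes of the CRC select the two directory levels,
    # giving 256 * 256 = 65536 leaf directories.
    printf '%02x/%02x/%s\n' $((crc % 256)) $((crc / 256 % 256)) "$name"
}
```

A caller would create the directory first, e.g. mkdir -p "$(dirname "$(shard_path 12345.dat)")". With 65536 leaf directories, ten million files average out to roughly 150 per directory, assuming the names hash evenly.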

Ten years ago (circa FreeBSD 4.0-RELEASE), people at the company I now work for commonly:
- fiddled with the dirhash sysctl(8) MIB
- modified fsck(8) to make it more efficient
- modified tar(1) to handle high numbers of hard-links without falling over
- modified du(1) in a similar fashion to tar above
- more; all in the name of doing what you're describing (but on steroids)

but all those patches eventually made their way back into FreeBSD, and we generally haven't had to worry about even tens of millions of JPEG-sized (~200KB) files on a RAID formatted with UFS (1 or 2) since, say, FreeBSD 6 (someone on freebsd-fs will be able to say more accurately which release is where things really started to stabilize). Either way, 6, 7, 8, and 9 have all had very stable filesystems with respect to millions of small files.



> I have to say that I'm looking at fbsd for my project because both UFS2 and ZFS have some flavour of "block suballocation" "tail packing" "variable record size", at least documentation says so.
>
> My hope is to waste as less space as possible, even sacrificing some speed: can't use a full block for a single file: I will end up wasting 99% of the space!
>

I wasn't aware that FreeBSD was unique in this respect, but yes, FreeBSD has a block size and a fragment size. While formatting a UFS filesystem you can specify these sizes with the "-b SIZE" and "-f SIZE" arguments to newfs(8), for example:

	newfs -b 16384 -f 2048 /dev/da0s1a

This formats a RAID (/dev/da0s1a) with a 16K block size but a 2K fragment size. A file smaller than 2K then occupies a single 2K fragment rather than a full 16K block (a truly empty file, as created by touch(1), needs only an inode). This is the "block suballocation" you speak of. The above parameters are exactly what we use when formatting our RAIDs to store millions of JPEG-sized files (~200KB, so rather larger than the 200-byte files you describe).
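
To put numbers on the fragment size, here is a rough sketch of the arithmetic. It assumes a file smaller than one block, which is allocated in whole fragments, and it ignores inode and directory-entry overhead. The function names are made up for the example.

```sh
#!/bin/sh
# frag_alloc SIZE FRAG: bytes of data space allocated to a file of SIZE
# bytes, rounding up to a whole number of FRAG-byte fragments.
# Only meant for files smaller than one block.
frag_alloc() {
    size=$1
    frag=$2
    echo $(( (size + frag - 1) / frag * frag ))
}

# frag_waste SIZE FRAG: allocated bytes that hold no file data.
frag_waste() {
    alloc=$(frag_alloc "$1" "$2")
    echo $(( alloc - $1 ))
}

frag_waste 200 2048   # 2K fragments, as in the newfs example above
frag_waste 200 512    # 512-byte fragments
```

For a 200-byte file this prints 1848 and then 312: about 90% of a 2K fragment is slack, against about 61% of a 512-byte fragment. Whether newfs(8) accepts a 512-byte fragment depends on the block size (I believe "-b 4096 -f 512" is the smallest combination, since the block:fragment ratio tops out at 8:1), so check the man page before relying on that.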


>
> Do someone got some experience in a similar situation, and it's willing to give some advice on which fs I should choose and how to tune it for this particular scenario?
>

Choose your hardware wisely. After you have chosen your hardware wisely, set it up even more wisely.

For example, we go through a multi-day burn-in process on RAIDs that have double-digit numbers of disks.

Be smart about how you map logical volumes onto physical media, so that the layout reduces bottlenecks.

Go through any/all failure/recovery test procedures before putting data on the device if you don't already trust the hardware. Trust in the hardware is very important. If you don't trust your hardware's battery-backed DIMM for write-back cache (for example), then I have one very important recommendation when it comes to UFS: disable the SoftUpdates feature.

Disabling SoftUpdates on a UFS filesystem causes a huge performance hit, but it will allow you to sleep at night. In 15 years, UFS has never barfed on us except for maybe 3 memorable events, which entire groups of individuals can recount with amazing clarity: debugging horked filesystems late into the night after SoftUpdates ate the kid's homework (leaving tens to hundreds of thousands of files in lost+found). We routinely use SoftUpdates on _other_ UFS filesystems (like system partitions including "/var" and "/usr"), but _never_ on the RAIDs housing those millions-of-little-files.
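
As a sketch of what disabling it looks like on an existing filesystem: the function below only prints the commands for review, it does not run them. The device name is the one from the newfs example above; the mount point /raid and the function name are hypothetical. tunefs(8) wants the filesystem unmounted (or mounted read-only) when changing this flag.

```sh
#!/bin/sh
# softupdates_off_plan DEV MNT: print, but do not execute, the steps to
# turn soft updates off on the UFS filesystem DEV mounted at MNT.
# Hypothetical helper for illustration.
softupdates_off_plan() {
    dev=$1
    mnt=$2
    echo "umount $mnt"
    echo "tunefs -n disable $dev"   # -n enable|disable toggles soft updates
    echo "tunefs -p $dev"           # -p prints the current settings to verify
    echo "mount $dev $mnt"
}

softupdates_off_plan /dev/da0s1a /raid
```

On a freshly created filesystem none of this should be needed: newfs(8) only turns soft updates on when given -U, and the newfs command shown earlier does not pass it.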

Others' mileage may vary.

>
> Thank you very much, appreciated!
>
>

No problem.


> ps
>
> I know that probably a database will fit better in this situation, but in my case I can't take that route :(
>

Not necessarily. A database has the immediate and clear downside that if one bit in the database changes, a backup tool like Bacula has to back up the entire database again.

...and the database administrator is not necessarily the same person as the backup administrator (just sayin').
-- 
Devin



