Date:      Wed, 8 Jun 2016 11:14:40 +0300
From:      Ciprian Dorin Craciun <ciprian.craciun@gmail.com>
To:        freebsd-questions@freebsd.org
Subject:   Feedback on UFS2 tuning for large number of small files (~100m)
Message-ID:  <CA+Tk8fyZjdvb70HFfwJBD=+J4PU9Ae5FcsaQgSvMZW5B2T3YLA@mail.gmail.com>

Hello all!  (Please keep me in CC as I'm not subscribed to the mailing
list.  Should I perhaps post this to the `freebsd-fs` mailing list?)


I would like your feedback on tuning a UFS2 file-system for the
following use-case, which is very similar to a maildir mail server.  I
have searched for hints on the internet, but found nothing more
in-depth than enabling soft updates, `noatime`, etc.




The main usage of the file-system is:

* there are 4 separate file stores, each with about 50 million files,
all on the same partition;
* all of the 4 file stores have a dispersed layout on two levels (i.e.
`XX/YY/ZZ...`, where `ZZ...` is a 64-character hexadecimal string);  (as
a consequence there shouldn't be more than about one thousand files per
leaf folder;  see the path sketch after these lists;)
* all of the files above are around 2-3 KiB;
* these files are read-mostly, and they are never deleted;
* there is almost no access contention, neither read nor write;

* there are 4 matching "queue" stores, dispersed on a single level,
containing symlinks;
* each symlink points to a path roughly 100-200 characters in length;
* I wouldn't expect more than a few thousand files for each store;
* the symlinks are constantly `rename`-d into and out of these folders;
* these folders are constantly listed, by 4-32 parallel processes (not
multi-threaded);
* (basically I use these stores to emulate a queuing system, and I'm
careful that each process tries the leaf folders in random order, thus
reducing contention, and pauses if the queue "seems" empty;  see the
sketch after this list;)
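
To make the file-store layout concrete, here is a sketch of how one
path is derived.  The store root is illustrative, and I'm assuming the
two levels are the leading hex pairs of the 64-character name;  50
million files spread over 256 * 256 leaf folders gives roughly 763
files per leaf, which is where the "no more than one thousand" figure
comes from:
~~~~
#!/bin/sh
# illustrative only:  derive the two-level dispersed path for one file
name=9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
store=/stores/files-1                       # hypothetical store root
xx=$(echo "$name" | cut -c 1-2)             # first level:  "9f"
yy=$(echo "$name" | cut -c 3-4)             # second level: "86"
path="$store/$xx/$yy/$name"
echo "$path"
~~~~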
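
And here is a minimal sketch of the rename-based claiming the queue
workers do;  the paths and the `handle_one` step are illustrative, not
my actual code:
~~~~
#!/bin/sh
# one pass of a hypothetical queue worker
queue=/stores/queue-1                      # single-level dispersed store
claimed=/stores/work-1/claimed.$$          # must be on the same file-system
                                           # so that mv(1) uses rename(2)
for leaf in $(ls "$queue" | sort -R); do   # try leaf folders in random order
    for entry in "$queue/$leaf"/*; do
        [ -L "$entry" ] || continue        # only the enqueued symlinks
        # rename(2) is atomic, so at most one worker wins each entry;
        # a loser's mv simply fails because the source is already gone
        if mv "$entry" "$claimed" 2>/dev/null; then
            handle_one "$claimed"          # hypothetical processing step
        fi
    done
done
~~~~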


As sidenotes:

* the partition is backed by two mirrored disks (which I'm assuming
are rotating SCSI disks);
* persistence in case of power or system failure (i.e. files getting
truncated or missing) is not so critical for my use-case;
* however, file-system consistency after a failure (i.e. getting a
correctly mountable file-system) is important, thus from what I've
read in the `mount` man-page, `async` is not an option;
* the system has plenty of RAM (32 GiB), however it is constantly
under 100% CPU load from processes at nice level 10;
* this system is dedicated to the task at hand, therefore there is no
other background contention;




The problem that prompted me to ask the community for feedback is that
under load (i.e. 100% CPU usage by processes at nice level 10), even
listing a directory on the file-system seems to stall, for anywhere
from a fraction of a second up to a few seconds.
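
(In case it matters, I have also been looking at the UFS dirhash
sysctls, since as far as I understand they affect lookup speed in
large directories;  the value below is only an example:)
~~~~
# how much memory dirhash currently uses vs. the allowed maximum
sysctl vfs.ufs.dirhash_mem vfs.ufs.dirhash_maxmem
# raise the cap so more of the large folders stay hashed (example value)
sysctl vfs.ufs.dirhash_maxmem=67108864
~~~~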

The output of `iostat -w 30 -d -C -x -I` under load is shown below
(with `-I` the values are totals accumulated over each 30-second
window, not per-second averages;  the first report covers the whole
time since boot):
~~~~
device           r/i         w/i         kr/i         kw/i  qlen   tsvc_t/i     sb/i  us ni sy in id
ada0       1243893.0   4988740.0    6447101.5  311428382.5   600   812579.1   8698.9   0  0  0  0 100
ada1       1243889.0   4988824.0    6429851.0  311428550.5   520   766389.6   8437.3

device           r/i         w/i         kr/i         kw/i  qlen   tsvc_t/i     sb/i  us ni sy in id
ada0           582.0     12510.0       2328.0     152986.5   383     9463.4     28.9   0  3  1  0 96
ada1           587.0     12465.0       2348.0     152806.5   343     9107.8     28.7

device           r/i         w/i         kr/i         kw/i  qlen   tsvc_t/i     sb/i  us ni sy in id
ada0           792.0     12933.0       3168.0     157643.5   542    11178.8     29.1   0  3  1  0 96
ada1           791.0     12893.0       3164.0     157651.5   544    10591.2     28.5
~~~~
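
(For reference, the steady-state samples above work out to roughly
12500 / 30 ≈ 417 writes/s and 153000 KiB / 30 ≈ 5 MiB/s written per
disk, against only about 20 reads/s.)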




The file-system is mounted with the following options:
~~~~
ufs     rw,noatime
~~~~
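
(For context, the corresponding `/etc/fstab` entry looks roughly like
the following;  the device name is illustrative, not my actual one:)
~~~~
/dev/mirror/gm0p1   /some-path   ufs   rw,noatime   2   2
~~~~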


The output of `dumpfs` for the file-system is:
~~~~
magic   19540119 (UFS2) time    Sat Jun  4 05:59:23 2016
superblock location     65536   id      [ 56cb7a3f 33fd7a56 ]
ncg     2897    size    464257019       blocks  449679279
bsize   32768   shift   15      mask    0xffff8000
fsize   4096    shift   12      mask    0xfffff000
frag    8       shift   3       fsbtodb 3
minfree 8%      optim   time    symlinklen 120
maxbsize 32768  maxbpg  4096    maxcontig 4     contigsumsize 4
nbfree  56167793        ndir    265137  nifree  232205846       nffree  9111
bpg     20035   fpg     160280  ipg     80256   unrefs  0
nindir  4096    inopb   128     maxfilesize     2252349704110079
sbsize  4096    cgsize  32768   csaddr  5056    cssize  49152
sblkno  24      cblkno  32      iblkno  40      dblkno  5056
cgrotor 0       fmod    0       ronly   0       clean   0
metaspace 6408  avgfpdir 64     avgfilesize 16384
flags   soft-updates+journal
fsmnt   /some-path
volname         swuid   0       providersize    464257019
~~~~




Thus I would like to ask the community what I can tune (even by
re-formatting) to make the system more responsive;  alternatively, I
am open to another file-system type, perhaps one better suited to
this use-case.
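
For concreteness, these are the kinds of invocations I have been
contemplating, given that the `dumpfs` output above shows `avgfilesize
16384` and `avgfpdir 64`, neither of which matches my actual workload
(2-3 KiB files, up to ~1000 files per leaf folder).  The device name
is illustrative and I have not validated any of these values:
~~~~
# adjust the layout hints of the existing (unmounted) file-system:
# expected average file size and expected files per directory
tunefs -f 4096 -s 1000 /dev/mirror/gm0p1

# or, when re-formatting from scratch:  soft-updates journaling,
# smaller blocks/fragments, a denser inode allocation (one inode per
# 4 KiB of data space), and the same layout hints as above
newfs -U -j -b 16384 -f 2048 -i 4096 -g 4096 -h 1000 /dev/mirror/gm0p1
~~~~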


Thanks,
Ciprian.


