Date: Wed, 8 Jun 2016 11:14:40 +0300
From: Ciprian Dorin Craciun <ciprian.craciun@gmail.com>
To: freebsd-questions@freebsd.org
Subject: Feedback on UFS2 tuning for large number of small files (~100m)
Message-ID: <CA+Tk8fyZjdvb70HFfwJBD=+J4PU9Ae5FcsaQgSvMZW5B2T3YLA@mail.gmail.com>
Hello all!

(Please keep me in CC, as I'm not subscribed to the mailing list. Should I perhaps post this to the `freebsd-fs` mailing list instead?)

I would like your feedback on tuning a UFS2 file-system for the following use-case, which is very similar to that of a maildir mail server. I tried to look for hints on the internet, but found nothing more in-depth than enabling soft-updates, `noatime`, etc.

The main usage of the file-system is:

* there are 4 separate file stores, each with about 50 million files, all on the same partition;
* all 4 file stores have a dispersed layout on two levels (i.e. `XX/YY/ZZ...`, where `ZZ...` is a 64-character hexadecimal string); as a consequence there shouldn't be more than about one thousand files per leaf folder;
* all of the files above are around 2-3 KiB;
* these files are read-mostly, and they are never deleted;
* there is almost no access contention, neither for reads nor for writes;
* there are 4 matching "queue" stores, dispersed on a single level, containing symlinks;
* each symlink points to a path roughly 100-200 characters in length;
* I wouldn't expect more than a few thousand symlinks in each queue store;
* the symlinks are constantly `rename`-d into and out of these folders;
* these folders are constantly listed by 4-32 parallel processes (not multi-threaded);
* (basically I use these stores to emulate a queuing system; I'm careful that each process tries the leaf folders in random order, thus reducing contention, and pauses if the queue "seems" empty; see the sketch after this list;)

As side notes:

* the partition is backed by two mirrored disks (which I assume are rotating SCSI disks);
* persistence across a power or system failure (i.e. individual files ending up truncated or missing) is not critical for my use-case;
* however, file-system consistency after such a failure (i.e. getting back a correctly mountable file-system) is important; thus, from what I've read in the `mount` man-page, `async` is not an option;
* the system has plenty of RAM (32 GiB), but it is constantly under 100% CPU load from processes running at nice level 10;
* the system is dedicated to the task at hand, so there is no other background contention;

The problem that prompted me to ask the community for feedback is that under load (i.e. 100% CPU usage by processes at nice level 10), even a simple directory listing seems to stall, taking anywhere from a fraction of a second up to a few seconds.
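To make the queue pattern concrete, here is a minimal sketch of the claim-by-rename loop each worker runs; all paths, the leaf layout, and the `process_one` handler are illustrative, not my actual setup:

~~~~
#!/bin/sh
# Illustrative sketch only: paths, layout and the process_one handler
# are made up; the real stores differ.
QUEUE=/data/queue-0         # one of the 4 single-level queue stores
CLAIMED=/data/claimed-$$    # per-process folder on the same partition
mkdir -p "${CLAIMED}"

while true; do
    # Try a random leaf folder first, to reduce contention with the
    # other 4-32 worker processes.
    leaf=$(printf '%02x' "$(jot -r 1 0 255)")
    found=0
    for link in "${QUEUE}/${leaf}"/*; do
        [ -L "${link}" ] || continue
        # On the same file-system mv(1) is a rename(2), thus atomic;
        # if a concurrent worker claimed the symlink first, mv fails
        # and we simply try the next entry.
        if mv "${link}" "${CLAIMED}/" 2>/dev/null; then
            found=1
            process_one "${CLAIMED}/${link##*/}"
        fi
    done
    # If the queue "seems" empty, pause instead of busy-looping.
    [ "${found}" -eq 1 ] || sleep 1
done
~~~~

The important property is that each claim is a single `rename(2)` within one file-system, so no locking is needed between the workers.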
The output of `iostat -w 30 -d -C -x -I` under load is shown below (because of `-I` the values are cumulative over each 30-second interval, not per-second averages):

~~~~
device        r/i        w/i       kr/i         kw/i  qlen  tsvc_t/i    sb/i  us ni sy in id
ada0    1243893.0  4988740.0  6447101.5  311428382.5   600  812579.1  8698.9   0  0  0  0 100
ada1    1243889.0  4988824.0  6429851.0  311428550.5   520  766389.6  8437.3
device        r/i        w/i       kr/i         kw/i  qlen  tsvc_t/i    sb/i  us ni sy in id
ada0        582.0    12510.0     2328.0     152986.5   383    9463.4    28.9   0  3  1  0 96
ada1        587.0    12465.0     2348.0     152806.5   343    9107.8    28.7
device        r/i        w/i       kr/i         kw/i  qlen  tsvc_t/i    sb/i  us ni sy in id
ada0        792.0    12933.0     3168.0     157643.5   542   11178.8    29.1   0  3  1  0 96
ada1        791.0    12893.0     3164.0     157651.5   544   10591.2    28.5
~~~~

The file-system is mounted with the following options:

~~~~
ufs rw,noatime
~~~~

The `dumpfs` output for the file-system is:

~~~~
magic   19540119 (UFS2)  time   Sat Jun  4 05:59:23 2016
superblock location     65536   id      [ 56cb7a3f 33fd7a56 ]
ncg     2897    size    464257019       blocks  449679279
bsize   32768   shift   15      mask    0xffff8000
fsize   4096    shift   12      mask    0xfffff000
frag    8       shift   3       fsbtodb 3
minfree 8%      optim   time    symlinklen 120
maxbsize 32768  maxbpg  4096    maxcontig 4     contigsumsize 4
nbfree  56167793        ndir    265137  nifree  232205846       nffree  9111
bpg     20035   fpg     160280  ipg     80256   unrefs  0
nindir  4096    inopb   128     maxfilesize     2252349704110079
sbsize  4096    cgsize  32768   csaddr  5056    cssize  49152
sblkno  24      cblkno  32      iblkno  40      dblkno  5056
cgrotor 0       fmod    0       ronly   0       clean   0
metaspace 6408  avgfpdir 64     avgfilesize 16384
flags   soft-updates+journal
fsmnt   /some-path
volname         swuid   0       providersize    464257019
~~~~

Thus I would like to ask the community: what can I tune (even by re-formatting the partition) to make it more "responsive"? Alternatively, I am open to another file-system type, perhaps one better suited to this use-case. For concreteness, a sketch of the kind of re-format I have in mind follows below.
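Something along these lines, perhaps (this is only a sketch under my own assumptions, not something I have tested; the device path is made up):

~~~~
# Illustrative only -- an untested candidate re-format; device path made up.
# -b/-f: 16 KiB blocks with 2 KiB fragments, so a 2-3 KiB file takes
#        one or two fragments instead of a full 4 KiB fragment as today;
# -i:    one inode per 4 KiB of data space, so that ~200 million small
#        files cannot exhaust the inode tables;
# -j:    soft-updates journaling, as on the current file-system.
newfs -j -b 16384 -f 2048 -i 4096 /dev/mirror/gm0p1
~~~~

I have no good intuition about the side effects of these values, hence my question.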
Thanks,
Ciprian.