Date:      Thu, 09 Oct 2014 15:36:16 +0300
From:      Paul <devgs@ukr.net>
To:        freebsd-fs@freebsd.org
Subject:   Question about metadata in ARC
Message-ID:  <1412858175.581495697.syuqcm4o@frv38.fwdcdn.com>

Cheers.

Recently on production servers we have discovered strange ARC behavior.

Until now we weren't using readdir() much, only on occasion.
Now some of our daemons periodically call readdir() on many thousands of folders.
On average, the time between two readdir() calls on the same directory is small: ~10-15 seconds.
But for some reason 99% of the calls miss the cache.
Directories never have more than 3 files in them.
One of them, the lock file, is never removed and stays on the file system.
The other one or two files are unlinked almost immediately after being created.
A scan of the directory with readdir(), before unlinking those files, always takes roughly 10-12 milliseconds.
I figure it's because the directory metadata is getting evicted from the ARC.
The question is: why so quickly? And why, on the other hand, does the metadata needed for stat() stay in the cache
much, much longer? Literally hours: stat()-ing a file once every few hours always hits the cache.
And we stat() millions of different files per day.

So I imagine there are two kinds of metadata: directory data blocks (a hash table, according to the wiki)
and the data blocks where stat()'s metadata is stored.

Why does stat() metadata live so much longer than directory metadata?
Or rather, why is directory metadata evicted so quickly?
Is there a way to configure it otherwise?


I want to show you my little test case. 

Environment:
The test is performed on a production server with 128G RAM and the max ARC size set to 90G.
We are running FreeBSD 11.0-CURRENT #3 r260625.
Stats from top are:
ARC: 86G Total, 1884M MFU, 77G MRU, 54M Anon, 2160M Header, 5149M Other
Disks are 40% busy on average.
6 CPU cores with hyperthreading (12 virtual cores) are 25% busy on average.

Setup:
To set up the test case I created a test directory and spawned 5000 files in it using:
# for a in {1..5000}; do touch ${RANDOM}_test_file_name_${RANDOM}_$a; done

and saved their names to a temporary file:
# ls > /tmp/lstest


Testing:
Then I waited an hour (for the cached metadata to expire from the ARC) and did two things:

1) scan the directory using plain ls:
# time ls >| /dev/null;
ls -G >| /dev/null  0,03s user 0,05s system 5% cpu 1,472 total

2) stat the files by name, using the list saved to the temporary file earlier:
# time stat `cat /tmp/lstest`>| /dev/null
stat `cat /tmp/lstest` >| /dev/null  0,16s user 0,19s system 99% cpu 1,481 total

Then I waited one minute and repeated the two actions above:

1) # time ls >| /dev/null;
ls -G >| /dev/null  0,03s user 0,04s system 5% cpu 1,327 total

2) # time stat `cat /tmp/lstest`>| /dev/null
stat `cat /tmp/lstest` >| /dev/null  0,16s user 0,19s system 99% cpu 0,351 total

As you can see, in case (1) most of the time is spent waiting for disk (5% CPU), i.e. cache misses.
In case (2) the time is almost entirely CPU (99%), i.e. cache hits.

I did many more experiments and came to the conclusion that directory metadata is evicted from the cache almost immediately.
Sometimes it survives 5 seconds, sometimes only 1 second, rarely 10 or more seconds.
On the other hand, even when I ran stat `cat /tmp/lstest` hours later, the total time was still around 350 ms.


So, how can I configure ZFS to reduce cache misses when reading directories?


There is another problem that I think is related to this issue.
A periodic unlink() of a non-existent file also takes 10 to 12 milliseconds,
while stat() + unlink() (when the file exists) never takes more than tens of microseconds!


Thanks,
Paul


