Date: Thu, 21 Jul 2011 13:38:50 -0300
From: Luiz Otavio O Souza <lists.br@gmail.com>
To: Ivan Voras <ivoras@freebsd.org>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS and large directories - caveat report
Message-ID: <13577F3E-DE59-44F4-98F7-9587E26499B8@gmail.com>
In-Reply-To: <j09hk8$svj$1@dough.gmane.org>
References: <j09hk8$svj$1@dough.gmane.org>
On Jul 21, 2011, at 12:45 PM, Ivan Voras wrote:

> I'm writing this mostly for future reference / archiving and also if
> someone has an idea on how to improve the situation.
>
> A web server I maintain was hit by a DoS, which caused more than 4
> million PHP session files to be created. The session files are sharded
> into 32 directories in a single level - which is normally more than
> enough for this web server, as the number of users is only a couple of
> thousand. With the DoS, the number of files per shard directory rose
> to about 130,000.
>
> The problem is: ZFS has proven horribly inefficient with such large
> directories. I have other, more loaded servers with similarly bad /
> large directories on UFS where the problem is not nearly as serious as
> here (probably due to the large dirhash). On this system, any
> operation which touches even only the parent of these 32 shards
> (e.g. "ls") takes seconds, and a simple "find | wc -l" on one of the
> shards takes > 30 minutes (I stopped it after 30 minutes). Another
> symptom is that SIGINT-ing such a find process takes 10-15 seconds to
> complete (sic! this likely means the kernel operation cannot be
> interrupted for that long).
>
> This wouldn't be a problem by itself, but operations on such
> directories eat IOPS - clearly visible with the "find" test case,
> making the rest of the services on the server suffer as collateral
> damage. Apparently there is a huge amount of seeking being done, even
> though I would think that for read operations all the data would be
> cached - and somehow the seeking from this operation takes priority /
> livelocks other operations on the same ZFS pool.
>
> This is on a fresh 8-STABLE AMD64, pool version 28 and zfs version 5.
>
> Is there an equivalent of the UFS dirhash memory setting for ZFS?
> (i.e. the size of the metadata cache)

Hello Ivan,

I have somewhat similar problems on a client machine that needs to store a large number of files.

I have 4.194.303 (0x3fffff) files created on the FS (unused files are already created with zero size - a precaution from the UFS days to avoid running out of free inodes on the FS).

I also break the files up like mybasedir/3f/ff/ff, so under no circumstance do I have a 'big amount of files' in a single directory.

The general usage on this server is fine, but the periodic (daily) scripts take almost a day to complete and the server is slow as hell while the daily scripts are running.

All I need to do is kill 'find' to get the machine back to 'normal'.

I have not stopped to look at it in detail, but from the little bit I checked, it looks like the stat() calls take a long time on ZFS files.

Previously, we had this running on UFS with a database of 16.777.215 (0xffffff) files without any kind of trouble (I've reduced the database size to keep the daily scripts' run time under control).

The periodic script is simply doing its job of verifying setuid files (and comparing the list with the previous one).

So, yes, I can confirm that running 'find' on a ZFS FS with a lot of files is very, very slow (and it looks like it isn't related to how the files are distributed across the FS).

But sorry, no idea how to improve that situation (yet).

Regards,
Luiz
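
For reference, a minimal sketch of the tunables touched on above, assuming a stock FreeBSD 8-STABLE system; the ZFS-side names are assumptions worth verifying with sysctl -a on the running kernel, since ZFS has no direct per-directory hash equivalent to the UFS dirhash:

    # UFS: dirhash memory ceiling and current usage
    sysctl vfs.ufs.dirhash_maxmem vfs.ufs.dirhash_mem

    # ZFS: directory lookups go through the ARC, whose metadata portion
    # is bounded by arc_meta_limit; compare usage against the limit first
    sysctl kstat.zfs.misc.arcstats.arc_meta_used
    sysctl kstat.zfs.misc.arcstats.arc_meta_limit

    # If metadata is being evicted, the limit can be raised at boot via
    # /boot/loader.conf (value in bytes; the figure below is only an
    # illustration, not a recommendation):
    #   vfs.zfs.arc_meta_limit="536870912"

Whether raising the ARC metadata limit actually helps with the behaviour described above is untested here; it only lets more metadata stay cached and does not change how ZFS lays out large directories on disk.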