Date: Thu, 21 Jul 2011 13:38:50 -0300
From: Luiz Otavio O Souza <lists.br@gmail.com>
To: Ivan Voras <ivoras@freebsd.org>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS and large directories - caveat report
Message-ID: <13577F3E-DE59-44F4-98F7-9587E26499B8@gmail.com>
In-Reply-To: <j09hk8$svj$1@dough.gmane.org>
References: <j09hk8$svj$1@dough.gmane.org>
On Jul 21, 2011, at 12:45 PM, Ivan Voras wrote:

> I'm writing this mostly for future reference / archiving and also if
> someone has an idea on how to improve the situation.
>
> A web server I maintain was hit by a DoS, which caused more than 4
> million PHP session files to be created. The session files are sharded
> into 32 directories in a single level - which is normally more than
> enough for this web server, as the number of users is only a couple of
> thousand. With the DoS, the number of files per shard directory rose
> to about 130,000.
>
> The problem is: ZFS has proven horribly inefficient with such large
> directories. I have other, more loaded servers with similarly bad /
> large directories on UFS where the problem is not nearly as serious as
> here (probably due to the large dirhash). On this system, any
> operation which touches even only the parent of these 32 shards
> (e.g. "ls") takes seconds, and a simple "find | wc -l" on one of the
> shards takes > 30 minutes (I stopped it after 30 minutes). Another
> symptom is that SIGINT-ing such a find process takes 10-15 seconds to
> complete (sic! this likely means the kernel operation cannot be
> interrupted for that long).
>
> This wouldn't be a problem by itself, but operations on such
> directories eat IOPS - clearly visible with the "find" test case,
> making the rest of the services on the server suffer as collateral
> damage. Apparently there is a huge amount of seeking being done, even
> though I would think that for read operations all the data would be
> cached - and somehow the seeking from this operation takes priority /
> livelocks other operations on the same ZFS pool.
>
> This is on a fresh 8-STABLE AMD64, pool version 28 and zfs version 5.
>
> Is there an equivalent of the UFS dirhash memory setting for ZFS?
> (i.e. the size of the metadata cache)

Hello Ivan,

I have somewhat similar problems on a client machine that needs to store a large number of files.

I have 4.194.303 (0x3fffff) files created on the FS (unused files are already created with zero size - a precaution from the UFS days to avoid running out of free inodes on the FS).

I also break the files up like mybasedir/3f/ff/ff, so under no circumstance do I have a 'big amount of files' in a single directory.

The general usage on this server is fine, but the periodic (daily) scripts take almost a day to complete and the server is slow as hell while the daily scripts are running.

All I need to do is kill 'find' to get the machine back to 'normal'.

I have not stopped to look at it in detail, but from the little bit I checked, it looks like the stat() calls take a long time on ZFS files.

Previously, we had this running on UFS with a database of 16.777.215 (0xffffff) files without any kind of trouble (I've reduced the database size to keep the daily scripts' run time under control).

The periodic script is simply doing its job of verifying setuid files (and comparing the list with the previous one).

So, yes, I can confirm that running 'find' on a ZFS FS with a lot of files is very, very slow (and it looks like it isn't related to how the files are distributed across the FS).

But sorry, no idea how to improve that situation (yet).

Regards,
Luiz
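
For reference, a minimal sketch of the tunables touched on above, assuming a stock FreeBSD 8-STABLE system; the ZFS-side names are assumptions worth verifying with sysctl -a on the running kernel, since ZFS has no direct per-directory hash equivalent to the UFS dirhash:

    # UFS: dirhash memory ceiling and current usage
    sysctl vfs.ufs.dirhash_maxmem vfs.ufs.dirhash_mem

    # ZFS: directory lookups go through the ARC, whose metadata portion
    # is bounded by arc_meta_limit; compare usage against the limit first
    sysctl kstat.zfs.misc.arcstats.arc_meta_used
    sysctl kstat.zfs.misc.arcstats.arc_meta_limit

    # If metadata is being evicted, the limit can be raised at boot via
    # /boot/loader.conf (value in bytes; the figure below is only an
    # illustration, not a recommendation):
    #   vfs.zfs.arc_meta_limit="536870912"

Whether raising the ARC metadata limit actually helps with the behaviour described above is untested here; it only lets more metadata stay cached and does not change how ZFS lays out large directories on disk.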