From owner-freebsd-fs@FreeBSD.ORG Thu Jul 21 15:46:38 2011
From: Ivan Voras
To: freebsd-fs@freebsd.org
Date: Thu, 21 Jul 2011 17:45:53 +0200
Subject: ZFS and large directories - caveat report

I'm writing this mostly for future reference / archiving, and also in case someone has an idea on how to improve the situation.

A web server I maintain was hit by a DoS, which caused more than 4 million PHP session files to be created. The session files are sharded into 32 directories at a single level - normally more than enough for this web server, as the number of users is only a couple of thousand. With the DoS, the number of files per shard directory rose to about 130,000.

The problem is: ZFS has proven horribly inefficient with such large directories. I have other, more heavily loaded servers with similarly bad / large directories on UFS where the problem is not nearly as serious as it is here (probably thanks to the large dirhash). On this system, any operation that touches even just the parent directory of these 32 shards (e.g. "ls") takes seconds, and a simple "find | wc -l" on one of the shards takes more than 30 minutes (I stopped it at that point). Another symptom is that SIGINT-ing such a find process takes 10-15 seconds to take effect (sic! - this likely means the kernel operation cannot be interrupted for that long).

This wouldn't be a problem by itself, but operations on such directories eat IOPS - clearly visible with the "find" test case - making the rest of the services on the server suffer as collateral damage. Apparently a huge amount of seeking is being done, even though I would expect all the data to be cached for read operations - and somehow the seeking from this operation takes priority over / livelocks other operations on the same ZFS pool.

This is on a fresh 8-STABLE amd64, pool version 28 and ZFS version 5.

Is there an equivalent of the UFS dirhash memory setting for ZFS (i.e. the size of the metadata cache)?
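
For reference, a minimal sketch of the knobs I would look at first - the closest ZFS analogue I know of is the ARC metadata limit. The sysctl/tunable names below are what I believe 8-STABLE exposes; treat them as assumptions and verify with sysctl -d on the actual machine:

    # Current ARC metadata usage vs. its cap (bytes)
    sysctl kstat.zfs.misc.arcstats.arc_meta_used
    sysctl kstat.zfs.misc.arcstats.arc_meta_limit

    # The UFS dirhash settings, for comparison (runtime-tunable)
    sysctl vfs.ufs.dirhash_maxmem
    sysctl vfs.ufs.dirhash_mem

    # Raising the ZFS metadata cap is (as far as I know) a boot-time
    # tunable, so it goes into /boot/loader.conf and needs a reboot;
    # the 512 MB value here is only an illustration:
    vfs.zfs.arc_meta_limit="536870912"

Whether bumping arc_meta_limit actually helps with pathological directories like these is exactly what I'm unsure about, hence the question.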