From owner-freebsd-fs@FreeBSD.ORG Thu Jul 21 17:08:04 2011
From: Luiz Otavio O Souza <lists.br@gmail.com>
Date: Thu, 21 Jul 2011 13:38:50 -0300
To: Ivan Voras
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS and large directories - caveat report
Message-Id: <13577F3E-DE59-44F4-98F7-9587E26499B8@gmail.com>
List-Id: Filesystems

On Jul 21, 2011, at 12:45 PM, Ivan Voras wrote:

> I'm writing this mostly for future reference / archiving, and also in
> case someone has an idea on how to improve the situation.
>
> A web server I maintain was hit by a DoS attack which caused more than
> 4 million PHP session files to be created. The session files are
> sharded into 32 directories in a single level - which is normally more
> than enough for this web server, as the number of users is only a
> couple of thousand. With the DoS, the number of files per shard
> directory rose to about 130,000.
>
> The problem is: ZFS has proven horribly inefficient with such large
> directories. I have other, more loaded servers with similarly bad /
> large directories on UFS where the problem is not nearly as serious as
> here (probably thanks to the large dirhash). On this system, any
> operation which touches even just the parent of these 32 shards (e.g.
> "ls") takes seconds, and a simple "find | wc -l" on one of the shards
> takes more than 30 minutes (I stopped it after 30 minutes). Another
> symptom is that SIGINT-ing such a find process takes 10-15 seconds to
> take effect (sic! - this likely means the kernel operation cannot be
> interrupted for that long).
>
> This wouldn't be a problem by itself, but operations on such
> directories eat IOPS - clearly visible with the "find" test case -
> making the rest of the services on the server fall as collateral
> damage. Apparently there is a huge amount of seeking being done, even
> though I would think that for read operations all the data would be
> cached - and somehow the seeking from this operation takes priority
> over / livelocks other operations on the same ZFS pool.
>
> This is on a fresh 8-STABLE AMD64, pool version 28 and zfs version 5.
>
> Is there an equivalent of the UFS dirhash memory setting for ZFS?
> (i.e. the size of the metadata cache)

Hello Ivan,

I've had similar problems on a client's server that needs to store a
large number of files.

I have 4,194,303 (0x3fffff) files created on the FS (unused files are
pre-created with zero size - a precaution from the UFS days to avoid
the 'no more free inodes on FS' problem).

And I just break the files up like mybasedir/3f/ff/ff, so under no
circumstance do I have a 'big amount of files' in a single directory.
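In sh, the mapping is roughly the following (a minimal sketch;
'id_to_path' is just an illustrative name, not the actual production
script):

    #!/bin/sh
    # Map a numeric file id to its sharded path: two hex digits per
    # directory level, so no single directory ever holds more than
    # 256 entries.
    id_to_path() {
        hex=$(printf '%06x' "$1")
        echo "mybasedir/$(echo "$hex" | sed 's|\(..\)\(..\)\(..\)|\1/\2/\3|')"
    }

    id_to_path 4194303        # prints: mybasedir/3f/ff/ff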
The general usage on this server is fine, but the periodic (daily)
scripts take almost a day to complete, and the server is slow as hell
while the daily scripts are running. All I need to do is kill 'find'
to get the machine back to normal.

I didn't stop to look at it in detail, but from the little I checked,
it looks like stat() calls take a long time on ZFS files.

Previously we had this running on UFS with a database of 16,777,215
(0xffffff) files without any kind of trouble (I've reduced the
database size to keep the run time of the daily scripts under
control).

The periodic script is simply doing its job of verifying setuid files
(and comparing the list with the previous one).

So, yes, I can confirm that running 'find' on a ZFS filesystem with a
lot of files is very, very slow (and it doesn't look like it is
related to how the files are distributed on the FS).

But sorry, no idea about how to improve that situation (yet).

Regards,
Luiz
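P.S. Regarding your dirhash question: on the UFS side the memory is
tunable via sysctl. I don't know of a direct ZFS equivalent; the
closest thing I can think of is the ARC metadata limit, but I haven't
verified that raising it helps with this workload, so treat the names
below only as pointers of where to look (they may differ between
releases):

    # UFS: current and maximum dirhash memory
    sysctl vfs.ufs.dirhash_mem vfs.ufs.dirhash_maxmem

    # ZFS: how much of the ARC may be used for metadata
    sysctl vfs.zfs.arc_meta_used vfs.zfs.arc_meta_limit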