From: Artem Belevich
To: Ivan Voras
Cc: freebsd-fs@freebsd.org
Date: Thu, 21 Jul 2011 13:07:19 -0700
Subject: Re: ZFS and large directories - caveat report

On Thu, Jul 21, 2011 at 12:29 PM, Ivan Voras wrote:
> On 21 July 2011 20:15, Artem Belevich wrote:
>> On Thu, Jul 21, 2011 at 9:38 AM, Ivan Voras wrote:
>>> On 21 July 2011 17:50, Freddie Cash wrote:
>>>> On Thu, Jul 21, 2011 at 8:45 AM, Ivan Voras wrote:
>>>>>
>>>>> Is there an equivalent of UFS dirhash memory setting for ZFS? (i.e. the
>>>>> size of the metadata cache)
>>>>
>>>> vfs.zfs.arc_meta_limit
>>>>
>>>> This sets the amount of ARC that can be used for metadata. The default is
>>>> 1/8th of ARC, I believe. This setting lets you use "primarycache=all"
>>>> (store metadata and file data in ARC) but then tune how much is used for
>>>> each.
>>>>
>>>> Not sure if that will help in your case or not, but it's a sysctl you can
>>>> play with.
>>>
>>> I don't think that it works, or at least is not as efficient as dirhash:
>>>
>>> www:~> sysctl -a | grep meta
>>> kern.metadelay: 28
>>> vfs.zfs.mfu_ghost_metadata_lsize: 129082368
>>> vfs.zfs.mfu_metadata_lsize: 116224
>>> vfs.zfs.mru_ghost_metadata_lsize: 113958912
>>> vfs.zfs.mru_metadata_lsize: 16384
>>> vfs.zfs.anon_metadata_lsize: 0
>>> vfs.zfs.arc_meta_limit: 322412800
>>> vfs.zfs.arc_meta_used: 506907792
>>> kstat.zfs.misc.arcstats.demand_metadata_hits: 4471705
>>> kstat.zfs.misc.arcstats.demand_metadata_misses: 2110328
>>> kstat.zfs.misc.arcstats.prefetch_metadata_hits: 27
>>> kstat.zfs.misc.arcstats.prefetch_metadata_misses: 51
>>>
>>> arc_meta_used is nearly 500 MB which should be enough even in this
>>> case. With filenames of 32 characters, all the filenames alone for
>>> 130,000 files in a directory take about 4 MB - I doubt the ZFS
>>> introduces so much extra metadata it doesn't fit in 500 MB.
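As a side note, actually playing with those knobs would look roughly
like the following sketch. "tank" is only a placeholder dataset name,
and on some FreeBSD releases vfs.zfs.arc_meta_limit is a boot-time
loader tunable rather than a sysctl that can be changed on a live
system:

  # see how much ARC may hold metadata and how much it holds right now
  sysctl vfs.zfs.arc_meta_limit vfs.zfs.arc_meta_used

  # raise the metadata limit to 1 GB at the next boot
  echo 'vfs.zfs.arc_meta_limit="1073741824"' >> /boot/loader.conf

  # keep both data and metadata in ARC for the dataset in question
  zfs set primarycache=all tank

Whether that helps here depends on whether the directory's metadata
actually fits under the raised limit.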
>>
>> For what it's worth, 500K files in one directory seems to work
>> reasonably well on my box running a few-weeks-old 8-stable (quad core,
>> 8GB RAM, ~6GB ARC), ZFSv28 pool on a 2-drive mirror + 50GB L2ARC.
>>
>> $ time perl -e 'use Fcntl; for $f (1..500000)
>> {sysopen(FH,"f$f",O_CREAT); close(FH);}'
>> perl -e >| /dev/null  2.26s user 39.17s system 96% cpu 43.156 total
>>
>> $ time find . | wc -l
>>   500001
>> find .  0.16s user 0.33s system 99% cpu 0.494 total
>>
>> $ time find . -ls | wc -l
>>   500001
>> find . -ls  1.93s user 12.13s system 96% cpu 14.643 total
>>
>> $ time find . | xargs -n 100 rm
>> find .  0.22s user 0.28s system 0% cpu 2:45.12 total
>> xargs -n 100 rm  1.25s user 58.51s system 36% cpu 2:45.61 total
>>
>> Deleting files resulted in a constant stream of writes to the hard drives.
>> I guess file deletion may end up being a synchronous write
>> committed to the ZIL right away. If that's indeed the case, a small slog
>> on an SSD could probably speed up file deletion a bit.
>
> That's a very interesting find.
>
> Or maybe the issue is fragmentation: could you modify the script
> slightly to create files in about 50 directories in parallel (i.e.
> create in dir1, create in dir2, create in dir3... create in dir50,
> then again create in dir1, create in dir2...)?

Scattering across 50 directories works about as fast:

$ time perl -e 'use Fcntl; $dir = 0; for $f (1..500000)
{sysopen(FH,"$dir/f$f",O_CREAT); close(FH); $dir=($dir+1) % 50}' >| /dev/null
perl -e >| /dev/null  2.77s user 38.31s system 85% cpu 47.829 total

$ time find . | wc -l
  500051
find .  0.16s user 0.36s system 29% cpu 1.787 total

$ time find . -ls | wc -l
  500051
find . -ls  1.75s user 11.33s system 92% cpu 14.196 total

$ time find . -name f\* | xargs -n 100 rm
find . -name f\*  0.17s user 0.35s system 0% cpu 3:23.44 total
xargs -n 100 rm  1.35s user 52.82s system 26% cpu 3:23.75 total

>
> Could you, for the sake of curiosity, upgrade this system to the latest
> 8-stable and retry it?

I'm currently running 8.2-STABLE r223055. The commit log does not show
anything particularly interesting going into the ZFS code since then.
There was an LBOLT overflow fix, but it should not be relevant in this
case. I do plan to upgrade the box, though it's not going to happen for
another week or so. If the issue is still relevant then, I'll be happy
to re-run the test.

--Artem
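A note on the slog speculation above: if deletions really are
bottlenecked on synchronous ZIL writes, the straightforward experiment
would be to give the pool a dedicated log device on an SSD. This is
only a sketch; the pool name "tank" and the device "ada2" are
placeholders, not taken from this thread:

  # add an SSD as a dedicated ZIL (slog) device to the pool
  zpool add tank log ada2

  # the pool should now list the device under a separate "logs" section
  zpool status tank

If the delete benchmark then runs noticeably faster, that would support
the ZIL theory.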