From owner-freebsd-fs@FreeBSD.ORG Thu Jul 21 19:29:49 2011
From: Ivan Voras
Date: Thu, 21 Jul 2011 21:29:07 +0200
To: Artem Belevich
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS and large directories - caveat report

On 21 July 2011 20:15, Artem Belevich wrote:
> On Thu, Jul 21, 2011 at 9:38 AM, Ivan Voras wrote:
>> On 21 July 2011 17:50, Freddie Cash wrote:
>>> On Thu, Jul 21, 2011 at 8:45 AM, Ivan Voras wrote:
>>>>
>>>> Is there an equivalent of the UFS dirhash memory setting for ZFS? (i.e. the
>>>> size of the metadata cache)
>>>
>>> vfs.zfs.arc_meta_limit
>>>
>>> This sets the amount of ARC that can be used for metadata. The default is
>>> 1/8th of ARC, I believe. This setting lets you use "primarycache=all"
>>> (store metadata and file data in ARC) but then tune how much is used for
>>> each.
>>>
>>> Not sure if that will help in your case or not, but it's a sysctl you can
>>> play with.
>>
>> I don't think that it works, or at least it is not as efficient as dirhash:
>>
>> www:~> sysctl -a | grep meta
>> kern.metadelay: 28
>> vfs.zfs.mfu_ghost_metadata_lsize: 129082368
>> vfs.zfs.mfu_metadata_lsize: 116224
>> vfs.zfs.mru_ghost_metadata_lsize: 113958912
>> vfs.zfs.mru_metadata_lsize: 16384
>> vfs.zfs.anon_metadata_lsize: 0
>> vfs.zfs.arc_meta_limit: 322412800
>> vfs.zfs.arc_meta_used: 506907792
>> kstat.zfs.misc.arcstats.demand_metadata_hits: 4471705
>> kstat.zfs.misc.arcstats.demand_metadata_misses: 2110328
>> kstat.zfs.misc.arcstats.prefetch_metadata_hits: 27
>> kstat.zfs.misc.arcstats.prefetch_metadata_misses: 51
>>
>> arc_meta_used is nearly 500 MB, which should be enough even in this
>> case. With filenames of 32 characters, all the filenames alone for
>> 130,000 files in a directory take about 4 MB - I doubt ZFS
>> introduces so much extra metadata that it doesn't fit in 500 MB.
>
> For what it's worth, 500K files in one directory seems to work
> reasonably well on my box running a few-weeks-old 8-stable (quad core,
> 8 GB RAM, ~6 GB ARC), ZFSv28 pool on a 2-drive mirror + 50 GB L2ARC.
>
> $ time perl -e 'use Fcntl; for $f (1..500000)
> {sysopen(FH,"f$f",O_CREAT); close(FH);}'
> perl -e  >| /dev/null  2.26s user 39.17s system 96% cpu 43.156 total
>
> $ time find . |wc -l
>  500001
> find .  0.16s user 0.33s system 99% cpu 0.494 total
>
> $ time find . -ls |wc -l
>  500001
> find . -ls  1.93s user 12.13s system 96% cpu 14.643 total
>
> time find . |xargs -n 100 rm
> find .  0.22s user 0.28s system 0% cpu 2:45.12 total
> xargs -n 100 rm  1.25s user 58.51s system 36% cpu 2:45.61 total
>
> Deleting files resulted in a constant stream of writes to the hard drives.
> I guess file deletion may end up being a synchronous write
> committed to the ZIL right away. If that's indeed the case, a small slog on
> an SSD could probably speed up file deletion a bit.

That's a very interesting find.

Or maybe the issue is fragmentation: could you modify the script
slightly to create files in about 50 directories in parallel (i.e.
create in dir1, create in dir2, create in dir3 ... create in dir50,
then again create in dir1, create in dir2 ...)?

Could you, for the sake of curiosity, upgrade this system to the latest
8-stable and retry it?
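
Something like this untested sketch is what I have in mind - the
dir1..dir50 names and the 500000 count are just placeholders matching
your test:

use Fcntl;
# make the 50 target directories first
for my $d (1..50) { mkdir("dir$d"); }
# then create the files round-robin across them,
# same open/close pattern as in your one-liner
for my $f (1..500000) {
    my $d = ($f % 50) + 1;
    sysopen(FH, "dir$d/f$f", O_CREAT);
    close(FH);
}

Timing that against your single-directory run should show whether
spreading the creates over many directories changes the write pattern.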
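
(For completeness on the metadata cache point: if anyone wants to
experiment with the limit Freddie mentioned, it should be adjustable
from /boot/loader.conf along these lines - the value below is only an
illustration, not a recommendation:

# /boot/loader.conf
# cap on ARC space used for metadata, in bytes (example value: 1 GB)
vfs.zfs.arc_meta_limit="1073741824"

On older systems this may be a boot-time tunable only, so a reboot
would be needed for it to take effect.)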