Date: Thu, 21 Jul 2011 13:07:19 -0700
From: Artem Belevich <art@freebsd.org>
To: Ivan Voras <ivoras@freebsd.org>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS and large directories - caveat report
Message-ID: <CAFqOu6itZYfVGiqs4Kedprcj8oVePtbkn9Q5BjYM8w26%2BBzFNA@mail.gmail.com>
In-Reply-To: <CAF-QHFUKuSiek3pbUGLgdcZDAiPOnGJegbXvLB9Ah18N%2BffyRA@mail.gmail.com>
References: <j09hk8$svj$1@dough.gmane.org> <CAOjFWZ7x7AM1BvR0KBWM4669rtMgcBBt%2BBP96RppvtSP_gRhJg@mail.gmail.com> <CAF-QHFVJpFkMO94SFg403v-mNGRbn8soih67gUrujnzC=tSPFA@mail.gmail.com> <CAFqOu6j=butqyUb3WLr=07JKPQ1HQZkL-58gqKA%2BzHgm7TBvPQ@mail.gmail.com> <CAF-QHFUKuSiek3pbUGLgdcZDAiPOnGJegbXvLB9Ah18N%2BffyRA@mail.gmail.com>
On Thu, Jul 21, 2011 at 12:29 PM, Ivan Voras <ivoras@freebsd.org> wrote:
> On 21 July 2011 20:15, Artem Belevich <art@freebsd.org> wrote:
>> On Thu, Jul 21, 2011 at 9:38 AM, Ivan Voras <ivoras@freebsd.org> wrote:
>>> On 21 July 2011 17:50, Freddie Cash <fjwcash@gmail.com> wrote:
>>>> On Thu, Jul 21, 2011 at 8:45 AM, Ivan Voras <ivoras@freebsd.org> wrote:
>>>>>
>>>>> Is there an equivalent of UFS dirhash memory setting for ZFS? (i.e. the
>>>>> size of the metadata cache)
>>>>
>>>> vfs.zfs.arc_meta_limit
>>>>
>>>> This sets the amount of ARC that can be used for metadata. The default is
>>>> 1/8th of ARC, I believe. This setting lets you use "primarycache=all"
>>>> (store metadata and file data in ARC) but then tune how much is used for
>>>> each.
>>>>
>>>> Not sure if that will help in your case or not, but it's a sysctl you can
>>>> play with.
>>>
>>> I don't think that it works, or at least it is not as efficient as dirhash:
>>>
>>> www:~> sysctl -a | grep meta
>>> kern.metadelay: 28
>>> vfs.zfs.mfu_ghost_metadata_lsize: 129082368
>>> vfs.zfs.mfu_metadata_lsize: 116224
>>> vfs.zfs.mru_ghost_metadata_lsize: 113958912
>>> vfs.zfs.mru_metadata_lsize: 16384
>>> vfs.zfs.anon_metadata_lsize: 0
>>> vfs.zfs.arc_meta_limit: 322412800
>>> vfs.zfs.arc_meta_used: 506907792
>>> kstat.zfs.misc.arcstats.demand_metadata_hits: 4471705
>>> kstat.zfs.misc.arcstats.demand_metadata_misses: 2110328
>>> kstat.zfs.misc.arcstats.prefetch_metadata_hits: 27
>>> kstat.zfs.misc.arcstats.prefetch_metadata_misses: 51
>>>
>>> arc_meta_used is nearly 500 MB, which should be enough even in this
>>> case. With filenames of 32 characters, all the filenames alone for
>>> 130,000 files in a directory take about 4 MB - I doubt ZFS
>>> introduces so much extra metadata that it doesn't fit in 500 MB.
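[Editor's note: the tunables quoted above can be inspected and adjusted from the shell. A minimal sketch follows; the 512M figure is an arbitrary illustration, not a recommendation, and the right value depends on RAM and overall ARC size.]

```shell
# Inspect the current ARC metadata limit and usage (FreeBSD 8.x, ZFS v28 era):
sysctl vfs.zfs.arc_meta_limit vfs.zfs.arc_meta_used

# On 8-STABLE the limit is a boot-time tunable, so raising it means adding
# a line like the following to /boot/loader.conf and rebooting:
#   vfs.zfs.arc_meta_limit="512M"
```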
>>
>> For what it's worth, 500K files in one directory seem to work
>> reasonably well on my box running a few-weeks-old 8-stable (quad core,
>> 8GB RAM, ~6GB ARC), ZFSv28 pool on a 2-drive mirror + 50GB L2ARC.
>>
>> $ time perl -e 'use Fcntl; for $f (1..500000)
>> {sysopen(FH,"f$f",O_CREAT); close(FH);}'
>> perl -e >| /dev/null 2.26s user 39.17s system 96% cpu 43.156 total
>>
>> $ time find . | wc -l
>> 500001
>> find . 0.16s user 0.33s system 99% cpu 0.494 total
>>
>> $ time find . -ls | wc -l
>> 500001
>> find . -ls 1.93s user 12.13s system 96% cpu 14.643 total
>>
>> $ time find . | xargs -n 100 rm
>> find . 0.22s user 0.28s system 0% cpu 2:45.12 total
>> xargs -n 100 rm 1.25s user 58.51s system 36% cpu 2:45.61 total
>>
>> Deleting files resulted in a constant stream of writes to the hard drives.
>> I guess file deletion may end up being a synchronous write
>> committed to the ZIL right away. If that's indeed the case, a small slog
>> on an SSD could probably speed up file deletion a bit.
>
> That's a very interesting find.
>
> Or maybe the issue is fragmentation: could you modify the script
> slightly to create files in about 50 directories in parallel (i.e.
> create in dir1, create in dir2, create in dir3... create in dir 50,
> then again create in dir1, create in dir2...)?

Scattering across 50 directories works about as fast:

$ time perl -e 'use Fcntl; $dir = 0; for $f (1..500000)
{sysopen(FH,"$dir/f$f",O_CREAT); close(FH); $dir=($dir+1) % 50}' >| /dev/null
perl -e >| /dev/null 2.77s user 38.31s system 85% cpu 47.829 total

$ time find . | wc -l
500051
find . 0.16s user 0.36s system 29% cpu 1.787 total

$ time find . -ls | wc -l
500051
find . -ls 1.75s user 11.33s system 92% cpu 14.196 total

$ time find . -name f\* | xargs -n 100 rm
find . -name f\* 0.17s user 0.35s system 0% cpu 3:23.44 total
xargs -n 100 rm 1.35s user 52.82s system 26% cpu 3:23.75 total

> Could you, for the sake of curiosity, upgrade this system to the latest
> 8-stable and retry it?

I'm currently running 8.2-STABLE r223055. The log does not show
anything particularly interesting committed to the ZFS code since then.
There was an LBOLT overflow fix, but it should not be relevant in this
case.

I do plan to upgrade the box, though it's not going to happen for
another week or so. If the issue is still relevant then, I'll be happy
to re-run the test.

--Artem
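[Editor's note: the create/list/delete methodology used throughout the thread can be reproduced at a small scale with the sketch below. It assumes a POSIX shell with perl in $PATH, and uses 1000 files in a scratch directory rather than 500,000 on a ZFS pool, so the absolute timings are not comparable to the numbers above.]

```shell
#!/bin/sh
# Scaled-down version of the create/list/delete test from this thread.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Create the files (same O_CREAT-only loop as the perl one-liner above).
perl -e 'use Fcntl; for $f (1..1000) { sysopen(FH, "f$f", O_CREAT); close(FH); }'

# Count entries; find lists "." itself too, so this yields 1001.
before=$(find . | wc -l)

# Delete in batches of 100, as in the original test.
find . -name 'f*' | xargs -n 100 rm

# Only "." is left, so this yields 1.
after=$(find . | wc -l)

echo "before=$before after=$after"
cd / && rm -rf "$dir"
```

Wrapping each stage in time(1), as in the thread, then shows where the cost concentrates.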