Date: Tue, 29 Jan 2013 18:06:01 -0600
From: Kevin Day <toasty@dragondata.com>
To: Matthew Ahrens <mahrens@delphix.com>
Cc: FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject: Re: Improving ZFS performance for large directories
Message-ID: <F4420A8C-FB92-4771-B261-6C47A736CF7F@dragondata.com>
In-Reply-To: <CAJjvXiE+8OMu_yvdRAsWugH7W=fhFW7bicOLLyjEn8YrgvCwiw@mail.gmail.com>
References: <19DB8F4A-6788-44F6-9A2C-E01DEA01BED9@dragondata.com> <CAJjvXiE+8OMu_yvdRAsWugH7W=fhFW7bicOLLyjEn8YrgvCwiw@mail.gmail.com>
On Jan 29, 2013, at 5:42 PM, Matthew Ahrens <mahrens@delphix.com> wrote:

> On Tue, Jan 29, 2013 at 3:20 PM, Kevin Day <toasty@dragondata.com> wrote:
> I'm prepared to try an L2arc cache device (with secondarycache=metadata),
>
> You might first see how long it takes when everything is cached. E.g. by doing this in the same directory several times. This will give you a lower bound on the time it will take (or put another way, an upper bound on the improvement available from a cache device).

Doing it twice back-to-back makes a bit of difference but it's still slow either way. After not touching this directory for about 30 minutes:

# time ls -l >/dev/null
0.773u 2.665s 0:18.21 18.8% 35+2749k 3012+0io 0pf+0w

Immediately again:

# time ls -l > /dev/null
0.665u 1.077s 0:08.60 20.1% 35+2719k 556+0io 0pf+0w

18.2 vs 8.6 seconds is an improvement, but even the 8.6 seconds is longer than what I was expecting.

> For a specific filesystem, nothing comes to mind, but I'm sure you could cobble something together with zdb. There are several tools to determine the amount of metadata in a ZFS storage pool:
>
> - "zdb -bbb <pool>"
>   but this is unreliable on pools that are in use

I tried this and it consumed >16GB of memory after about 5 minutes so I had to kill it. I'll try it again during our next maintenance window where it can be the only thing running.

> - "zpool scrub <pool>; <wait for scrub to complete>; echo '::walk spa|::zfs_blkstats' | mdb -k"
>   the scrub is slow, but this can be mitigated by setting the global variable zfs_no_scrub_io to 1. If you don't have mdb or equivalent debugging tools on freebsd, you can manually look at <spa_t>->spa_dsl_pool->dp_blkstats.
>
> In either case, the "LSIZE" is the size that's required for caching (in memory or on a l2arc cache device). At a minimum you will need 512 bytes for each file, to cache the dnode_phys_t.

Okay, thanks a bunch. I'll try this on the next chance I get too.
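(As an aside, the 512-bytes-per-dnode figure gives a quick back-of-the-envelope lower bound on how much ARC/L2ARC the metadata alone needs. A sketch, using a made-up file count as a placeholder; in practice you would substitute something like the output of `find <dir> | wc -l`, and the real LSIZE reported by zdb will be larger since it includes indirect blocks and directory ZAPs:)

```shell
# Assumed placeholder: number of files/dnodes in the dataset.
# Replace with an actual count, e.g. FILES=$(find /tank/mirror | wc -l)
FILES=1000000

# Lower bound: 512 bytes of dnode_phys_t per file.
BYTES=$((FILES * 512))
MIB=$((BYTES / 1024 / 1024))
echo "${BYTES} bytes (~${MIB} MiB) minimum to cache dnodes alone"
```

For a million files that's roughly half a gigabyte just for dnodes, before counting anything else.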
I think some of the issue is that nothing is being allowed to stay cached long. We have several parallel rsyncs running at once that are basically scanning every directory as fast as they can, combined with a bunch of rsync, http and ftp clients. I'm guessing with all that activity things are getting shoved out pretty quickly.
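(One way to sanity-check that theory is to watch the ARC hit ratio while the scans run. A sketch of the arithmetic, with made-up placeholder counter values; on FreeBSD the real numbers would come from something like `sysctl -n kstat.zfs.misc.arcstats.hits` and `.misses`, sampled twice and differenced:)

```shell
# Placeholder counter samples; substitute real arcstats deltas.
HITS=900000
MISSES=300000

# Integer percentage of ARC lookups that hit.
RATIO=$((100 * HITS / (HITS + MISSES)))
echo "ARC hit ratio: ${RATIO}%"
```

A ratio that drops sharply while the parallel rsyncs are active would support the eviction theory.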