Date:      Tue, 5 Mar 2013 10:17:50 -0800
From:      Freddie Cash <fjwcash@gmail.com>
To:        Gary Palmer <gpalmer@freebsd.org>
Cc:        Steven Hartland <killing@multiplay.co.uk>, stable@freebsd.org, Garrett Wollman <wollman@hergotha.csail.mit.edu>
Subject:   Re: ZFS "stalls" -- and maybe we should be talking about defaults?
Message-ID:  <CAOjFWZ7TDYgS=vvcuLRPr%2BfAv4bPk4ZU0CKbgf1BydxG1MeOYw@mail.gmail.com>
In-Reply-To: <20130305152252.GA52706@in-addr.com>
References:  <513524B2.6020600@denninger.net> <1362449266.92708.8.camel@btw.pki2.com> <51355F64.4040409@denninger.net> <201303050540.r255ecEC083742@hergotha.csail.mit.edu> <20130305152252.GA52706@in-addr.com>

On Tue, Mar 5, 2013 at 7:22 AM, Gary Palmer <gpalmer@freebsd.org> wrote:

> Just as a note that there was a page I read in the past few months
> that pointed out that having a huge ARC may not always be in the best
> interests of the system.  Some operation on the filesystem (I forget
> what, apologies) caused the system to churn through the ARC and discard
> most of it, while regular I/O was blocked
>

Huh.  What timing.  I've been fighting with our largest ZFS box (128 GB of
RAM, 16 CPU cores, 2x SSD for SLOG, 2x SSD for L2ARC, 45x 2 TB HD for pool
in 6-drive raidz2 vdevs) for the past week trying to figure out why ZFS
send/recv just hangs after a while.  Everything is stuck in "D" in "ps ax"
output, and top shows the l2arc_feed_ thread using 100% of one CPU.  Even
removing the L2ARC devices from the pool doesn't help; it just lengthens
the time until the "hang".

ARC was configured for 120 GB, with arc_meta_limit set to 90 GB.  Yes,
dedup and compression are enabled (it's a backups storage box, and we get
over 5x combined dedup/compress ratio).  After several hours of running,
the ARC and wired would get up to 100+ GB, and the box would spend most of
its time "spinning", with almost 0 I/O to the pool (only a few KB/s of
reads in "zpool iostat 1" or "gstat").

ZFS send/recv would eventually complete, but what used to take 15-20
minutes would take 6-8 hours to complete.

I've reduced the ARC to only 32 GB, with arc_meta_limit set to 28 GB, and
things are running much smoother now (50-200 MB/s writes for 3-5 seconds
every 10 seconds), and send/recv is back down to 10-15 minutes.
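
For reference, the reduced limits described above could be written as
loader tunables.  This is only a sketch: it assumes the limits are applied
in /boot/loader.conf through the vfs.zfs.arc_max and vfs.zfs.arc_meta_limit
tunables, which the message does not actually state, and the values are
simply the 32 GB and 28 GB figures quoted above.

```conf
# /boot/loader.conf (assumed location; the message does not say where
# the limits were set)

# Cap the total ARC size at 32 GB (was 120 GB on this box).
vfs.zfs.arc_max="32G"

# Cap the metadata portion of the ARC at 28 GB (was 90 GB).
vfs.zfs.arc_meta_limit="28G"
```

After a reboot, "sysctl vfs.zfs.arc_max vfs.zfs.arc_meta_limit" should
report the limits in bytes, and "sysctl kstat.zfs.misc.arcstats.size" shows
the current ARC size for comparison against the cap.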

Who would have thought "too much RAM" would be an issue?

Will play with this over the next couple of days with different ARC max
settings to see where the problems start.  All of our ZFS boxes until this
one had under 64 GB of RAM.  (And we had issues with dedup enabled on
boxes with too little RAM, as in under 32 GB.)

-- 
Freddie Cash
fjwcash@gmail.com


