Date:      Fri, 24 Feb 2012 13:42:14 +0000
From:      Luke Marsden <luke-lists@hybrid-logic.co.uk>
To:        Tom Evans <tevans.uk@googlemail.com>
Cc:        freebsd-fs@freebsd.org, team@hybrid-logic.co.uk
Subject:   Re: Another ZFS ARC memory question
Message-ID:  <1330090934.13430.90.camel@pow>
In-Reply-To: <CAFHbX1JA9HdF59_NAXzy3R+ZGN9CFrTWcbYq4ajBjvD_WTBTwA@mail.gmail.com>
References:  <1330081612.13430.39.camel@pow> <CAFHbX1KPW+4h2-LHE9rB0aVRqw+AzVDrjjVB2CCt=7T4JB8C3A@mail.gmail.com> <1330087470.13430.61.camel@pow> <CAFHbX1JA9HdF59_NAXzy3R+ZGN9CFrTWcbYq4ajBjvD_WTBTwA@mail.gmail.com>

On Fri, 2012-02-24 at 12:59 +0000, Tom Evans wrote:
> On Fri, Feb 24, 2012 at 12:44 PM, Luke Marsden
> <luke-lists@hybrid-logic.co.uk> wrote:
> > On Fri, 2012-02-24 at 12:21 +0000, Tom Evans wrote:
> >> On Fri, Feb 24, 2012 at 11:06 AM, Luke Marsden
> >> <luke-lists@hybrid-logic.co.uk> wrote:
> >> > Hi all,
> >> >
> >> > Just wanted to get your opinion on best practices for ZFS.
> >> >
> >> > We're running 8.2-RELEASE (ZFS v15) in production on 24GB amd64 machines
> >> > but have been having trouble with short spikes in application memory
> >> > usage resulting in huge amounts of swapping, bringing the whole machine
> >> > to its knees and crashing it hard.  I suspect this is because when there
> >> > is a sudden spike in memory usage the zfs arc reclaim thread is unable
> >> > to free system memory fast enough.
> >> >
> >> > This most recently happened yesterday as you can see from the following
> >> > munin graphs:
> >> >
> >> > E.g. http://hybrid-logic.co.uk/memory-day.png
> >> >     http://hybrid-logic.co.uk/swap-day.png
> >> >
> >> > Our response has been to start limiting the ZFS ARC cache to 4GB on our
> >> > production machines - trading performance for stability is fine with me
> >> > (and we have L2ARC on SSD so we still get good levels of caching).
> >> >
> >> > My questions are:
> >> >
> >> >      * is this a known problem?
> >> >      * what is the community's advice for production machines running
> >> >        ZFS on FreeBSD, is manually limiting the ARC cache (to ensure
> >> >        that there's enough actually free memory to handle a spike in
> >> >        application memory usage) the best solution to this
> >> >        spike-in-memory-means-crash problem?
> >> >      * has FreeBSD 9.0 / ZFS v28 solved this problem?
> >> >      * rather than setting a hard limit on the ARC cache size, is it
> >> >        possible to adjust the auto-tuning variables to leave more free
> >> >        memory for spiky memory situations?  e.g. set the auto-tuning to
> >> >        make arc eat 80% of memory instead of ~95% like it is at
> >> >        present?
> >> >      * could the arc reclaim thread be made to drop ARC pages with
> >> >        higher priority before the system starts swapping out
> >> >        application pages?
> >> >
> >> > Thank you for any/all answers, and thank you for making FreeBSD
> >> > awesome :-)
> >>
> >> It's not a problem, it's a feature!
> >>
> >> By default the ARC will attempt to cache as much as it can - it
> >> assumes the box is a ZFS filer, and doesn't need RAM for applications.
> >> The solution, as you've found out, is to limit how much ARC can take
> >> up.
> >>
> >> In practice, you should be doing this anyway. You should know, or have
> >> an idea, of how much RAM is required for the applications on that box,
> >> and you need to limit ZFS to not eat into that required RAM.
> >
> > Thanks for your reply, Tom!  I agree that the ARC cache is a great
> > feature, but for a general purpose filesystem it does seem like a
> > reasonable expectation that filesystem cache will be evicted before
> > application data is swapped, even if the spike in memory usage is rather
> > aggressive.  A complete server crash in this scenario is rather
> > unfortunate.
> >
> > My question stands - is this an area which has been improved on in the
> > ZFS v28 / FreeBSD 9.0 / upcoming FreeBSD 8.3 code, or should it be
> > standard practice to estimate how much memory the applications running on
> > the server might need and set the vfs.zfs.arc_max tunable in
> > /boot/loader.conf appropriately?  This is reasonably tricky when providing
> > general-purpose web application hosting, so we'll often end up erring on
> > the side of caution and leaving lots of RAM free "just in case".
> >
> > If the latter is indeed the case in the latest stable releases then I
> > would like to update http://wiki.freebsd.org/ZFSTuningGuide which
> > currently states:
> >
> >        FreeBSD 7.2+ has improved kernel memory allocation strategy and
> >        no tuning may be necessary on systems with more than 2 GB of
> >        RAM.
> >
> > Thank you!
> >
> > Best Regards,
> > Luke Marsden
> >
> 
> Hmm. That comment is really saying that you no longer need to tune
> vm.kmem_size.

http://wiki.freebsd.org/ZFSTuningGuide

"No tuning may be necessary" seems to indicate that no changes need to
be made to boot.loader.  I'm happy to provide a patch for the wiki which
makes it clearer that for servers which may experience sudden spikes in
application memory usage (i.e. all servers running user-supplied
applications), the speed of ARC eviction is insufficient to ensure
stability and arc_max should be tuned downwards.
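
For reference, the cap we apply looks roughly like the snippet below
(4G is simply the value mentioned earlier; size it to leave headroom
for the peak application memory you expect on the box):

  # /boot/loader.conf
  # Cap the ARC; 4G is illustrative -- leave enough free RAM to
  # absorb the largest application memory spike you expect.
  vfs.zfs.arc_max="4G"

After a reboot the cap shows up as the vfs.zfs.arc_max sysctl, and the
live ARC size can be watched via kstat.zfs.misc.arcstats.size.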

> I get what you are saying: applications suddenly using a lot of RAM
> should not cause the server to fall over. Do you know why it fell
> over? I.e. was it a panic, a deadlock, etc.?

If you look at the http://hybrid-logic.co.uk/swap-day.png graph you can
see a huge spike in swap at the point where the last line of pixels in
http://hybrid-logic.co.uk/memory-day.png shows the sudden increase in
memory usage (roughly 3GB of additional active memory if you look
closely).  The graph stops there because the server stopped answering
everything, including the munin probe requests.  I did manage to log in
just before it went down, but by then incoming requests were not being
serviced fast enough due to the excessive swapping, and shortly
afterwards the machine stopped responding entirely ('top' output froze
and never came back).  It did keep answering pings, and might
eventually have recovered had I disabled inbound network traffic.  I
have no evidence of a panic or deadlock; we simply hard rebooted the
machine about 15 minutes later, after it failed to recover from the
swap-storm.

> FreeBSD does not cope well when you have used up all RAM and swap
> (well, what does?), and from your graphs it does look like the ARC is
> not super massive when you had the problem - around 30-40% of RAM?

The last munin sample shows roughly 8.5GB of ARC out of 24GB, so yes,
about 35%.  What I'd like is for FreeBSD to detect an emergency
low-memory condition and aggressively drop much or all of the ARC
*before* swapping out application memory, since it is the swapping
that grinds the system to a halt.

Is this a reasonable request, and is there anything I can do to help
implement it?
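
In the meantime, something as crude as the sh sketch below (sysctl
names as they appear on our 8.2 boxes; the interval is arbitrary)
could log ARC size against free memory so the next swap-storm at least
leaves a trail:

  #!/bin/sh
  # Log ARC size vs. free memory once a minute.
  pgsz=$(sysctl -n hw.pagesize)
  while :; do
      arc_mb=$(( $(sysctl -n kstat.zfs.misc.arcstats.size) / 1048576 ))
      free_mb=$(( $(sysctl -n vm.stats.vm.v_free_count) * pgsz / 1048576 ))
      echo "$(date '+%F %T') arc=${arc_mb}MB free=${free_mb}MB"
      sleep 60
  done

Read alongside the munin swap graph, that should show whether the ARC
shrinks at all before the box starts to thrash.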

If that's not feasible, can we at least update the wiki to make it
clearer that limiting the ARC is necessary, even on boxes with plenty
of RAM, to ensure stability under spiky memory conditions?

Thanks!

Best Regards,
Luke Marsden

-- 
CTO, Hybrid Logic
+447791750420  |  +1-415-449-1165  | www.hybrid-cluster.com 



