From owner-freebsd-fs@FreeBSD.ORG Fri Feb 24 13:42:25 2012
From: Luke Marsden <luke-lists@hybrid-logic.co.uk>
To: Tom Evans
Cc: freebsd-fs@freebsd.org, team@hybrid-logic.co.uk
Subject: Re: Another ZFS ARC memory question
Date: Fri, 24 Feb 2012 13:42:14 +0000
Message-ID: <1330090934.13430.90.camel@pow>
References: <1330081612.13430.39.camel@pow> <1330087470.13430.61.camel@pow>

On Fri, 2012-02-24 at 12:59 +0000, Tom Evans wrote:
> On Fri, Feb 24, 2012 at 12:44 PM, Luke Marsden wrote:
> > On Fri, 2012-02-24 at 12:21 +0000, Tom Evans wrote:
> >> On Fri, Feb 24, 2012 at 11:06 AM, Luke Marsden wrote:
> >> > Hi all,
> >> >
> >> > Just wanted to get your opinion on best practices for ZFS.
> >> >
> >> > We're running 8.2-RELEASE with ZFS v15 in production on 24GB RAM
> >> > amd64 machines, but have been having trouble with short spikes in
> >> > application memory usage resulting in huge amounts of swapping,
> >> > bringing the whole machine to its knees and crashing it hard.
> >> > I suspect this is because, when there is a sudden spike in memory
> >> > usage, the ZFS ARC reclaim thread is unable to free system memory
> >> > fast enough.
> >> >
> >> > This most recently happened yesterday, as you can see from the
> >> > following munin graphs:
> >> >
> >> > http://hybrid-logic.co.uk/memory-day.png
> >> > http://hybrid-logic.co.uk/swap-day.png
> >> >
> >> > Our response has been to start limiting the ZFS ARC to 4GB on our
> >> > production machines - trading performance for stability is fine
> >> > with me (and we have L2ARC on SSD, so we still get good levels of
> >> > caching).
> >> >
> >> > My questions are:
> >> >
> >> >  * Is this a known problem?
> >> >  * What is the community's advice for production machines running
> >> >    ZFS on FreeBSD? Is manually limiting the ARC (to ensure that
> >> >    there is enough genuinely free memory to absorb a spike in
> >> >    application memory usage) the best solution to this
> >> >    spike-in-memory-means-crash problem?
> >> >  * Has FreeBSD 9.0 / ZFS v28 solved this problem?
> >> >  * Rather than setting a hard limit on the ARC size, is it
> >> >    possible to adjust the auto-tuning variables to leave more
> >> >    free memory for spiky memory situations, e.g. let the ARC use
> >> >    80% of memory instead of the ~95% it uses at present?
> >> >  * Could the ARC reclaim thread be made to drop ARC pages with
> >> >    higher priority, before the system starts swapping out
> >> >    application pages?
> >> >
> >> > Thank you for any/all answers, and thank you for making FreeBSD
> >> > awesome :-)
> >>
> >> It's not a problem, it's a feature!
> >>
> >> By default the ARC will attempt to cache as much as it can - it
> >> assumes the box is a ZFS filer and doesn't need RAM for
> >> applications. The solution, as you've found out, is to limit how
> >> much the ARC can take up.
> >>
> >> In practice, you should be doing this anyway.
> >> You should know, or have an idea of, how much RAM is required for
> >> the applications on that box, and you need to limit ZFS so it does
> >> not eat into that required RAM.
> >
> > Thanks for your reply, Tom! I agree that the ARC is a great
> > feature, but for a general-purpose filesystem it seems a reasonable
> > expectation that filesystem cache will be evicted before
> > application data is swapped, even if the spike in memory usage is
> > rather aggressive. A complete server crash in this scenario is
> > rather unfortunate.
> >
> > My question stands - is this an area which has been improved in the
> > ZFS v28 / FreeBSD 9.0 / upcoming FreeBSD 8.3 code, or should it be
> > standard practice to estimate how much memory the applications
> > running on the server might need and set the vfs.zfs.arc_max
> > tunable in /boot/loader.conf accordingly? This is reasonably tricky
> > when providing general-purpose web application hosting, so we'll
> > often end up erring on the side of caution and leaving lots of RAM
> > free "just in case".
> >
> > If the latter is indeed the case in the latest stable releases,
> > then I would like to update http://wiki.freebsd.org/ZFSTuningGuide,
> > which currently states:
> >
> >     FreeBSD 7.2+ has improved kernel memory allocation strategy and
> >     no tuning may be necessary on systems with more than 2 GB of
> >     RAM.
> >
> > Thank you!
> >
> > Best Regards,
> > Luke Marsden
>
> Hmm. That comment is really talking about the fact that you no
> longer need to tune vm.kmem_size.

http://wiki.freebsd.org/ZFSTuningGuide

"No tuning may be necessary" seems to indicate that no changes need to
be made to /boot/loader.conf. I'm happy to provide a patch for the
wiki which makes it clearer that for servers which may experience
sudden spikes in application memory usage (i.e. all servers running
user-supplied applications), the speed of ARC eviction is insufficient
to ensure stability, and arc_max should be tuned downwards.
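The estimate described above - reserve the applications' known
footprint plus headroom for a spike, and cap the ARC at what remains -
is simple arithmetic. A minimal sketch in Python; the 24GB total and
~3GB spike echo figures from this thread, but the 25% safety margin,
the 512MB floor, and the helper itself are my own assumptions, not
anything from the tuning guide:

```python
GiB = 1024 ** 3

def suggest_arc_max(total_ram, app_steady, app_spike, margin=0.25):
    """Suggest a vfs.zfs.arc_max value in bytes.

    Reserve the applications' steady-state footprint, an estimated
    worst-case spike, and a safety margin of `margin` * total RAM;
    the ARC gets the remainder, but never less than a 512MB floor.
    (The margin and floor are illustrative assumptions.)
    """
    reserved = app_steady + app_spike + int(margin * total_ram)
    return max(total_ram - reserved, 512 * 1024 * 1024)

if __name__ == "__main__":
    # 24GB box, ~8GB steady application usage, ~3GB observed spike.
    arc_max = suggest_arc_max(24 * GiB, 8 * GiB, 3 * GiB)
    print(f'vfs.zfs.arc_max="{arc_max // GiB}G"')  # prints 7G here
```

With these inputs the helper reserves 8 + 3 + 6 = 17GB and leaves 7GB
for the ARC; tighten or loosen the margin to taste.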
> I get what you are saying about applications suddenly using a lot of
> RAM not being something that should cause the server to fall over.
> Do you know why it fell over? I.e., was it a panic, a deadlock, etc.?

If you look at the http://hybrid-logic.co.uk/swap-day.png graph, you
can see a huge spike in swap at the point where the last line of
pixels at http://hybrid-logic.co.uk/memory-day.png shows the sudden
increase in memory usage (about 3GB of additional active memory, if
you look closely). Since the graph stops at that point, the server
evidently became completely unresponsive (including to munin probe
requests).

I did manage to log in just before it became completely unresponsive,
but at that point the incoming requests weren't being serviced fast
enough due to the excessive swapping, and the server eventually froze
('top' output stopped and never came back). It continued to respond to
pings, though, and might eventually have recovered if I had disabled
inbound network traffic. I don't have any evidence of a panic or
deadlock; we just hard-rebooted the machine about 15 minutes later,
after it failed to recover from the swap storm.

> FreeBSD does not cope well when you have used up all RAM and swap
> (well, what does?), and from your graphs it does look like the ARC
> is not super massive when you had the problem - around 30-40% of
> RAM?

The last munin sample shows roughly 8.5GB of ARC out of 24GB, so yes,
about 35%.

I guess what I'd like is for FreeBSD to detect an emergency
out-of-memory condition and aggressively drop much or all of the ARC
*before* swapping out application memory, which is what causes the
system to grind to a halt. Is this a reasonable request, and is there
anything I can do to help implement it?

If not, can we update the wiki to make it clearer that ARC limiting is
necessary, even on high-RAM boxes, to ensure stability under spiky
memory conditions?

Thanks!
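For anyone finding this thread later: the ARC limiting discussed above
is applied with a loader tunable. A minimal /boot/loader.conf fragment
using the 4GB cap mentioned earlier in the thread (the right value is
site-specific - leave room for your applications' peak usage):

```
# /boot/loader.conf - cap the ZFS ARC at 4GB (value from this thread)
vfs.zfs.arc_max="4G"
```

After a reboot, the configured cap and the live ARC size can be
checked with `sysctl vfs.zfs.arc_max` and
`sysctl kstat.zfs.misc.arcstats.size`.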
Best Regards,
Luke Marsden

-- 
CTO, Hybrid Logic
+447791750420 | +1-415-449-1165 | www.hybrid-cluster.com