From: Luke Marsden
To: "freebsd-stable@freebsd.org"
Cc: freebsd-fs@freebsd.org, team@hybrid-logic.co.uk
Subject: Another ZFS ARC memory question
Date: Fri, 24 Feb 2012 11:06:52 +0000
Message-ID: <1330081612.13430.39.camel@pow>

Hi all,

Just wanted to get your opinion on best practices for ZFS.

We're running 8.2-RELEASE (ZFS v15) in production on 24GB RAM amd64
machines, but have been having trouble with short spikes in application
memory usage resulting in huge amounts of swapping, bringing the whole
machine to its knees and crashing it hard. I suspect this is because,
when there is a sudden spike in memory usage, the ZFS ARC reclaim thread
is unable to free system memory fast enough.

This most recently happened yesterday, as you can see from the following
munin graphs:

http://hybrid-logic.co.uk/memory-day.png
http://hybrid-logic.co.uk/swap-day.png

Our response has been to start limiting the ZFS ARC cache to 4GB on our
production machines - trading performance for stability is fine with me
(and we have L2ARC on SSD, so we still get good levels of caching).

My questions are:

* Is this a known problem?

* What is the community's advice for production machines running ZFS on
  FreeBSD? Is manually limiting the ARC cache (to ensure that there's
  enough actually free memory to handle a spike in application memory
  usage) the best solution to this spike-in-memory-means-crash problem?

* Has FreeBSD 9.0 / ZFS v28 solved this problem?

* Rather than setting a hard limit on the ARC cache size, is it possible
  to adjust the auto-tuning variables to leave more free memory for
  spiky memory situations? E.g. set the auto-tuning to make the ARC eat
  80% of memory instead of ~95% like it is at present?

* Could the ARC reclaim thread be made to drop ARC pages with higher
  priority before the system starts swapping out application pages?

Thank you for any/all answers, and thank you for making FreeBSD
awesome :-)

Best Regards,
Luke Marsden

-- 
CTO, Hybrid Logic
+447791750420 | +1-415-449-1165 | www.hybrid-cluster.com