Date: Tue, 5 Mar 2013 00:40:38 -0500 (EST)
From: Garrett Wollman <wollman@hergotha.csail.mit.edu>
To: killing@multiplay.co.uk
Cc: stable@freebsd.org
Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?
Message-Id: <201303050540.r255ecEC083742@hergotha.csail.mit.edu>
In-Reply-To: <8C68812328E3483BA9786EF15591124D@multiplay.co.uk>
References: <513524B2.6020600@denninger.net> <1362449266.92708.8.camel@btw.pki2.com> <51355F64.4040409@denninger.net>

In article <8C68812328E3483BA9786EF15591124D@multiplay.co.uk>,
killing@multiplay.co.uk writes:

>Interesting you should say that: I've seen a stall recently on a
>ZFS-only box running on 6 x SSD RAIDZ2.
>
>The stall was caused by a fairly large mysql import, with nothing
>else running.
>
>When it happened I thought the machine had wedged, but minutes (not
>seconds) later, everything sprang into action again.

I have certainly seen what you might describe as "stalls", caused, so
far as I can tell, by kernel memory starvation.  I've seen it take as
much as half an hour to recover from these (which is too long for my
users).  Right now I have the ARC limited to 64 GB (on a 96 GB file
server), and that has made it more stable, but it's still not behaving
quite as I would like, and I'm looking to put more memory into the
system (to be used for non-ARC functions).

Looking at my munin graphs, I find that backups in particular put very
heavy pressure on kernel memory, doubling the UMA allocations over
steady state, and it takes about four or five hours for them to climb
back down.  See for an example.  Some of the stalls are undoubtedly
caused by internal fragmentation rather than by actual data in use.
(Solaris used to have this issue, and some hooks were added to allow
some amount of garbage collection with the cooperation of the
filesystem.)

-GAWollman
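
P.S.  For anyone wanting to reproduce the ARC cap mentioned above, the
usual knob is the vfs.zfs.arc_max loader tunable; a minimal sketch
(the 64 GB figure just mirrors my setup, adjust to taste, and note the
value is in bytes):

    # /boot/loader.conf
    # Cap the ZFS ARC at 64 GB (64 * 1073741824 bytes)
    vfs.zfs.arc_max="68719476736"

The effective limit and the current ARC size can then be checked at
runtime with sysctl:

    % sysctl vfs.zfs.arc_max kstat.zfs.misc.arcstats.size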
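
The UMA numbers that munin is graphing can also be pulled by hand with
vmstat(8); the zone names in the filter below (zio_buf, dnode_t, and
so on) are typical of the ZFS-related zones but vary by release, so
treat this as a sketch rather than an exact recipe:

    % vmstat -z | head -1
    % vmstat -z | egrep 'zio_buf|zio_data_buf|arc_buf|dnode_t|dmu_buf'

Watching the USED and FREE columns for those zones across a backup run
shows where the extra allocations (and the fragmentation) are sitting.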