Date: Tue, 5 Mar 2013 00:40:38 -0500 (EST)
From: Garrett Wollman <wollman@hergotha.csail.mit.edu>
To: killing@multiplay.co.uk
Cc: stable@freebsd.org
Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?
Message-Id: <201303050540.r255ecEC083742@hergotha.csail.mit.edu>
In-Reply-To: <8C68812328E3483BA9786EF15591124D@multiplay.co.uk>
References: <513524B2.6020600@denninger.net> <1362449266.92708.8.camel@btw.pki2.com> <51355F64.4040409@denninger.net>

In article <8C68812328E3483BA9786EF15591124D@multiplay.co.uk>,
killing@multiplay.co.uk writes:

>Interesting you should say that: I've seen a stall recently on a
>ZFS-only box running on 6 x SSD RAIDZ2.
>
>The stall was caused by a fairly large mysql import, with nothing
>else running.
>
>When it happened I thought the machine had wedged, but minutes (not
>seconds) later, everything sprang into action again.

I have certainly seen what you might describe as "stalls", caused, so
far as I can tell, by kernel memory starvation.  I've seen it take as
much as half an hour to recover from these (which is too long for my
users).  Right now I have the ARC limited to 64 GB (on a 96 GB file
server), and that has made it more stable, but it's still not behaving
quite as I would like, and I'm looking to put more memory into the
system (to be used for non-ARC functions).

Looking at my munin graphs, I find that backups in particular put very
heavy pressure on kernel memory, doubling the UMA allocations over
steady state, and it takes about four or five hours for them to climb
back down.  See for an example.  Some of the stalls are undoubtedly
caused by internal fragmentation rather than by actual data in use.
(Solaris used to have this issue, and some hooks were added to allow
some amount of garbage collection with the cooperation of the
filesystem.)

-GAWollman
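
P.S.  For anyone wanting to reproduce the ARC cap mentioned above, the
usual knob is the vfs.zfs.arc_max loader tunable; a minimal sketch
(the 64 GB figure just mirrors my setup, adjust to taste, and note the
value is in bytes):

    # /boot/loader.conf
    # Cap the ZFS ARC at 64 GB (64 * 1073741824 bytes)
    vfs.zfs.arc_max="68719476736"

The effective limit and the current ARC size can then be checked at
runtime with sysctl:

    % sysctl vfs.zfs.arc_max kstat.zfs.misc.arcstats.size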
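
The UMA numbers that munin is graphing can also be pulled by hand with
vmstat(8); the zone names in the filter below (zio_buf, dnode_t, and
so on) are typical of the ZFS-related zones but vary by release, so
treat this as a sketch rather than an exact recipe:

    % vmstat -z | head -1
    % vmstat -z | egrep 'zio_buf|zio_data_buf|arc_buf|dnode_t|dmu_buf'

Watching the USED and FREE columns for those zones across a backup run
shows where the extra allocations (and the fragmentation) are sitting.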