From owner-freebsd-questions@FreeBSD.ORG Wed Mar 24 17:55:49 2010
Date: Wed, 24 Mar 2010 12:55:46 -0500
From: Dan Nelson
To: Bob Friesenhahn
Cc: freebsd-fs@freebsd.org, Dan Naumov, freebsd-questions@freebsd.org
Message-ID: <20100324175546.GF12330@dan.emsphone.com>
X-OS: FreeBSD 8.0-STABLE
Subject: Re: tuning vfs.zfs.vdev.max_pending and solving the issue of ZFS writes choking read IO

In the last episode (Mar 24), Bob Friesenhahn said:
> On Wed, 24 Mar 2010, Dan Naumov wrote:
> > Has anyone done any extensive testing of the effects of tuning
> > vfs.zfs.vdev.max_pending on this issue?  Is there some universally
> > recommended value beyond the default 35?  Anything else I should be
> > looking at?
>
> The vdev.max_pending value is primarily used to tune for SAN/HW-RAID
> LUNs; it dials down LUN service time (svc_t) by limiting the number of
> pending requests.  It is not terribly useful for decreasing stalls due
> to zfs writes.  In order to reduce the impact of zfs writes, you want to
> limit the maximum size of a zfs transaction group (TXG).  I don't know
> what the FreeBSD tunable is for this, but under Solaris it is
> zfs:zfs_write_limit_override.
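For what it's worth, max_pending itself needs no patching on FreeBSD: it
should already be available as the vfs.zfs.vdev.max_pending loader
tunable.  As a rough sketch (the value 10 below is purely illustrative,
not a recommendation), you could set it at boot from /boot/loader.conf:

	# illustrative value; lower = fewer I/Os queued per vdev
	vfs.zfs.vdev.max_pending="10"

and, if your kernel build also exposes it as a writable sysctl, try
values live with:

	sysctl vfs.zfs.vdev.max_pending=10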
There isn't a sysctl for zfs_write_limit_override by default, but the
following patch adds one as vfs.zfs.write_limit_override:

Index: dsl_pool.c
===================================================================
RCS file: /home/ncvs/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c,v
retrieving revision 1.4.2.1
diff -u -p -r1.4.2.1 dsl_pool.c
--- dsl_pool.c	17 Aug 2009 09:55:58 -0000	1.4.2.1
+++ dsl_pool.c	11 Mar 2010 08:34:27 -0000
@@ -47,6 +47,11 @@ uint64_t zfs_write_limit_inflated = 0;
 uint64_t zfs_write_limit_override = 0;
 extern uint64_t zfs_write_limit_min;
 
+SYSCTL_DECL(_vfs_zfs);
+SYSCTL_QUAD(_vfs_zfs, OID_AUTO, write_limit_override, CTLFLAG_RW,
+    &zfs_write_limit_override, 0,
+    "Force a txg if dirty buffers exceed this value (bytes)");
+
 kmutex_t zfs_write_limit_lock;
 
 static pgcnt_t old_physmem = 0;

> On a large-memory system, a properly working zfs should not saturate
> the write channel for more than 5 seconds.  Zfs tries to learn the
> write bandwidth so that it can tune the TXG size up to 5 seconds (max)
> worth of writes.  If you have both large memory and fast storage,
> quite a huge amount of data can be written in 5 seconds.  On my
> Solaris system, I found that zfs was quite accurate with its rate
> estimation, but it resulted in four gigabytes of data being written
> per TXG.

I had similar problems on a 32GB Solaris server at work.  Note that with
compression enabled, the entire system pauses while it compresses the
outgoing block of data.  Each pause is only a fraction of a second, but
that is long enough for end-users to complain about bad performance in X
sessions.  I had to throttle back to a 256MB write limit to make the
stuttering go away completely, and it barely affected write throughput
at all.

-- 
	Dan Nelson
	dnelson@allantgroup.com
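PS: once the patch above is in, the 256MB throttle I mention is a
one-liner (268435456 bytes = 256MB; treat the value as a starting point,
not a recommendation):

	sysctl vfs.zfs.write_limit_override=268435456

Setting it back to 0 should restore the default auto-sized TXGs, since
the override is only consulted when nonzero.  For scale, Bob's four
gigabytes per TXG is about what you would expect from roughly 800MB/sec
of write bandwidth times the 5-second cap.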