From owner-freebsd-questions@FreeBSD.ORG Wed Mar 24 17:55:49 2010
Date: Wed, 24 Mar 2010 12:55:46 -0500
From: Dan Nelson
To: Bob Friesenhahn
Cc: freebsd-fs@freebsd.org, Dan Naumov, freebsd-questions@freebsd.org
Message-ID: <20100324175546.GF12330@dan.emsphone.com>
X-OS: FreeBSD 8.0-STABLE
Subject: Re: tuning vfs.zfs.vdev.max_pending and solving the issue of ZFS writes choking read IO

In the last episode (Mar 24), Bob Friesenhahn said:
> On Wed, 24 Mar 2010, Dan Naumov wrote:
> > Has anyone done any extensive testing of the effects of tuning
> > vfs.zfs.vdev.max_pending on this issue?  Is there some universally
> > recommended value beyond the default 35?  Anything else I should be
> > looking at?
>
> The vdev.max_pending value is primarily used to tune for SAN/HW-RAID
> LUNs; it dials down LUN service time (svc_t) by limiting the number of
> pending requests.  It is not terribly useful for decreasing stalls due
> to zfs writes.  In order to reduce the impact of zfs writes, you want to
> limit the maximum size of a zfs transaction group (TXG).  I don't know
> what the FreeBSD tunable is for this, but under Solaris it is
> zfs:zfs_write_limit_override.
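For what it's worth, max_pending itself needs no patching on FreeBSD: it
should already be available as the vfs.zfs.vdev.max_pending loader
tunable.  As a rough sketch (the value 10 below is purely illustrative,
not a recommendation), you could set it at boot from /boot/loader.conf:

	# illustrative value; lower = fewer I/Os queued per vdev
	vfs.zfs.vdev.max_pending="10"

and, if your kernel build also exposes it as a writable sysctl, try
values live with:

	sysctl vfs.zfs.vdev.max_pending=10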
There isn't a sysctl for zfs_write_limit_override by default, but the
following patch adds one as vfs.zfs.write_limit_override:

Index: dsl_pool.c
===================================================================
RCS file: /home/ncvs/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c,v
retrieving revision 1.4.2.1
diff -u -p -r1.4.2.1 dsl_pool.c
--- dsl_pool.c	17 Aug 2009 09:55:58 -0000	1.4.2.1
+++ dsl_pool.c	11 Mar 2010 08:34:27 -0000
@@ -47,6 +47,11 @@ uint64_t zfs_write_limit_inflated = 0;
 uint64_t zfs_write_limit_override = 0;
 extern uint64_t zfs_write_limit_min;
 
+SYSCTL_DECL(_vfs_zfs);
+SYSCTL_QUAD(_vfs_zfs, OID_AUTO, write_limit_override, CTLFLAG_RW,
+    &zfs_write_limit_override, 0,
+    "Force a txg if dirty buffers exceed this value (bytes)");
+
 kmutex_t zfs_write_limit_lock;
 
 static pgcnt_t old_physmem = 0;

> On a large-memory system, a properly working zfs should not saturate
> the write channel for more than 5 seconds.  Zfs tries to learn the
> write bandwidth so that it can tune the TXG size up to 5 seconds (max)
> worth of writes.  If you have both large memory and fast storage,
> quite a huge amount of data can be written in 5 seconds.  On my
> Solaris system, I found that zfs was quite accurate with its rate
> estimation, but it resulted in four gigabytes of data being written
> per TXG.

I had similar problems on a 32GB Solaris server at work.  Note that with
compression enabled, the entire system pauses while it compresses the
outgoing block of data.  Each pause is only a fraction of a second, but
that is long enough for end-users to complain about bad performance in X
sessions.  I had to throttle back to a 256MB write limit to make the
stuttering go away completely, and it barely affected write throughput
at all.

-- 
	Dan Nelson
	dnelson@allantgroup.com
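PS: once the patch above is in, the 256MB throttle I mention is a
one-liner (268435456 bytes = 256MB; treat the value as a starting point,
not a recommendation):

	sysctl vfs.zfs.write_limit_override=268435456

Setting it back to 0 should restore the default auto-sized TXGs, since
the override is only consulted when nonzero.  For scale, Bob's four
gigabytes per TXG is about what you would expect from roughly 800MB/sec
of write bandwidth times the 5-second cap.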