From: Martin Matuska
Date: Sun, 22 Aug 2010 17:15:01 +0200
To: freebsd-current@FreeBSD.org
Subject: [CFT] Improved ZFS metaslab code (faster write speed)
Message-ID: <4C713EF5.8080402@FreeBSD.org>

Dear FreeBSD community,

many of our [2] (and Solaris [3]) users are complaining about slow ZFS writes. One cause of these slow writes is the choice of allocation method for new blocks [3] [4].
Another issue is a write slowdown during TXG sync.

Solaris 10 (and OpenSolaris up to November 2009) behave as follows:

- pool has more than 30% free space: use first fit method [1]
- pool has less than 30% free space: use best fit method [1]

This causes a major write slowdown once free space drops below 30%. On large pools, 30% may be terabytes of free space.

OpenSolaris changed this in November 2009, and the Oracle Storage Appliances also included the new code in Q1/2010 [1]. The source [1] states that with this change they achieved a speedup of:

"50% Improved OLTP Performance, 70% Reduced Variability, 200% Improvement on MS Exchange"

I would like to issue a Call For Testing for the following 9-CURRENT patch:
http://people.freebsd.org/~mm/patches/zfs/zfs_metaslab.patch

To apply the patch against 8-STABLE, you need to apply the v15 update first:
http://people.freebsd.org/~mm/patches/zfs/v15/stable-8-v15.patch

The patch includes the following OpenSolaris onnv revisions:
10921 (partial), 11146, 11728, 12047

And covers the following Bug IDs:
6826241 Sync write IOPS drops dramatically during TXG sync
6869229 zfs should switch to shiny new metaslabs more frequently
6917066 zfs block picking can be improved
6918420 zdb -m has issues printing metaslab statistics

References:
[1] http://blogs.sun.com/roch/entry/doubling_exchange_performance
[2] http://forums.freebsd.org/showthread.php?t=8270
[3] http://blogs.everycity.co.uk/alasdair/2010/07/zfs-runs-really-slowly-when-free-disk-usage-goes-above-80/
[4] http://blogs.sun.com/bonwick/entry/zfs_block_allocation
[5] http://blogs.sun.com/bonwick/entry/space_maps
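
P.S. As a rough illustration of the old Solaris 10 behaviour described above (first fit above 30% free space, best fit below it), here is a minimal Python sketch. It is not the metaslab code and is not taken from the patch; the function name, the parameter names and the returned strings are hypothetical. What happens at exactly 30% is not stated above, so the sketch simply assumes best fit in that case.

```python
TIB = 1024 ** 4  # bytes in one tebibyte


def pick_allocator(free_bytes: int, total_bytes: int, threshold_pct: int = 30) -> str:
    """Return which allocation method the old threshold rule would select.

    Hypothetical helper for illustration only: more than `threshold_pct`
    percent free space selects first fit, anything else selects best fit
    (the exact-boundary case is an assumption).
    """
    if total_bytes <= 0 or not 0 <= free_bytes <= total_bytes:
        raise ValueError("expected 0 <= free_bytes <= total_bytes and total_bytes > 0")
    # Integer arithmetic avoids rounding surprises on very large pools.
    if free_bytes * 100 > threshold_pct * total_bytes:
        return "first-fit"
    return "best-fit"


# A 10 TiB pool with 4 TiB free is still above the threshold.
print(pick_allocator(4 * TIB, 10 * TIB))
# The same pool with 2 TiB free falls back to best fit,
# although terabytes of space are still unallocated.
print(pick_allocator(2 * TIB, 10 * TIB))
```

The second call shows why the fixed percentage hurts on large pools: the slower method is selected while a large absolute amount of space is still free.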