From owner-freebsd-current@FreeBSD.ORG Mon Dec 19 14:22:11 2011
Message-ID: <4EEF488E.1030904@freebsd.org>
Date: Mon, 19 Dec 2011 15:22:06 +0100
From: Stefan Esser <se@freebsd.org>
To: FreeBSD Current <freebsd-current@freebsd.org>
Subject: Uneven load on drives in ZFS RAIDZ1

Hi ZFS users,

for quite some time I have observed an uneven distribution of load
between the drives in a 4 * 2TB RAIDZ1 pool. The following is an
excerpt of a longer log of 10 second averages logged with gstat:

dT: 10.001s  w: 10.000s  filter: ^a?da?.$
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0    130    106   4134    4.5     23   1033    5.2   48.8| ada0
    0    131    111   3784    4.2     19   1007    4.0   47.6| ada1
    0     90     66   2219    4.5     24   1031    5.1   31.7| ada2
    1     81     58   2007    4.6     22   1023    2.3   28.1| ada3

 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1    132    104   4036    4.2     27   1129    5.3   45.2| ada0
    0    129    103   3679    4.5     26   1115    6.8   47.6| ada1
    1     91     61   2133    4.6     30   1129    1.9   29.6| ada2
    0     81     56   1985    4.8     24   1102    6.0   29.4| ada3

 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1    148    108   4084    5.3     39   2511    7.2   55.5| ada0
    1    141    104   3693    5.1     36   2505   10.4   54.4| ada1
    1    102     62   2112    5.6     39   2508    5.5   35.4| ada2
    0     99     60   2064    6.0     39   2483    3.7   36.1| ada3

This goes on for minutes without a change of roles. (I had assumed
that other 10 minute samples might show relatively higher load on
another subset of the drives, but it is always the first two, which
receive some 50% more read requests than the other two.) The test
consisted of minidlna rebuilding its content database for a media
collection held on that pool.
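To put a number on the imbalance, something like the following quick
Python sketch can be used to sum up the r/s column per drive from a
captured gstat log (the file name "gstat.log" is just an example, and
the parsing assumes the column layout shown in the excerpt above):

    from collections import defaultdict

    reads = defaultdict(float)

    with open("gstat.log") as f:
        for line in f:
            fields = line.split()
            # data lines have 10 columns and end in the device name;
            # header and dT lines are skipped by this check
            if len(fields) == 10 and fields[-1].startswith("ada"):
                reads[fields[-1]] += float(fields[2])   # r/s column

    total = sum(reads.values()) or 1.0
    for dev in sorted(reads):
        print("%s: %5.1f%% of all read requests"
              % (dev, 100.0 * reads[dev] / total))

For the samples above this consistently puts roughly half of all read
requests on ada0 and ada1 together being well above their quarter share.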
The unbalanced distribution of requests does not depend on the
particular application, and it does not change when the most heavily
loaded drives approach 100% busy.

This is a -CURRENT built from yesterday's sources, but the problem has
existed for quite some time (and should definitely be reproducible on
-STABLE, too). The pool consists of a 4-drive raidz1 on an ICH10 (H67)
without cache or log devices and without much ZFS tuning (only the
max. ARC size is set, which should not be relevant in this context):

zpool status -v
  pool: raid1
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        raid1       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada0p2  ONLINE       0     0     0
            ada1p2  ONLINE       0     0     0
            ada2p2  ONLINE       0     0     0
            ada3p2  ONLINE       0     0     0

errors: No known data errors

Cached configuration:
        version: 28
        name: 'raid1'
        state: 0
        txg: 153899
        pool_guid: 10507751750437208608
        hostid: 3558706393
        hostname: 'se.local'
        vdev_children: 1
        vdev_tree:
            type: 'root'
            id: 0
            guid: 10507751750437208608
            children[0]:
                type: 'raidz'
                id: 0
                guid: 7821125965293497372
                nparity: 1
                metaslab_array: 30
                metaslab_shift: 36
                ashift: 12
                asize: 7301425528832
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 7487684108701568404
                    path: '/dev/ada0p2'
                    phys_path: '/dev/ada0p2'
                    whole_disk: 1
                    create_txg: 4
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 12000329414109214882
                    path: '/dev/ada1p2'
                    phys_path: '/dev/ada1p2'
                    whole_disk: 1
                    create_txg: 4
                children[2]:
                    type: 'disk'
                    id: 2
                    guid: 2926246868795008014
                    path: '/dev/ada2p2'
                    phys_path: '/dev/ada2p2'
                    whole_disk: 1
                    create_txg: 4
                children[3]:
                    type: 'disk'
                    id: 3
                    guid: 5226543136138409733
                    path: '/dev/ada3p2'
                    phys_path: '/dev/ada3p2'
                    whole_disk: 1
                    create_txg: 4

I'd be interested to know whether this behavior can be reproduced on
other systems with raidz1 pools consisting of 4 or more drives. All it
takes is generating some disk load and running the command

    gstat -I 10000000 -f '^a?da?.$'

to obtain 10 second averages.

I have not even tried to look at the scheduling of requests in ZFS,
but I'm surprised to see higher-than-average load on just 2 of the 4
drives, since RAID parity should be evenly spread over all drives, and
for each file system block a different subset of 3 out of 4 drives
should be able to deliver the data without the need to reconstruct it
from parity (which would lead to an even distribution of load).

I've got two theories about what might cause the observed behavior:

1) There is some metadata that is only kept on the first two drives.
   Data is evenly spread, but metadata accesses lead to additional
   reads.

2) The read requests are distributed in such a way that 1/3 goes to
   ada0, another 1/3 to ada1, while the remaining 1/3 is evenly
   distributed over ada2 and ada3 (see the plausibility check below).

So: Can anybody reproduce this distribution of requests? Any idea why
this is happening, and whether something should be changed in ZFS to
better distribute the load (leading to higher file system performance)?

Best regards,
STefan
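P.S.: A quick plausibility check of theory (2) against the r/s values
from the first 10 second sample above (purely illustrative arithmetic,
nothing ZFS specific):

    # observed r/s from the first gstat sample above
    observed = {"ada0": 106.0, "ada1": 111.0, "ada2": 66.0, "ada3": 58.0}
    total = sum(observed.values())

    # theory (2): 1/3 of reads to ada0, 1/3 to ada1, 1/6 each to ada2/ada3
    share = {"ada0": 1 / 3.0, "ada1": 1 / 3.0, "ada2": 1 / 6.0, "ada3": 1 / 6.0}

    for dev in sorted(observed):
        print("%s: observed %5.1f r/s, even split %5.1f, theory (2) %5.1f"
              % (dev, observed[dev], total / 4.0, total * share[dev]))

The observed numbers are noticeably closer to the 1/3, 1/3, 1/6, 1/6
split than to an even distribution across all four drives.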