From: "James R. Van Artsdalen"
Date: Mon, 30 Nov 2009 18:31:49 -0600
To: Andrew Snow
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS guidelines - preparing for future storage expansion
Message-ID: <4B1463F5.5020403@jrv.org>
In-Reply-To: <4B14495E.7050306@modulus.org>

Andrew Snow wrote:
> Currently there is no "read-ahead" for scrubbing and resilvering, so
> it only talks to one disk and at a time and proceeds using only about
> half the I/O capacity of your disks (or less).  Read-ahead is one of
> the planned features for ZFS next year.

gstat sometimes shows multiple outstanding I/O requests to a drive
during a scrub.  All of the disks are lit up at the same time: there's
no one-disk-at-a-time.

I see roughly 500 MB/sec during a scrub, which is around 50% of the
theoretical bandwidth of both the disk-to-HBA links and the
HBA-to-system slot in my case.  I hope to be able to fix both this
spring and see if I can reach gigabyte-per-second levels, especially
for userland reads (I've seen 420 MB/s so far).

(each vdev in my case is a 2-way mirror so 500 MB/s of disk is 250 MB/s
of user data)

> Also, when your disks are 98% or more full and you are doing any
> writes at all ZFS spends a long time looking for free blocks with an
> inefficient algorithm.  An improved "disk full" algorithm is also
> planned for next year.

As the disk approaches 100% capacity the free space list(s) become
shorter, not longer.
It's fragmentation, or the need to search a long time for a
large block in the right area, that is likely the problem.  If you can
accept the block at the head of the list there is no search at all.

A quick snapshot during a scrub:

dT: 1.006s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0    238    238  30525    4.4      0      0    0.0   34.8| ada2
    0    235    235  30016    3.1      0      0    0.0   26.3| ada3
    0    274    274  35104    3.7      0      0    0.0   35.4| ada4
    0    277    277  35485    4.0      0      0    0.0   40.5| ada5
    0    273    273  34976    2.9      0      0    0.0   29.4| ada6
    4    270    270  34474    7.2      0      0    0.0   53.4| ada7
    0    271    271  34722    3.2      0      0    0.0   32.4| ada8
    5    270    270  34410    3.4      0      0    0.0   34.0| ada9
    7    268    268  34277    5.8      0      0    0.0   43.6| ada10
    0    267    267  34213    4.3      0      0    0.0   32.1| ada11
    4    269    269  34468    5.7      0      0    0.0   41.6| ada12
    7    268    268  34277    4.7      0      0    0.0   33.4| ada13
    0    277    277  35421    5.1      0      0    0.0   36.5| ada14
    4    270    270  34595    5.4      0      0    0.0   37.5| ada15
    0    269    269  34468    6.3      0      0    0.0   43.9| ada16
    0    275    275  35167    6.2      0      0    0.0   44.9| ada17
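
For anyone who wants to check the "roughly 500 MB/sec" figure against the snapshot above, here is a minimal sketch that totals the read kBps column and halves it for the 2-way mirrors. The function name and the constants are made up for illustration, and treating 1 MB as 1000 kB is an assumption (gstat's kBps may well be 1024-byte units, which would shift the numbers by a few percent).

```python
# Data rows copied from the gstat snapshot in this message.
# Columns: L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
GSTAT_ROWS = """\
0 238 238 30525 4.4 0 0 0.0 34.8| ada2
0 235 235 30016 3.1 0 0 0.0 26.3| ada3
0 274 274 35104 3.7 0 0 0.0 35.4| ada4
0 277 277 35485 4.0 0 0 0.0 40.5| ada5
0 273 273 34976 2.9 0 0 0.0 29.4| ada6
4 270 270 34474 7.2 0 0 0.0 53.4| ada7
0 271 271 34722 3.2 0 0 0.0 32.4| ada8
5 270 270 34410 3.4 0 0 0.0 34.0| ada9
7 268 268 34277 5.8 0 0 0.0 43.6| ada10
0 267 267 34213 4.3 0 0 0.0 32.1| ada11
4 269 269 34468 5.7 0 0 0.0 41.6| ada12
7 268 268 34277 4.7 0 0 0.0 33.4| ada13
0 277 277 35421 5.1 0 0 0.0 36.5| ada14
4 270 270 34595 5.4 0 0 0.0 37.5| ada15
0 269 269 34468 6.3 0 0 0.0 43.9| ada16
0 275 275 35167 6.2 0 0 0.0 44.9| ada17
"""

# Assumption: every vdev is a 2-way mirror, as stated in the message,
# so a scrub reads two copies of each block of user data.
MIRROR_WAYS = 2

# Assumption: 1 MB = 1000 kB, for a rough comparison only.
KB_PER_MB = 1000


def read_kbps_by_disk(text):
    """Return {device name: read kBps} for each gstat data row."""
    rates = {}
    for line in text.splitlines():
        # The "|" after %busy is glued to the number, so turn it into a space.
        fields = line.replace("|", " ").split()
        if len(fields) != 10:
            continue  # skip blank lines or anything that is not a data row
        rates[fields[9]] = int(fields[3])  # 4th column is the read kBps
    return rates


rates = read_kbps_by_disk(GSTAT_ROWS)
total_kbps = sum(rates.values())
user_kbps = total_kbps / MIRROR_WAYS

print(f"disks sampled:       {len(rates)}")
print(f"raw disk reads:      {total_kbps / KB_PER_MB:.1f} MB/s")
print(f"user data (mirrors): {user_kbps / KB_PER_MB:.1f} MB/s")
```

The 16 disks add up to 546598 kBps of raw reads in that one-second sample, which is in line with the roughly 500 MB/sec above, and half of that is user data once the mirroring is accounted for.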