From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 16:04:53 2015
Subject: Re: ZFS raid write performance?
From: Steven Hartland <killing@multiplay.co.uk>
To: kpneal@pobox.com
Cc: freebsd-fs@freebsd.org
Date: Mon, 22 Jun 2015 17:04:42 +0100
Message-ID: <5588321A.4060102@multiplay.co.uk>
In-Reply-To: <20150622153056.GA96798@neutralgood.org>
References: <5587C3FF.9070407@sneakertech.com> <20150622121343.GB60684@neutralgood.org> <55880544.70907@multiplay.co.uk> <20150622153056.GA96798@neutralgood.org>
List-Id: Filesystems <freebsd-fs@freebsd.org>

On 22/06/2015 16:30, kpneal@pobox.com wrote:
> On Mon, Jun 22, 2015 at 01:53:24PM +0100, Steven Hartland wrote:
>> On 22/06/2015 13:13, kpneal@pobox.com wrote:
>>> On Mon, Jun 22, 2015 at 04:14:55AM -0400, Quartz wrote:
>>>> What's sequential write performance like these days for ZFS raidzX?
>>>> Someone suggested to me that I set up a single non-RAID disk to act as
>>>> a fast 'landing pad' for receiving files, then move them to the pool
>>>> later in the background. Is that actually necessary? (Assume generic
>>>> SATA drives, 250MB-4GB sized files, and transfers are across a LAN
>>>> using single unbonded GigE).
>>> Tests were posted to ZFS lists a few years ago.
>>> That was a while ago, but at a fundamental level ZFS hasn't changed
>>> since then, so the results should still be valid.
>>>
>>> For both reads and writes, all levels of raidz* perform slightly faster
>>> than the speed of a single drive. _Slightly_ faster, like the speed of
>>> a single drive * 1.1 or so, roughly speaking.
>>>
>>> For mirrors, writes perform about the same as a single drive, and as more
>>> drives are added they get slightly worse. But reads scale pretty well as
>>> you add drives, because reads can be spread across all the drives in the
>>> mirror in parallel.
>>>
>>> Having multiple vdevs helps because ZFS does striping across the vdevs.
>>> However, this striping only happens with writes that are done _after_ new
>>> vdevs are added. There is no rebalancing of data after new vdevs are
>>> added, so adding new vdevs won't change the read performance of data
>>> already on disk.
>>>
>>> ZFS does try to stripe across vdevs, but if your old vdevs are nearly
>>> full then adding new ones results in data mostly going to the new, nearly
>>> empty vdevs. So if you only added a single new vdev to expand the pool,
>>> then you'll see write performance roughly equal to the performance of
>>> that single vdev.
>>>
>>> Rebalancing can be done roughly with "zfs send | zfs receive". If you do
>>> this enough times, and destroy old, sent datasets after each iteration,
>>> then you can to some extent rebalance a pool. You won't achieve a perfect
>>> rebalance, though.
>>>
>>> We can thank Oracle for the destruction of the archives at sun.com, which
>>> made it pretty darn difficult to find those posts.
>>>
>>> Finally, single GigE is _slow_. I see no point in a "landing pad" when
>>> using unbonded GigE.
>>>
>> Actually it has had some significant changes which are likely to affect
>> the results, as it now has an entirely new IO scheduler, so retesting
>> would be wise.
> And this affects which parts of my post?
>
> Reading and writing to a raidz* requires touching all or almost all of
> the disks.
>
> Writing to a mirror requires touching all the disks. Reading from a mirror
> requires touching one disk.

Yes, however if you get, say, a 10% improvement in scheduling said writes /
reads, then the overall impact will be noticeable.

> That hasn't changed. I'm skeptical that a new way of doing the same thing
> would change the results that much, especially for a large stream of
> data.
>
> I can see a new I/O scheduler being more _fair_, but that only applies
> when the box has multiple things going on.

A concrete example for mirrors: when dealing with 3 readers, our testing
showed throughput increase from 168MB/s to 320MB/s with prefetch, and from
95MB/s to 284MB/s without prefetch, so significant differences.

This is a rather extreme example, but there's never any harm in re-testing
to avoid using incorrect assumptions ;-)

Regards
Steve
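
For reference, the "zfs send | zfs receive" rebalance mentioned above can be
sketched roughly as follows, assuming a pool named tank and a dataset
tank/data (both placeholder names) and ignoring incremental sends:

    # snapshot the existing dataset so it can be sent
    zfs snapshot tank/data@rebalance
    # copy it into a new dataset; the rewritten blocks are spread across
    # all vdevs that exist now, including recently added ones
    zfs send tank/data@rebalance | zfs receive tank/data.new
    # once happy with the copy, drop the old dataset and rename the new one
    zfs destroy -r tank/data
    zfs rename tank/data.new tank/data

Each pass rewrites the data against the pool's current vdev layout, which is
why repeating this after adding vdevs gradually evens things out. Note that
anything written after the snapshot is not captured unless incremental sends
are used, and, as noted above, the result is never a perfect rebalance.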