From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 16:04:53 2015
Subject: Re: ZFS raid write performance?
From: Steven Hartland <killing@multiplay.co.uk>
To: kpneal@pobox.com
Cc: freebsd-fs@freebsd.org
Date: Mon, 22 Jun 2015 17:04:42 +0100
Message-ID: <5588321A.4060102@multiplay.co.uk>
In-Reply-To: <20150622153056.GA96798@neutralgood.org>
References: <5587C3FF.9070407@sneakertech.com> <20150622121343.GB60684@neutralgood.org> <55880544.70907@multiplay.co.uk> <20150622153056.GA96798@neutralgood.org>
List-Id: Filesystems <freebsd-fs@freebsd.org>

On 22/06/2015 16:30, kpneal@pobox.com wrote:
> On Mon, Jun 22, 2015 at 01:53:24PM +0100, Steven Hartland wrote:
>> On 22/06/2015 13:13, kpneal@pobox.com wrote:
>>> On Mon, Jun 22, 2015 at 04:14:55AM -0400, Quartz wrote:
>>>> What's sequential write performance like these days for ZFS raidzX?
>>>> Someone suggested to me that I set up a single non-RAID disk to act as
>>>> a fast 'landing pad' for receiving files, then move them to the pool
>>>> later in the background. Is that actually necessary? (Assume generic
>>>> SATA drives, 250MB-4GB sized files, and transfers are across a LAN
>>>> using single unbonded GigE).
>>> Tests were posted to ZFS lists a few years ago.
>>> That was a while ago, but at a fundamental level ZFS hasn't changed
>>> since then, so the results should still be valid.
>>>
>>> For both reads and writes, all levels of raidz* perform slightly faster
>>> than the speed of a single drive. _Slightly_ faster, like the speed of
>>> a single drive * 1.1 or so, roughly speaking.
>>>
>>> For mirrors, writes perform about the same as a single drive, and as more
>>> drives are added they get slightly worse. But reads scale pretty well as
>>> you add drives, because reads can be spread across all the drives in the
>>> mirror in parallel.
>>>
>>> Having multiple vdevs helps because ZFS does striping across the vdevs.
>>> However, this striping only happens with writes that are done _after_ new
>>> vdevs are added. There is no rebalancing of data after new vdevs are
>>> added, so adding new vdevs won't change the read performance of data
>>> already on disk.
>>>
>>> ZFS does try to stripe across vdevs, but if your old vdevs are nearly
>>> full then adding new ones results in data mostly going to the new, nearly
>>> empty vdevs. So if you only added a single new vdev to expand the pool,
>>> then you'll see write performance roughly equal to the performance of
>>> that single vdev.
>>>
>>> Rebalancing can be done roughly with "zfs send | zfs receive". If you do
>>> this enough times, and destroy old, sent datasets after each iteration,
>>> then you can to some extent rebalance a pool. You won't achieve a perfect
>>> rebalance, though.
>>>
>>> We can thank Oracle for the destruction of the archives at sun.com, which
>>> made it pretty darn difficult to find those posts.
>>>
>>> Finally, single GigE is _slow_. I see no point in a "landing pad" when
>>> using unbonded GigE.
>>>
>> Actually it has had some significant changes which are likely to affect
>> the results, as it now has an entirely new IO scheduler, so retesting
>> would be wise.
> And this affects which parts of my post?
>
> Reading and writing to a raidz* requires touching all or almost all of
> the disks.
>
> Writing to a mirror requires touching all the disks. Reading from a mirror
> requires touching one disk.

Yes, however if you get, say, a 10% improvement in scheduling said writes /
reads, then the overall impact will be noticeable.

> That hasn't changed. I'm skeptical that a new way of doing the same thing
> would change the results that much, especially for a large stream of
> data.
>
> I can see a new I/O scheduler being more _fair_, but that only applies
> when the box has multiple things going on.

A concrete example for mirrors: when dealing with 3 readers, our testing
showed throughput increase from 168MB/s to 320MB/s with prefetch, and from
95MB/s to 284MB/s without prefetch, so significant differences.

This is a rather extreme example, but there's never any harm in re-testing
to avoid using incorrect assumptions ;-)

Regards
Steve
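
For reference, the "zfs send | zfs receive" rebalance mentioned above can be
sketched roughly as follows, assuming a pool named tank and a dataset
tank/data (both placeholder names) and ignoring incremental sends:

    # snapshot the existing dataset so it can be sent
    zfs snapshot tank/data@rebalance
    # copy it into a new dataset; the rewritten blocks are spread across
    # all vdevs that exist now, including recently added ones
    zfs send tank/data@rebalance | zfs receive tank/data.new
    # once happy with the copy, drop the old dataset and rename the new one
    zfs destroy -r tank/data
    zfs rename tank/data.new tank/data

Each pass rewrites the data against the pool's current vdev layout, which is
why repeating this after adding vdevs gradually evens things out. Note that
anything written after the snapshot is not captured unless incremental sends
are used, and, as noted above, the result is never a perfect rebalance.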