Date:      Mon, 22 Jun 2015 17:04:42 +0100
From:      Steven Hartland <killing@multiplay.co.uk>
To:        kpneal@pobox.com
Cc:        freebsd-fs@freebsd.org
Subject:   Re: ZFS raid write performance?
Message-ID:  <5588321A.4060102@multiplay.co.uk>
In-Reply-To: <20150622153056.GA96798@neutralgood.org>
References:  <5587C3FF.9070407@sneakertech.com> <20150622121343.GB60684@neutralgood.org> <55880544.70907@multiplay.co.uk> <20150622153056.GA96798@neutralgood.org>



On 22/06/2015 16:30, kpneal@pobox.com wrote:
> On Mon, Jun 22, 2015 at 01:53:24PM +0100, Steven Hartland wrote:
>> On 22/06/2015 13:13, kpneal@pobox.com wrote:
>>> On Mon, Jun 22, 2015 at 04:14:55AM -0400, Quartz wrote:
>>>> What's sequential write performance like these days for ZFS raidzX?
>>>> Someone suggested to me that I set up a single not-raid disk to act as a
>>>> fast 'landing pad' for receiving files, then move them to the pool later
>>>> in the background. Is that actually necessary? (Assume generic sata
>>>> drives, 250mb-4gb sized files, and transfers are across a LAN using
>>>> single unbonded GigE).
>>> Tests were posted to ZFS lists a few years ago. That was a while ago, but
>>> at a fundamental level ZFS hasn't changed since then so the results should
>>> still be valid.
>>>
>>> For both reads and writes all levels of raidz* perform slightly faster
>>> than the speed of a single drive. _Slightly_ faster, like, the speed of
>>> a single drive * 1.1 or so roughly speaking.
>>>
>>> For mirrors, writes perform about the same as a single drive, and as more
>>> drives are added they get slightly worse. But reads scale pretty well as
>>> you add drives because reads can be spread across all the drives in the
>>> mirror in parallel.
>>>
>>> Having multiple vdevs helps because ZFS does striping across the vdevs.
>>> However, this striping only happens with writes that are done _after_ new
>>> vdevs are added. There is no rebalancing of data after new vdevs are added.
>>> So adding new vdevs won't change the read performance of data already on
>>> disk.
>>>
>>> ZFS does try to stripe across vdevs, but if your old vdevs are nearly full
>>> then adding new ones results in data mostly going to the new, nearly empty
>>> vdevs. So if you only added a single new vdev to expand the pool then
>>> you'll see write performance roughly equal to the performance of that
>>> single vdev.
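
As an aside, you can see that kind of imbalance directly on a live pool;
a rough example (the pool name "tank" is just a placeholder):

    # Per-vdev capacity and fill level
    zpool list -v tank

    # Per-vdev I/O distribution while a write workload is running
    zpool iostat -v tank 5

New writes will mostly land on whichever vdev reports the most free space.
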
>>>
>>> Rebalancing can be done roughly with "zfs send | zfs receive". If you do
>>> this enough times, and destroy old, sent datasets after an iteration, then
>>> you can to some extent rebalance a pool. You won't achieve a perfect
>>> rebalance, though.
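
For reference, a rough sketch of that send/receive shuffle, assuming a pool
named "tank" and a dataset "tank/data" (names are made up, and you need
enough free space for a second copy while both exist):

    # Snapshot and copy the dataset within the same pool; the new copy
    # is written across all vdevs as they stand today
    zfs snapshot tank/data@rebalance
    zfs send tank/data@rebalance | zfs receive tank/data_new

    # Once happy with the copy, drop the original and rename
    zfs destroy -r tank/data
    zfs rename tank/data_new tank/data
    zfs destroy tank/data@rebalance

Repeat per dataset; as noted above it only gets you an approximate
rebalance.
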
>>>
>>> We can thank Oracle for the destruction of the archives at sun.com which
>>> made it pretty darn difficult to find those posts.
>>>
>>> Finally, single GigE is _slow_. I see no point in a "landing pad" when
>>> using unbonded GigE.
>>>
>> Actually it has had some significant changes which are likely to affect
>> the results, as it now has an entirely new IO scheduler, so retesting
>> would be wise.
> And this affects which parts of my post?
>
> Reading and writing to a raidz* requires touching all or almost all of
> the disks.
>
> Writing to a mirror requires touching all the disks. Reading from a mirror
> requires touching one disk.
Yes, however if you get, say, a 10% improvement in scheduling said writes /
reads then the overall impact will be noticeable.
> That hasn't changed. I'm skeptical that a new way of doing the same thing
> would change the results that much, especially for a large stream of
> data.
>
> I can see a new I/O scheduler being more _fair_, but that only applies
> when the box has multiple things going on.
A concrete example for mirror read performance: when dealing with 3
readers, our testing demonstrated an increase in throughput from 168MB/s
to 320MB/s with prefetch, and from 95MB/s to 284MB/s without prefetch, so
significant differences.

This is a rather extreme example, but there's never any harm in 
re-testing to avoid using incorrect assumptions ;-)
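
If anyone wants to repeat that kind of check, a very rough sequential read
test along these lines is enough to show the difference (file and pool
names are made up; read something larger than ARC, or export/import the
pool between runs, so you measure the disks rather than the cache):

    # Prefetch enabled (the default)
    sysctl vfs.zfs.prefetch_disable=0
    dd if=/tank/testfile of=/dev/null bs=1m

    # Prefetch disabled; if the sysctl is read-only on your version, set
    # vfs.zfs.prefetch_disable=1 in /boot/loader.conf and reboot instead
    sysctl vfs.zfs.prefetch_disable=1
    zpool export tank && zpool import tank   # drop cached data
    dd if=/tank/testfile of=/dev/null bs=1m

Run several dd's in parallel to approximate the 3 reader case above.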

     Regards
     Steve


