Date: Mon, 22 Jun 2015 17:04:42 +0100
From: Steven Hartland <killing@multiplay.co.uk>
To: kpneal@pobox.com
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS raid write performance?
Message-ID: <5588321A.4060102@multiplay.co.uk>
In-Reply-To: <20150622153056.GA96798@neutralgood.org>
References: <5587C3FF.9070407@sneakertech.com> <20150622121343.GB60684@neutralgood.org> <55880544.70907@multiplay.co.uk> <20150622153056.GA96798@neutralgood.org>
On 22/06/2015 16:30, kpneal@pobox.com wrote:
> On Mon, Jun 22, 2015 at 01:53:24PM +0100, Steven Hartland wrote:
>> On 22/06/2015 13:13, kpneal@pobox.com wrote:
>>> On Mon, Jun 22, 2015 at 04:14:55AM -0400, Quartz wrote:
>>>> What's sequential write performance like these days for ZFS raidzX?
>>>> Someone suggested to me that I set up a single not-raid disk to act as
>>>> a fast 'landing pad' for receiving files, then move them to the pool
>>>> later in the background. Is that actually necessary? (Assume generic
>>>> SATA drives, 250MB-4GB sized files, and transfers across a LAN using
>>>> a single unbonded GigE link.)
>>> Tests were posted to ZFS lists a few years ago. That was a while ago,
>>> but at a fundamental level ZFS hasn't changed since then, so the
>>> results should still be valid.
>>>
>>> For both reads and writes, all levels of raidz* perform slightly faster
>>> than a single drive. _Slightly_ faster, like the speed of a single
>>> drive * 1.1 or so, roughly speaking.
>>>
>>> For mirrors, writes perform about the same as a single drive, and as
>>> more drives are added they get slightly worse. But reads scale pretty
>>> well as you add drives, because reads can be spread across all the
>>> drives in the mirror in parallel.
>>>
>>> Having multiple vdevs helps because ZFS stripes across the vdevs.
>>> However, this striping only happens for writes that are done _after_
>>> new vdevs are added. There is no rebalancing of data after new vdevs
>>> are added, so adding new vdevs won't change the read performance of
>>> data already on disk.
>>>
>>> ZFS does try to stripe across vdevs, but if your old vdevs are nearly
>>> full then adding new ones results in data mostly going to the new,
>>> nearly empty vdevs. So if you only added a single new vdev to expand
>>> the pool, you'll see write performance roughly equal to the performance
>>> of that single vdev.
>>>
>>> Rebalancing can be done roughly with "zfs send | zfs receive". If you
>>> do this enough times, and destroy the old, sent datasets after each
>>> iteration, you can to some extent rebalance a pool. You won't achieve
>>> a perfect rebalance, though.
>>>
>>> We can thank Oracle for the destruction of the archives at sun.com,
>>> which made it pretty darn difficult to find those posts.
>>>
>>> Finally, single GigE is _slow_. I see no point in a "landing pad" when
>>> using unbonded GigE.
>>>
>> Actually it has had some significant changes which are likely to affect
>> the results, as it now has an entirely new IO scheduler, so retesting
>> would be wise.
> And this affects which parts of my post?
>
> Reading and writing to a raidz* requires touching all or almost all of
> the disks.
>
> Writing to a mirror requires touching all the disks. Reading from a
> mirror requires touching one disk.

Yes, however if you get, say, a 10% improvement in scheduling those writes
and reads, the overall impact will be noticeable.

> That hasn't changed. I'm skeptical that a new way of doing the same thing
> would change the results that much, especially for a large stream of
> data.
>
> I can see a new I/O scheduler being more _fair_, but that only applies
> when the box has multiple things going on.

A concrete example for mirrors: in our testing with 3 readers, throughput
increased from 168MB/s to 320MB/s with prefetch, and from 95MB/s to 284MB/s
without prefetch, so significant differences.
This is a rather extreme example, but there's never any harm in re-testing
to avoid relying on incorrect assumptions ;-)

Regards
Steve
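
P.S. For anyone wanting to try the send/receive rebalance mentioned above,
a rough sketch might look like the following. The pool and dataset names
(tank/data, tank/data.new, the @rebalance snapshot) are placeholders, not
anything from this thread; adjust for your own layout and verify the copy
before destroying anything:

    # snapshot the dataset whose blocks you want rewritten
    zfs snapshot tank/data@rebalance

    # copy it into a new dataset; the newly written blocks are striped
    # across all vdevs currently in the pool, including any new ones
    zfs send tank/data@rebalance | zfs receive tank/data.new

    # once the copy checks out, retire the original so its space is
    # freed, then swap the names back
    zfs destroy -r tank/data
    zfs rename tank/data.new tank/data

Repeating this for each large dataset, and freeing the old copies as you
go, spreads existing data over the newer vdevs, though as noted above it
won't give a perfect rebalance.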