Date:      Mon, 6 Mar 2017 10:11:19 -0800
From:      Matthew Ahrens <mahrens@delphix.com>
To:        Alexander Motin <mav@freebsd.org>
Cc:        freebsd-fs <freebsd-fs@freebsd.org>, Andriy Gapon <avg@freebsd.org>
Subject:   Re: 11-STABLE vs 11.0-RELENG test
Message-ID:  <CAJjvXiE3aaG-Eb1Q_3RQ3zony0jUqoVDzEkrD6AC%2BJ_r9VVc0g@mail.gmail.com>
In-Reply-To: <374b6d16-5cb4-9338-ec1d-65ad93ca29dc@FreeBSD.org>
References:  <8b4ba98d-03d3-f671-33b2-ed12d3b4fb7c@FreeBSD.org> <374b6d16-5cb4-9338-ec1d-65ad93ca29dc@FreeBSD.org>

On Wed, Dec 7, 2016 at 5:34 AM, Alexander Motin <mav@freebsd.org> wrote:

> On 06.12.2016 22:26, Alexander Motin wrote:
> > I've reproduced this issue with a quick test on my lab system configured
> > with a 12-disk RAIDZ2 pool.  I've measured write and read-back (with and
> > without prefetch) speeds for a pool recreated on different FreeBSD head
> > revisions:
> >               r309625 r305456 r305330 r305322
> > write         702     701     1115    1120
> > read w/ pref  232     228     518     512
> > read w/o pref 128     126     242     240
> >
> > I suspect the problem was introduced here:
> >
> > r305331 | mav | 2016-09-03 13:04:37 +0300 (Sat, 03 Sep 2016) | 45 lines
> >
> > MFV r304155: 7090 zfs should improve allocation order and throttle
> > allocations
>
> A closer look showed me the cause.  This code sorts I/Os by time, offset
> and memory address.  But time on FreeBSD is (to reduce overhead) returned
> with 1ms resolution, so it does not provide reliable ordering.  The offset
> sorting used by this patch is broken by design, since the io_offset field
> is always zero there: it is used only for physical I/Os, not for logical
> ones.  As a result, I/Os are "sorted" by memory address, which in fact
> means complete randomization of all allocations within one millisecond,
> predictably killing read performance.
>
> Switching gethrtime() emulation from getnanouptime() to nanouptime()
> fixes the read performance, resulting in:
>                 nanouptime()
> write           702
> read w/ pref    845
> read w/o pref   272
>
> It would be good to make offset sorting really work there rather than
> just switching to a high-resolution time source, but that may be quite
> invasive.  Will look more.


FYI - I am changing this sorting to be based on the bookmark (objset,
object, level, blkid) instead of timestamp.  This normally doesn't change
the performance on illumos, but it should fix this on FreeBSD (and it
ensures that the multi-threaded spa_sync doesn't degrade performance).
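
To illustrate the idea only (this is not the code from the PR): a minimal
sketch of a comparator that orders logical I/Os lexicographically by
(objset, object, level, blkid).  The struct and function names below are
hypothetical stand-ins, not the real ZFS types.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bookmark; field names are illustrative. */
struct bookmark {
	uint64_t objset;
	uint64_t object;
	int64_t  level;
	uint64_t blkid;
};

/* Hypothetical stand-in for a queued logical I/O. */
struct io {
	struct bookmark bm;
};

/* Three-way compare without the overflow risk of subtraction. */
#define	CMP3(a, b)	(((a) > (b)) - ((a) < (b)))

/*
 * Order by objset, then object, then level, then blkid.  The result
 * depends only on what the I/O refers to, not on when it was issued
 * or where its structure happens to live in memory.
 */
static int
io_bookmark_cmp(const void *x1, const void *x2)
{
	const struct io *a = x1;
	const struct io *b = x2;
	int c;

	if ((c = CMP3(a->bm.objset, b->bm.objset)) != 0)
		return (c);
	if ((c = CMP3(a->bm.object, b->bm.object)) != 0)
		return (c);
	if ((c = CMP3(a->bm.level, b->bm.level)) != 0)
		return (c);
	return (CMP3(a->bm.blkid, b->bm.blkid));
}

/* Sort an array of I/Os in place using the bookmark order. */
static void
sort_ios(struct io *ios, size_t n)
{
	qsort(ios, n, sizeof (ios[0]), io_bookmark_cmp);
}
```

With an ordering like this, a coarse clock no longer matters: two I/Os
issued in the same millisecond still compare by block position rather
than by address.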

Here's the PR; you can pick out the "sort by bookmark" commit if you want:

https://github.com/openzfs/openzfs/pull/138

--matt


