Date:      Wed, 7 Dec 2016 11:49:36 -0500
From:      George Wilson <george.wilson@delphix.com>
To:        Alexander Motin <mav@freebsd.org>
Cc:        freebsd-fs@freebsd.org, Alex Tutubalin <lexa@lexa.ru>, Andriy Gapon <avg@freebsd.org>
Subject:   Re: 11-STABLE vs 11.0-RELENG test
Message-ID:  <CA%2Bj07uZwJN1KSRyRqGyDKgUDE=HOcWcsgjV9iRbr7oSTT4hNcA@mail.gmail.com>
In-Reply-To: <374b6d16-5cb4-9338-ec1d-65ad93ca29dc@FreeBSD.org>
References:  <8b4ba98d-03d3-f671-33b2-ed12d3b4fb7c@FreeBSD.org> <374b6d16-5cb4-9338-ec1d-65ad93ca29dc@FreeBSD.org>

Alexander,

Nice find! It's true that io_offset is 0 when we sort, but the offset
ordering should be implied by the way that we issue the I/Os from
dbuf_sync_list(). The I/Os are issued in order within the file, which is
what we want. So the offset comparison is probably not needed, but we
could use the bookmark's blkid as a new comparator. Let me know if I can
help.

Thanks,
George

On Wed, Dec 7, 2016 at 8:34 AM, Alexander Motin <mav@freebsd.org> wrote:

> On 06.12.2016 22:26, Alexander Motin wrote:
> > I've reproduced this issue with a quick test on my lab system configured
> > with a 12-disk RAIDZ2 pool.  I've measured write and read-back (with and
> > without prefetch) speeds for a pool recreated on different FreeBSD head
> > revisions:
> >               r309625 r305456 r305330 r305322
> > write         702     701     1115    1120
> > read w/ pref  232     228     518     512
> > read w/o pref 128     126     242     240
> >
> > I suspect the problem was introduced here:
> >
> > r305331 | mav | 2016-09-03 13:04:37 +0300 (Sat, 03 Sep 2016) | 45 lines
> >
> > MFV r304155: 7090 zfs should improve allocation order and throttle
> > allocations
>
> A closer look showed me the cause.  This code sorts I/Os by time, offset
> and memory address.  But time on FreeBSD is (to reduce overhead) returned
> with 1ms resolution, so it does not provide reliable ordering.  The offset
> sorting used by this patch is broken by design, since the io_offset field
> is always zero there: it is used only for physical I/Os, not for logical
> ones.  As a result, I/Os are "sorted" by memory address, which in fact
> means complete randomization of all allocations within one millisecond,
> predictably killing read performance.
>
> Switching the gethrtime() emulation from getnanouptime() to nanouptime()
> fixes the read performance, resulting in:
>                 nanouptime()
> write           702
> read w/ pref    845
> read w/o pref   272
>
> It would be good to make offset sorting really work there rather than
> just switching to a high-resolution time source, but that may be quite
> invasive.  I will look more.
>
> --
> Alexander Motin
>
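
The effect of timestamp resolution described in the quoted message can be
modeled with a small sketch in Python. It is a simulation under stated
assumptions, not the FreeBSD clock code: the issue times, the addresses,
and the coarse() helper are hypothetical, and the 1 ms granularity is the
figure given in the thread.

```python
NS_PER_MS = 1_000_000

def coarse(t_ns):
    """Round a nanosecond timestamp down to a 1 ms boundary (assumed model)."""
    return t_ns - t_ns % NS_PER_MS

# Six I/Os issued 10 microseconds apart, all inside the same millisecond.
issue_times_ns = [3 * NS_PER_MS + i * 10_000 for i in range(6)]
# Arbitrary stand-ins for the memory addresses of the I/O structures.
addrs = [0x4000, 0x1000, 0x6000, 0x3000, 0x5000, 0x2000]

def sort_order(timestamps):
    """Return issue indices sorted by (time, offset, address), offset always 0."""
    ios = list(zip(timestamps, addrs, range(len(addrs))))
    return [idx for ts, addr, idx in
            sorted(ios, key=lambda io: (io[0], 0, io[1]))]

fine_order = sort_order(issue_times_ns)
coarse_order = sort_order([coarse(t) for t in issue_times_ns])

print("fine-grained time:", fine_order)    # issue order is preserved
print("1 ms time:        ", coarse_order)  # address order takes over
```

With distinct timestamps the time field alone decides the order; once
they collapse to the same millisecond, the zero offset cannot break the
tie and the address does.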


