Date: Wed, 7 Dec 2016 11:49:36 -0500
From: George Wilson <george.wilson@delphix.com>
To: Alexander Motin <mav@freebsd.org>
Cc: freebsd-fs@freebsd.org, Alex Tutubalin <lexa@lexa.ru>, Andriy Gapon <avg@freebsd.org>
Subject: Re: 11-STABLE vs 11.0-RELENG test
Message-ID: <CA%2Bj07uZwJN1KSRyRqGyDKgUDE=HOcWcsgjV9iRbr7oSTT4hNcA@mail.gmail.com>
In-Reply-To: <374b6d16-5cb4-9338-ec1d-65ad93ca29dc@FreeBSD.org>
References: <8b4ba98d-03d3-f671-33b2-ed12d3b4fb7c@FreeBSD.org> <374b6d16-5cb4-9338-ec1d-65ad93ca29dc@FreeBSD.org>
Alexander,

Nice find! It's true that io_offset is 0 when we sort, but the offset
ordering should be implied by the way that we issue the I/Os from
dbuf_sync_list(). The I/Os are issued in order within the file, which is
what we want. So having the offset comparison is probably not needed, but
we could use the bookmark's blkid as a new comparator.

Let me know if I can help.

Thanks,
George

On Wed, Dec 7, 2016 at 8:34 AM, Alexander Motin <mav@freebsd.org> wrote:

> On 06.12.2016 22:26, Alexander Motin wrote:
> > I've reproduced this issue with quick test on my lab system configured
> > with 12-disk RAIDZ2 pool. I've measured write and read back (with and
> > without prefetch) speeds for pool recreated on different FreeBSD head
> > revisions:
> >                 r309625  r305456  r305330  r305322
> > write               702      701     1115     1120
> > read w/ pref        232      228      518      512
> > read w/o pref       128      126      242      240
> >
> > I suspect we could obtain the problem here:
> >
> > r305331 | mav | 2016-09-03 13:04:37 +0300 (Sat, 03 Sep 2016) | 45 lines
> >
> > MFV r304155: 7090 zfs should improve allocation order and throttle
> > allocations
>
> Closer look shown me the cause. This code sorts I/Os on time, offset
> and memory address. But time on FreeBSD (to reduce overhead) returned
> with 1ms resolution, so it does not provide reliable ordering. Offset
> sorting used by this patch is broken by design, since io_offset field is
> always zero there, since it is used only for physical I/Os, not for
> logical. As result, I/Os are "sorted" on memory address, that in fact
> means complete randomization of all allocations within one millisecond,
> predictably killing read performance.
>
> Switching gethrtime() emulation from getnanouptime() to nanouptime()
> fixes the read performance, resulting:
>                 nanouptime()
> write               702
> read w/ pref        845
> read w/o pref       272
>
> It would be good to make offset sorting really work there rather then
> just switching to high resolution time source, but that maybe quite
> invasive. Will look more.
>
> --
> Alexander Motin
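
The ordering problem discussed in this thread can be illustrated with a
minimal sketch. It is not the ZFS code: struct sketch_io, both comparators
and sketch_sort_order() are hypothetical names made up for illustration, and
the blkid field only stands in for the bookmark's blkid suggested in the
reply. It assumes all I/Os were queued within the same 1 ms tick and that
their offset is 0, as described above, so the first comparator falls through
to the memory address while the second uses the block id as the tie-breaker.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define	SKETCH_NIO	4

/* Hypothetical stand-in for a logical I/O; this is not the real zio_t. */
struct sketch_io {
	uint64_t queued_ms;	/* queue time at 1 ms resolution */
	uint64_t offset;	/* stays 0 for logical I/Os */
	uint64_t blkid;		/* block id within the file */
};

/* Three-way compare that yields -1, 0 or 1 without overflow. */
#define	SKETCH_CMP(a, b)	(((a) > (b)) - ((a) < (b)))

/* Order by time, then offset, then memory address. */
static int
sketch_cmp_addr(const void *x1, const void *x2)
{
	const struct sketch_io *a = *(const struct sketch_io *const *)x1;
	const struct sketch_io *b = *(const struct sketch_io *const *)x2;
	int c;

	if ((c = SKETCH_CMP(a->queued_ms, b->queued_ms)) != 0)
		return (c);
	if ((c = SKETCH_CMP(a->offset, b->offset)) != 0)
		return (c);
	return (SKETCH_CMP((uintptr_t)a, (uintptr_t)b));
}

/* Order by time, then block id (the tie-breaker assumed in this sketch). */
static int
sketch_cmp_blkid(const void *x1, const void *x2)
{
	const struct sketch_io *a = *(const struct sketch_io *const *)x1;
	const struct sketch_io *b = *(const struct sketch_io *const *)x2;
	int c;

	if ((c = SKETCH_CMP(a->queued_ms, b->queued_ms)) != 0)
		return (c);
	return (SKETCH_CMP(a->blkid, b->blkid));
}

/*
 * Queue SKETCH_NIO I/Os in the same millisecond, placed in memory in the
 * reverse of their file order, sort them with the chosen comparator and
 * store the resulting sequence of block ids in out[].
 */
void
sketch_sort_order(int use_blkid, uint64_t out[SKETCH_NIO])
{
	struct sketch_io pool[SKETCH_NIO];
	struct sketch_io *queue[SKETCH_NIO];
	int i;

	assert(out != NULL);
	for (i = 0; i < SKETCH_NIO; i++) {
		pool[i].queued_ms = 100;	/* same 1 ms tick for all */
		pool[i].offset = 0;
		pool[i].blkid = SKETCH_NIO - 1 - i;
		queue[i] = &pool[i];
	}
	qsort(queue, SKETCH_NIO, sizeof (queue[0]),
	    use_blkid ? sketch_cmp_blkid : sketch_cmp_addr);
	for (i = 0; i < SKETCH_NIO; i++)
		out[i] = queue[i]->blkid;
}
```

Calling sketch_sort_order(0, out) leaves the I/Os in address order, which
here is the reverse of file order (blkids 3, 2, 1, 0), while
sketch_sort_order(1, out) restores file order (0, 1, 2, 3). In a real
allocator the addresses would be effectively arbitrary rather than neatly
reversed; the reversal is only chosen to make the effect deterministic.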