From: Matthew Ahrens
Date: Mon, 6 Mar 2017 10:11:19 -0800
Subject: Re: 11-STABLE vs 11.0-RELENG test
To: Alexander Motin
Cc: freebsd-fs, Andriy Gapon

On Wed, Dec 7, 2016 at 5:34 AM, Alexander Motin wrote:

> On 06.12.2016 22:26, Alexander Motin wrote:
> > I've reproduced this issue with quick test on my lab system configured
> > with 12-disk RAIDZ2 pool. I've measured write and read back (with and
> > without prefetch) speeds for pool recreated on different FreeBSD head
> > revisions:
> >                  r309625  r305456  r305330  r305322
> > write                702      701     1115     1120
> > read w/ pref         232      228      518      512
> > read w/o pref        128      126      242      240
> >
> > I suspect we could obtain the problem here:
> >
> > r305331 | mav | 2016-09-03 13:04:37 +0300 (сб, 03 сент. 2016) | 45 lines
> >
> > MFV r304155: 7090 zfs should improve allocation order and throttle
> > allocations
>
> Closer look shown me the cause. This code sorts I/Os on time, offset
> and memory address. But time on FreeBSD (to reduce overhead) returned
> with 1ms resolution, so it does not provide reliable ordering. Offset
> sorting used by this patch is broken by design, since io_offset field is
> always zero there, since it is used only for physical I/Os, not for
> logical. As result, I/Os are "sorted" on memory address, that in fact
> means complete randomization of all allocations within one millisecond,
> predictably killing read performance.
>
> Switching gethrtime() emulation from getnanouptime() to nanouptime()
> fixes the read performance, resulting:
>                  nanouptime()
> write                702
> read w/ pref         845
> read w/o pref        272
>
> It would be good to make offset sorting really work there rather then
> just switching to high resolution time source, but that maybe quite
> invasive. Will look more.

FYI - I am changing this sorting to be based on the bookmark (objset,
object, level, blkid) instead of timestamp. This normally doesn't change
the performance on illumos, but it should fix this on FreeBSD (and it
ensures that the multi-threaded spa_sync doesn't degrade performance).

Here's the PR; you can pick out the "sort by bookmark" commit if you want:

https://github.com/openzfs/openzfs/pull/138

--matt
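
To make the difference between the two orderings concrete, here is a minimal sketch in C. It is not the code from the PR or from the ZFS tree. The struct demo_io_t, the DEMO_CMP macro and every demo_* function are hypothetical names made up for illustration, and the bookmark comparison is assumed to be a plain lexicographic compare of (objset, object, level, blkid). Eight logical I/Os share one 1 ms timestamp and have offset 0, and their block ids are deliberately scrambled relative to their memory addresses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_NIOS 8

/* Hypothetical stand-in for a queued logical I/O; this is not the real zio_t. */
typedef struct demo_io {
	uint64_t objset;	/* bookmark: dataset */
	uint64_t object;	/* bookmark: object within the dataset */
	int64_t  level;		/* bookmark: indirection level */
	uint64_t blkid;		/* bookmark: block number within the object */
	uint64_t time_ms;	/* coarse timestamp, 1 ms resolution */
	uint64_t offset;	/* stays 0 for logical I/Os */
} demo_io_t;

/* Three-way compare that yields -1, 0 or 1 without overflow. */
#define DEMO_CMP(a, b)	(((a) > (b)) - ((a) < (b)))

/* Time-based ordering: time, then offset, then memory address. */
static int
demo_cmp_time(const void *a, const void *b)
{
	const demo_io_t *x = *(const demo_io_t *const *)a;
	const demo_io_t *y = *(const demo_io_t *const *)b;
	int c;

	if ((c = DEMO_CMP(x->time_ms, y->time_ms)) != 0)
		return (c);
	if ((c = DEMO_CMP(x->offset, y->offset)) != 0)
		return (c);
	/* Equal time and offset: only the address is left to decide. */
	return (DEMO_CMP((uintptr_t)x, (uintptr_t)y));
}

/* Bookmark ordering (assumed lexicographic): objset, object, level, blkid. */
static int
demo_cmp_bookmark(const void *a, const void *b)
{
	const demo_io_t *x = *(const demo_io_t *const *)a;
	const demo_io_t *y = *(const demo_io_t *const *)b;
	int c;

	if ((c = DEMO_CMP(x->objset, y->objset)) != 0)
		return (c);
	if ((c = DEMO_CMP(x->object, y->object)) != 0)
		return (c);
	if ((c = DEMO_CMP(x->level, y->level)) != 0)
		return (c);
	return (DEMO_CMP(x->blkid, y->blkid));
}

/* Fill ios[] so that address order (slot order) does not follow blkid order. */
static void
demo_fill(demo_io_t ios[DEMO_NIOS], demo_io_t *queue[DEMO_NIOS])
{
	for (int i = 0; i < DEMO_NIOS; i++) {
		ios[i] = (demo_io_t){ .objset = 1, .object = 2, .level = 0,
		    .blkid = (uint64_t)(i * 3) % DEMO_NIOS,
		    .time_ms = 1000, .offset = 0 };
		queue[i] = &ios[i];
	}
}

/* Returns 1 if the queue is in ascending blkid order, 0 otherwise. */
static int
demo_is_sequential(demo_io_t *const queue[DEMO_NIOS])
{
	for (int i = 1; i < DEMO_NIOS; i++)
		if (queue[i - 1]->blkid > queue[i]->blkid)
			return (0);
	return (1);
}

/* Sort a fresh queue with cmp and report whether it ends up sequential. */
static int
demo_sorted_is_sequential(int (*cmp)(const void *, const void *))
{
	demo_io_t ios[DEMO_NIOS];
	demo_io_t *queue[DEMO_NIOS];

	demo_fill(ios, queue);
	qsort(queue, DEMO_NIOS, sizeof (queue[0]), cmp);
	return (demo_is_sequential(queue));
}
```

With this setup, demo_sorted_is_sequential(demo_cmp_time) returns 0, because the first two keys tie for every I/O and the address decides the order, while demo_sorted_is_sequential(demo_cmp_bookmark) returns 1. The bookmark key depends only on what the block is, not on when or where the request structure was created, so it should give the same order whatever the clock resolution.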