From: Alexander Motin
To: freebsd-fs@FreeBSD.org
Cc: Alex Tutubalin, Andriy Gapon
Date: Wed, 7 Dec 2016 15:34:02 +0200
Subject: Re: 11-STABLE vs 11.0-RELENG test
Message-ID: <374b6d16-5cb4-9338-ec1d-65ad93ca29dc@FreeBSD.org>
In-Reply-To: <8b4ba98d-03d3-f671-33b2-ed12d3b4fb7c@FreeBSD.org>

On 06.12.2016 22:26, Alexander Motin wrote:
> I've reproduced this issue with a quick test on my lab system configured
> with a 12-disk RAIDZ2 pool. I've measured write and read-back (with and
> without prefetch) speeds for the pool recreated on different FreeBSD head
> revisions:
>
>                 r309625  r305456  r305330  r305322
> write               702      701     1115     1120
> read w/ pref        232      228      518      512
> read w/o pref       128      126      242      240
>
> I suspect the problem comes from this commit:
>
> r305331 | mav | 2016-09-03 13:04:37 +0300 (Sat, 03 Sep 2016) | 45 lines
>
> MFV r304155: 7090 zfs should improve allocation order and throttle
> allocations

A closer look showed me the cause. This code sorts I/Os by time, offset and memory address.
However, the time returned on FreeBSD has only 1 ms resolution (to reduce overhead), so it does not provide a reliable ordering. The offset sorting used by this patch is broken by design: the io_offset field is always zero there, because it is used only for physical I/Os, not for logical ones. As a result, I/Os are effectively "sorted" by memory address, which in practice means complete randomization of all allocations within one millisecond, predictably killing read performance.

Switching the gethrtime() emulation from getnanouptime() to nanouptime() fixes the read performance, giving:

                nanouptime()
write                    702
read w/ pref             845
read w/o pref            272

It would be better to make the offset sorting really work there rather than just switching to a higher-resolution time source, but that may be quite invasive. I will look into it more.

--
Alexander Motin