From: Josh Paetzel <jpaetzel@FreeBSD.org>
To: Alan Somers
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: Re: Right-sizing the geli thread pool
Date: Thu, 9 Jul 2020 22:34:30 -0500
Message-Id: <49D059B5-9A35-4EB5-9811-AFB024DA0566@FreeBSD.org>

> On Jul 9, 2020, at 4:27 PM, Alan Somers wrote:
>
> Currently, geli creates a separate thread pool for each provider, and
> by default each thread pool contains one thread per cpu.
> On a large server with many encrypted disks, that can balloon into a
> very large number of threads! I have a patch in progress that switches
> from per-provider thread pools to a single thread pool for the entire
> module. Happily, I see read IOPs increase by up to 60%. But to my
> surprise, write IOPs _decrease_ by up to 25%. dtrace suggests that the
> CPU usage is dominated by the vmem_free call in biodone, as in the
> stack below.
>
> kernel`lock_delay+0x32
> kernel`biodone+0x88
> kernel`g_io_deliver+0x214
> geom_eli.ko`g_eli_write_done+0xf6
> kernel`g_io_deliver+0x214
> kernel`md_kthread+0x275
> kernel`fork_exit+0x7e
> kernel`0xffffffff8104784e
>
> I only have one idea for how to improve things from here. The geli
> thread pool is still fed by a single global bio queue. That could
> cause cache thrashing if bios get moved between cores too often. I
> think a superior design would be to use a separate bio queue for each
> geli thread, and use work-stealing to balance them. However,
>
> 1) That doesn't explain why this change benefits reads more than
> writes, and
> 2) work-stealing is hard to get right, and I can't find any examples
> in the kernel.
>
> Can anybody offer tips or code for implementing work stealing? Or any
> other suggestions about why my write performance is suffering? I would
> like to get this change committed, but not without resolving that
> issue.
>
> -Alan

Alan,

Several years ago I spent a bunch of time optimizing geli+ZFS
performance. Nothing as ambitious as what you're doing, though.

I have some hand-wavy theories about the write performance and how
cache thrashing would be more expensive for writes than for reads. The
default configuration is essentially pathological for systems with
large numbers of disks, but that doesn't really explain why your change
drops performance. I'll send you over some dtrace stuff I have at work
tomorrow. It's pretty sophisticated and should let you visualize the
entire I/O pipeline. (You'll have to add the geli part.)

What I discovered is that without a histogram-based auto-tuner it was
not possible to tune for optimal performance under dynamic workloads.

As to your question about work stealing: I've got nothing there.

Thanks,

Josh Paetzel
FreeBSD - The Power to Serve
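
Since the thread asks for work-stealing example code, here is a minimal
userspace sketch of the queue-per-thread plus stealing loop Alan
describes, using pthreads rather than kernel primitives. Every name in
it (ws_queue, ws_push, the toy struct bio) is hypothetical; GEOM's real
struct bio and the geli worker loop look quite different, so treat this
as an illustration of the balancing idea only, not of the kernel API.

    /*
     * Userspace sketch: one bio queue per worker, with stealing.
     * Build with: cc -pthread ws.c
     */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct bio {                    /* toy stand-in for struct bio */
            struct bio *next;
            int id;
    };

    struct ws_queue {
            pthread_mutex_t lock;
            struct bio *head;
    };

    #define NWORKERS 4

    static struct ws_queue queues[NWORKERS];

    /* Push a bio onto one worker's queue (e.g. the submitting CPU's). */
    static void
    ws_push(int q, struct bio *bp)
    {
            pthread_mutex_lock(&queues[q].lock);
            bp->next = queues[q].head;
            queues[q].head = bp;
            pthread_mutex_unlock(&queues[q].lock);
    }

    /* Pop from a queue; returns NULL if it is empty. */
    static struct bio *
    ws_pop(int q)
    {
            struct bio *bp;

            pthread_mutex_lock(&queues[q].lock);
            bp = queues[q].head;
            if (bp != NULL)
                    queues[q].head = bp->next;
            pthread_mutex_unlock(&queues[q].lock);
            return (bp);
    }

    /*
     * Worker loop: drain the local queue first (cache locality),
     * then scan the neighbors and steal one bio at a time.
     */
    static void *
    worker(void *arg)
    {
            int me = (int)(long)arg;
            struct bio *bp;
            int tries;

            for (;;) {
                    bp = ws_pop(me);
                    for (tries = 1; bp == NULL && tries < NWORKERS; tries++)
                            bp = ws_pop((me + tries) % NWORKERS);
                    if (bp == NULL)
                            break;  /* all drained; real code would sleep */
                    printf("worker %d handles bio %d\n", me, bp->id);
                    free(bp);
            }
            return (NULL);
    }

    int
    main(void)
    {
            pthread_t tids[NWORKERS];
            struct bio *bp;
            int i;

            for (i = 0; i < NWORKERS; i++)
                    pthread_mutex_init(&queues[i].lock, NULL);

            /* Enqueue fake bios, deliberately unbalanced onto queue 0. */
            for (i = 0; i < 32; i++) {
                    bp = malloc(sizeof(*bp));
                    bp->next = NULL;
                    bp->id = i;
                    ws_push(0, bp);
            }

            for (i = 0; i < NWORKERS; i++)
                    pthread_create(&tids[i], NULL, worker, (void *)(long)i);
            for (i = 0; i < NWORKERS; i++)
                    pthread_join(tids[i], NULL);
            return (0);
    }

The design point is that a worker always drains its own queue before
scanning its neighbors, so a bio completed on the CPU that enqueued it
stays cache-hot, and stealing one bio at a time bounds how far the
queues can drift out of balance without needing the double-ended deques
that full work-stealing schedulers use.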