From owner-freebsd-geom@freebsd.org  Mon Jul  6 18:21:46 2020
Subject: Re: Single-threaded bottleneck in geli
To: freebsd-geom@freebsd.org
From: Jan Bramkamp <crest@rlwinm.de>
Message-ID: <550d61f9-506e-710c-8800-4f13143cf976@rlwinm.de>
Date: Mon, 6 Jul 2020 20:21:36 +0200
Content-Type: text/plain; charset=utf-8; format=flowed
List-Id: GEOM-specific discussions and implementations <freebsd-geom@freebsd.org>

On 03.07.20 21:30, Alan Somers wrote:
> I'm using geli, gmultipath, and ZFS on a large system, with hundreds of
> drives. What I'm seeing is that under at least some workloads, the overall
> performance is limited by the single geom kernel process. procstat and
> kgdb aren't much help in telling exactly why this process is using so much
> CPU, but it certainly must be related to the fact that over 15,000 IOPs are
> going through that thread. What can I do to improve this situation? Would
> it make sense to enable direct dispatch for geli? That would hurt
> single-threaded performance, but probably improve performance for highly
> multithreaded workloads like mine.
>
> Example top output:
>  PID USERNAME  PRI NICE  SIZE  RES STATE  C   TIME   WCPU COMMAND
>   13 root       -8    -    0B  96K CPU46 46  82.7H 70.54% geom{g_down}
>   13 root       -8    -    0B  96K -      9  35.5H 25.32% geom{g_up}
>
> -Alan

The problem isn't GELI. The problem is that gmultipath lacks direct
dispatch support. About one and a half years ago I ran into the same
problem. Because I needed the performance, I looked at what gmultipath
does and found no reason why it has to run in the GEOM up and down
threads.
So I patched in the flags claiming direct dispatch support. It improved
my read performance from 2.2 GB/s to 3.4 GB/s and write performance from
750 MB/s to 1.5 GB/s. The system worked for a few days under high load
(it saturated a 2 x 10 Gb/s lagg(4) as a read-only WebDAV server while
also receiving uploads via SFTP). It worked until I attempted to shut
down the system: it hung on shutdown and never powered off, and I had to
power cycle the box via IPMI to recover. I never found the time to debug
this problem.
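For reference, "patching in the flags" amounts to something like the
sketch below. This is not my exact patch, just the general shape of it:
the provider/consumer flag names (G_PF_DIRECT_SEND, G_PF_DIRECT_RECEIVE,
G_CF_DIRECT_SEND, G_CF_DIRECT_RECEIVE) come from sys/geom/geom.h, and
the function names assume the layout of sys/geom/multipath/g_multipath.c
at the time; exact placement may differ between FreeBSD versions.

```c
/* Sketch only, not a tested patch. In the function that creates the
 * multipath provider (g_multipath_create() in g_multipath.c), mark the
 * provider as capable of direct dispatch in both directions before
 * calling g_error_provider(): */
pp->flags |= G_PF_DIRECT_SEND | G_PF_DIRECT_RECEIVE;

/* Likewise, where a consumer is attached to each underlying path,
 * mark the consumer as direct-dispatch capable: */
cp->flags |= G_CF_DIRECT_SEND | G_CF_DIRECT_RECEIVE;
```

The catch is that with these flags set, the class's start and done
routines may be invoked directly from the caller's thread instead of the
g_down/g_up kthreads, so they must be safe in that context (no sleeping,
locking that tolerates the extra concurrency). My guess is that the
shutdown hang I hit came from gmultipath not actually meeting those
requirements, which is presumably why the flags aren't set upstream.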