From owner-freebsd-hackers@freebsd.org Fri Jul 10 17:54:59 2020
Date: Fri, 10 Jul 2020 13:54:52 -0400
From: Mark Johnston
To: Mateusz Guzik
Cc: Alan Somers, FreeBSD Hackers
Subject: Re: Right-sizing the geli thread pool
Message-ID: <20200710175452.GA9380@raichu>
List-Id: Technical Discussions relating to FreeBSD

On Fri, Jul 10, 2020 at 05:55:50AM +0200, Mateusz Guzik wrote:
> On 7/9/20, Alan Somers wrote:
> > Currently, geli creates a separate thread pool for each provider, and by
> > default each thread pool contains one thread per cpu.  On a large server
> > with many encrypted disks, that can balloon into a very large number of
> > threads!  I have a patch in progress that switches from per-provider thread
> > pools to a single thread pool for the entire module.  Happily, I see read
> > IOPs increase by up to 60%.  But to my surprise, write IOPs _decreases_ by
> > up to 25%.  dtrace suggests that the CPU usage is dominated by the
> > vmem_free call in biodone, as in the below stack.
> >
> >               kernel`lock_delay+0x32
> >               kernel`biodone+0x88
> >               kernel`g_io_deliver+0x214
> >               geom_eli.ko`g_eli_write_done+0xf6
> >               kernel`g_io_deliver+0x214
> >               kernel`md_kthread+0x275
> >               kernel`fork_exit+0x7e
> >               kernel`0xffffffff8104784e
> >
> > I only have one idea for how to improve things from here.  The geli thread
> > pool is still fed by a single global bio queue.  That could cause cache
> > thrashing, if bios get moved between cores too often.  I think a superior
> > design would be to use a separate bio queue for each geli thread, and use
> > work-stealing to balance them.
> > However,
> >
> > 1) That doesn't explain why this change benefits reads more than writes,
> > and
> > 2) work-stealing is hard to get right, and I can't find any examples in the
> > kernel.
> >
> > Can anybody offer tips or code for implementing work stealing?  Or any
> > other suggestions about why my write performance is suffering?  I would
> > like to get this change committed, but not without resolving that issue.
> >
>
> I can't comment on revamping the design, but:
>
> void
> vmem_free(vmem_t *vm, vmem_addr_t addr, vmem_size_t size)
> {
>         qcache_t *qc;
>         MPASS(size > 0);
>
>         if (size <= vm->vm_qcache_max &&
>             __predict_true(addr >= VMEM_ADDR_QCACHE_MIN)) {
>                 qc = &vm->vm_qcache[(size - 1) >> vm->vm_quantum_shift];
>                 uma_zfree(qc->qc_cache, (void *)addr);
>         } else
>                 vmem_xfree(vm, addr, size);
> }
>
> What sizes are being passed here? Or more to the point, is it feasible
> to bump qcache to stick to uma in this call? If lock contention is
> indeed coming from vmem_xfree this change would get rid of the problem
> without having to rework anything.

We would have to enable the quantum cache in the transient KVA arena.
This itself should not have many downsides on platforms with plenty of
KVA, but it only solves the immediate problem: before freeing the KVA
biodone() has to perform a global TLB shootdown, and the quantum cache
doesn't help at all with that.

kib's suggestion of using sf_buf(9) to transiently map crypto(9)
payloads in software crypto drivers that require a mapping, and using
unmapped cryptop requests for hardware drivers that do not, sounds like
the right solution.  cryptop structures can already handle multiple data
container types, like uios and mbufs, so it should be possible to also
support vm_page arrays in OCF like we do for unmapped BIOs, and let
crypto(9) drivers create transient mappings when necessary.

> For read performance, while it is nice there is a win, it may still be
> less than it should.
> I think it is prudent to get a flamegraph from
> both cases.
>
> --
> Mateusz Guzik
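
To make the size test in the quoted vmem_free() concrete, here is a
minimal userland sketch of how a free is routed.  It is an illustration
only, not kernel code: struct toy_arena and both toy_* functions are
hypothetical names that stand in for the two vmem_t fields the quoted
code reads (vm_qcache_max and vm_quantum_shift), and the address check
against VMEM_ADDR_QCACHE_MIN is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two vmem_t fields used by vmem_free(). */
struct toy_arena {
	size_t	qcache_max;	/* largest size the quantum cache serves; 0 = off */
	int	quantum_shift;	/* log2 of the arena quantum, e.g. 12 for 4 KiB */
};

/*
 * Mirrors the size half of the test in the quoted vmem_free(): true means
 * the free would go to a per-size UMA cache, false means it would fall
 * through to vmem_xfree().
 */
bool
toy_uses_qcache(const struct toy_arena *a, size_t size)
{
	assert(size > 0);	/* plays the role of MPASS(size > 0) */
	return (size <= a->qcache_max);
}

/* Index of the per-size cache: (size - 1) >> quantum_shift, as quoted. */
size_t
toy_qcache_index(const struct toy_arena *a, size_t size)
{
	assert(toy_uses_qcache(a, size));
	return ((size - 1) >> a->quantum_shift);
}
```

With qcache_max set to 0 every free takes the vmem_xfree() path, which
is the case the reply above describes when it says the quantum cache
would first have to be enabled.  Raising qcache_max only changes which
path the free takes; it does nothing about the TLB shootdown that
biodone() performs before the free.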