From owner-dev-commits-src-all@freebsd.org Thu Aug 5 14:22:11 2021
Return-Path:
Delivered-To: dev-commits-src-all@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
	by mailman.nyi.freebsd.org (Postfix) with ESMTP id C54E1664987;
	Thu, 5 Aug 2021 14:22:11 +0000 (UTC)
	(envelope-from git@FreeBSD.org)
Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256
	 client-signature RSA-PSS (4096 bits) client-digest SHA256)
	(Client CN "mxrelay.nyi.freebsd.org", Issuer "R3" (verified OK))
	by mx1.freebsd.org (Postfix) with ESMTPS id 4GgW5b5Gjxz3BmC;
	Thu, 5 Aug 2021 14:22:11 +0000 (UTC)
	(envelope-from git@FreeBSD.org)
Received: from gitrepo.freebsd.org (gitrepo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:5])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256)
	(Client did not present a certificate)
	by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 9DCA91BA5C;
	Thu, 5 Aug 2021 14:22:11 +0000 (UTC)
	(envelope-from git@FreeBSD.org)
Received: from gitrepo.freebsd.org ([127.0.1.44])
	by gitrepo.freebsd.org (8.16.1/8.16.1) with ESMTP id 175EMBER006557;
	Thu, 5 Aug 2021 14:22:11 GMT
	(envelope-from git@gitrepo.freebsd.org)
Received: (from git@localhost)
	by gitrepo.freebsd.org (8.16.1/8.16.1/Submit) id 175EMBpP006556;
	Thu, 5 Aug 2021 14:22:11 GMT
	(envelope-from git)
Date: Thu, 5 Aug 2021 14:22:11 GMT
Message-Id: <202108051422.175EMBpP006556@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org,
	dev-commits-src-main@FreeBSD.org
From: Andrew Gallatin
Subject: git: 98215005b747 - main - ktls: start a thread to keep the 16k ktls buffer zone populated
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Git-Committer: gallatin
X-Git-Repository: src
X-Git-Refname: refs/heads/main
X-Git-Reftype: branch
X-Git-Commit: 98215005b747fef67f44794ca64abd473b98bade
Auto-Submitted: auto-generated
X-BeenThere: dev-commits-src-all@freebsd.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id: Commit messages for all branches of the src repository
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 05 Aug 2021 14:22:11 -0000

The branch main has been updated by gallatin:

URL: https://cgit.FreeBSD.org/src/commit/?id=98215005b747fef67f44794ca64abd473b98bade

commit 98215005b747fef67f44794ca64abd473b98bade
Author:     Andrew Gallatin
AuthorDate: 2021-08-05 14:15:09 +0000
Commit:     Andrew Gallatin
CommitDate: 2021-08-05 14:19:12 +0000

    ktls: start a thread to keep the 16k ktls buffer zone populated

    Ktls recently received an optimization where we allocate 16k
    physically contiguous crypto destination buffers. This provides a
    large (more than 5%) reduction in CPU use in our workload. However,
    after several days of uptime, the performance benefit disappears
    because we have frequent allocation failures from the ktls buffer
    zone.

    It turns out that when load drops off, the ktls buffer zone is
    trimmed, and some 16k buffers are freed back to the OS. When load
    picks back up again, re-allocating those 16k buffers fails after
    some number of days of uptime because physical memory has become
    fragmented. This causes allocations to fail, because they are
    intentionally done with M_NORECLAIM, so as to avoid pausing the
    ktls crypto work thread while the VM system defragments memory.

    To work around this, this change starts one thread per VM domain to
    allocate ktls buffers without M_NORECLAIM (using M_WAITOK), as we
    don't care if this thread is paused while memory is defragged. The
    thread then frees the buffers back into the ktls buffer zone, thus
    allowing future allocations to succeed.

    Note that waking up the thread is intentionally racy, but neither
    of the races really matters.
    In the worst case, we could have either spurious wakeups or we
    could have to wait 1 second until the next rate-limited allocation
    failure to wake up the thread.

    This patch has been in use at Netflix on a handful of servers,
    and seems to fix the issue.

    Differential Revision: https://reviews.freebsd.org/D31260
    Reviewed by: jhb, markj, (jtl, rrs, and dhw reviewed earlier version)
    Sponsored by: Netflix
---
 sys/kern/uipc_ktls.c | 121 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 120 insertions(+), 1 deletion(-)

diff --git a/sys/kern/uipc_ktls.c b/sys/kern/uipc_ktls.c
index 5f7dde325740..17b87195fc50 100644
--- a/sys/kern/uipc_ktls.c
+++ b/sys/kern/uipc_ktls.c
@@ -78,6 +78,7 @@ __FBSDID("$FreeBSD$");
 #include
 #include
 #include
+#include
 
 struct ktls_wq {
 	struct mtx mtx;
@@ -87,9 +88,17 @@ struct ktls_wq {
 	int lastallocfail;
 } __aligned(CACHE_LINE_SIZE);
 
+struct ktls_alloc_thread {
+	uint64_t wakeups;
+	uint64_t allocs;
+	struct thread *td;
+	int running;
+};
+
 struct ktls_domain_info {
 	int count;
 	int cpu[MAXCPU];
+	struct ktls_alloc_thread alloc_td;
 };
 
 struct ktls_domain_info ktls_domains[MAXMEMDOM];
@@ -142,6 +151,11 @@ SYSCTL_BOOL(_kern_ipc_tls, OID_AUTO, sw_buffer_cache, CTLFLAG_RDTUN,
     &ktls_sw_buffer_cache, 1,
     "Enable caching of output buffers for SW encryption");
 
+static int ktls_max_alloc = 128;
+SYSCTL_INT(_kern_ipc_tls, OID_AUTO, max_alloc, CTLFLAG_RWTUN,
+    &ktls_max_alloc, 128,
+    "Max number of 16k buffers to allocate in thread context");
+
 static COUNTER_U64_DEFINE_EARLY(ktls_tasks_active);
 SYSCTL_COUNTER_U64(_kern_ipc_tls, OID_AUTO, tasks_active, CTLFLAG_RD,
     &ktls_tasks_active, "Number of active tasks");
@@ -278,6 +292,7 @@ static void ktls_cleanup(struct ktls_session *tls);
 static void ktls_reset_send_tag(void *context, int pending);
 #endif
 static void ktls_work_thread(void *ctx);
+static void ktls_alloc_thread(void *ctx);
 
 #if defined(INET) || defined(INET6)
 static u_int
@@ -418,6 +433,32 @@ ktls_init(void *dummy __unused)
 		ktls_number_threads++;
 	}
 
+	/*
+	 * Start an allocation thread per-domain to perform blocking allocations
+	 * of 16k physically contiguous TLS crypto destination buffers.
+	 */
+	if (ktls_sw_buffer_cache) {
+		for (domain = 0; domain < vm_ndomains; domain++) {
+			if (VM_DOMAIN_EMPTY(domain))
+				continue;
+			if (CPU_EMPTY(&cpuset_domain[domain]))
+				continue;
+			error = kproc_kthread_add(ktls_alloc_thread,
+			    &ktls_domains[domain], &ktls_proc,
+			    &ktls_domains[domain].alloc_td.td,
+			    0, 0, "KTLS", "alloc_%d", domain);
+			if (error)
+				panic("Can't add KTLS alloc thread %d error %d",
+				    domain, error);
+			CPU_COPY(&cpuset_domain[domain], &mask);
+			error = cpuset_setthread(ktls_domains[domain].alloc_td.td->td_tid,
+			    &mask);
+			if (error)
+				panic("Unable to bind KTLS alloc %d error %d",
+				    domain, error);
+		}
+	}
+
 	/*
 	 * If we somehow have an empty domain, fall back to choosing
 	 * among all KTLS threads.
@@ -1946,6 +1987,7 @@ static void *
 ktls_buffer_alloc(struct ktls_wq *wq, struct mbuf *m)
 {
 	void *buf;
+	int domain, running;
 
 	if (m->m_epg_npgs <= 2)
 		return (NULL);
@@ -1961,8 +2003,23 @@ ktls_buffer_alloc(struct ktls_wq *wq, struct mbuf *m)
 		return (NULL);
 	}
 	buf = uma_zalloc(ktls_buffer_zone, M_NOWAIT | M_NORECLAIM);
-	if (buf == NULL)
+	if (buf == NULL) {
+		domain = PCPU_GET(domain);
 		wq->lastallocfail = ticks;
+
+		/*
+		 * Note that this check is "racy", but the races are
+		 * harmless, and are either a spurious wakeup if
+		 * multiple threads fail allocations before the alloc
+		 * thread wakes, or waiting an extra second in case we
+		 * see an old value of running == true.
+		 */
+		if (!VM_DOMAIN_EMPTY(domain)) {
+			running = atomic_load_int(&ktls_domains[domain].alloc_td.running);
+			if (!running)
+				wakeup(&ktls_domains[domain].alloc_td);
+		}
+	}
 	return (buf);
 }
 
@@ -2154,6 +2211,68 @@ ktls_encrypt(struct ktls_wq *wq, struct mbuf *top)
 	CURVNET_RESTORE();
 }
 
+static void
+ktls_alloc_thread(void *ctx)
+{
+	struct ktls_domain_info *ktls_domain = ctx;
+	struct ktls_alloc_thread *sc = &ktls_domain->alloc_td;
+	void **buf;
+	struct sysctl_oid *oid;
+	char name[80];
+	int i, nbufs;
+
+	curthread->td_domain.dr_policy =
+	    DOMAINSET_PREF(PCPU_GET(domain));
+	snprintf(name, sizeof(name), "domain%d", PCPU_GET(domain));
+	if (bootverbose)
+		printf("Starting KTLS alloc thread for domain %d\n",
+		    PCPU_GET(domain));
+	oid = SYSCTL_ADD_NODE(NULL, SYSCTL_STATIC_CHILDREN(_kern_ipc_tls), OID_AUTO,
+	    name, CTLFLAG_RD | CTLFLAG_MPSAFE, NULL, "");
+	SYSCTL_ADD_U64(NULL, SYSCTL_CHILDREN(oid), OID_AUTO, "allocs",
+	    CTLFLAG_RD, &sc->allocs, 0, "buffers allocated");
+	SYSCTL_ADD_U64(NULL, SYSCTL_CHILDREN(oid), OID_AUTO, "wakeups",
+	    CTLFLAG_RD, &sc->wakeups, 0, "thread wakeups");
+	SYSCTL_ADD_INT(NULL, SYSCTL_CHILDREN(oid), OID_AUTO, "running",
+	    CTLFLAG_RD, &sc->running, 0, "thread running");
+
+	buf = NULL;
+	nbufs = 0;
+	for (;;) {
+		atomic_store_int(&sc->running, 0);
+		tsleep(sc, PZERO, "waiting for work", 0);
+		atomic_store_int(&sc->running, 1);
+		sc->wakeups++;
+		if (nbufs != ktls_max_alloc) {
+			free(buf, M_KTLS);
+			nbufs = atomic_load_int(&ktls_max_alloc);
+			buf = malloc(sizeof(void *) * nbufs, M_KTLS,
+			    M_WAITOK | M_ZERO);
+		}
+		/*
+		 * Below we allocate nbufs with different allocation
+		 * flags than we use when allocating normally during
+		 * encryption in the ktls worker thread. We specify
+		 * M_NORECLAIM in the worker thread. However, we omit
+		 * that flag here and add M_WAITOK so that the VM
+		 * system is permitted to perform expensive work to
+		 * defragment memory. We do this here, as it does not
+		 * matter if this thread blocks. If we block a ktls
+		 * worker thread, we risk developing backlogs of
+		 * buffers to be encrypted, leading to surges of
+		 * traffic and potential NIC output drops.
+		 */
+		for (i = 0; i < nbufs; i++) {
+			buf[i] = uma_zalloc(ktls_buffer_zone, M_WAITOK);
+			sc->allocs++;
+		}
+		for (i = 0; i < nbufs; i++) {
+			uma_zfree(ktls_buffer_zone, buf[i]);
+			buf[i] = NULL;
+		}
+	}
+}
+
 static void
 ktls_work_thread(void *ctx)
 {
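
The split between a non-blocking fast path and a helper thread that is
allowed to block can be tried out in userspace. The following is only a
rough sketch under stated assumptions, not the kernel code: it assumes
POSIX threads, uses malloc() as a stand-in for the blocking
uma_zalloc(..., M_WAITOK) call, and replaces the lockless
tsleep()/wakeup() pair with a mutex and condition variables, so the
wakeup here is not racy. All names in it (struct buf_cache,
cache_alloc_nowait(), cache_refill_thread(), MAX_ALLOC and so on) are
hypothetical and do not appear in the commit.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

#define BUF_SIZE	16384	/* same size as the buffers in the commit */
#define MAX_ALLOC	8	/* hypothetical stand-in for max_alloc */

struct buf_cache {
	pthread_mutex_t lock;
	pthread_cond_t wake;	/* refill thread sleeps here */
	pthread_cond_t done;	/* signalled after each refill pass */
	void *bufs[MAX_ALLOC];	/* cached free buffers */
	int count;
	bool running;		/* a refill pass is requested or active */
	bool stop;
	unsigned long wakeups, allocs;
};

static void
cache_init(struct buf_cache *c)
{
	*c = (struct buf_cache){ .count = 0 };
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->wake, NULL);
	pthread_cond_init(&c->done, NULL);
}

/* Fast path: never calls the allocator; a miss kicks the refill thread. */
static void *
cache_alloc_nowait(struct buf_cache *c)
{
	void *buf = NULL;

	pthread_mutex_lock(&c->lock);
	if (c->count > 0) {
		buf = c->bufs[--c->count];
	} else if (!c->running) {
		c->running = true;	/* suppress duplicate wakeups */
		pthread_cond_signal(&c->wake);
	}
	pthread_mutex_unlock(&c->lock);
	return (buf);
}

/* Slow path: may block in the allocator without stalling any worker. */
static void *
cache_refill_thread(void *arg)
{
	struct buf_cache *c = arg;
	void *fresh[MAX_ALLOC];
	int i, n;

	pthread_mutex_lock(&c->lock);
	for (;;) {
		while (!c->running && !c->stop)
			pthread_cond_wait(&c->wake, &c->lock);
		if (c->stop)
			break;
		c->wakeups++;
		pthread_mutex_unlock(&c->lock);
		/* Allocate with the lock dropped so workers keep running. */
		for (n = 0; n < MAX_ALLOC; n++)
			if ((fresh[n] = malloc(BUF_SIZE)) == NULL)
				break;
		pthread_mutex_lock(&c->lock);
		for (i = 0; i < n; i++) {
			if (c->count < MAX_ALLOC)
				c->bufs[c->count++] = fresh[i];
			else
				free(fresh[i]);
		}
		assert(c->count <= MAX_ALLOC);
		c->allocs += n;
		c->running = false;
		pthread_cond_broadcast(&c->done);
	}
	pthread_mutex_unlock(&c->lock);
	return (NULL);
}

/* Block until the pending refill pass, if any, has finished. */
static void
cache_wait_idle(struct buf_cache *c)
{
	pthread_mutex_lock(&c->lock);
	while (c->running)
		pthread_cond_wait(&c->done, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Stop and join the refill thread, then release the cached buffers. */
static void
cache_stop(struct buf_cache *c, pthread_t td)
{
	pthread_mutex_lock(&c->lock);
	c->stop = true;
	pthread_cond_signal(&c->wake);
	pthread_mutex_unlock(&c->lock);
	pthread_join(td, NULL);
	while (c->count > 0)
		free(c->bufs[--c->count]);
}
```

A caller would start cache_refill_thread() with pthread_create(), call
cache_alloc_nowait() from its workers, and treat a NULL return as "no
buffer right now" rather than waiting. Unlike the kernel thread, which
frees its buffers back into the UMA zone, this sketch pushes them
straight into its own small array.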