From owner-freebsd-net@freebsd.org  Wed Feb 12 22:22:25 2020
Sender: Mark Johnston
Date: Wed, 12 Feb 2020 17:22:19 -0500
From: Mark Johnston
To: Eric Joyner
Cc: freebsd-net@freebsd.org, Hans Petter Selasky, John Baldwin, shurd,
 Drew Gallatin, Gleb Smirnoff
Subject: Re: Issue with epoch_drain_callbacks and unloading iavf(4) [using iflib]
Message-ID: <20200212222219.GE83892@raichu>
References: <0e2e97f2-df75-3c6f-9bdd-e8c2ab7bf79e@selasky.org>
 <20200130030911.GA15281@spy>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
List-Id: Networking and TCP/IP with FreeBSD
X-List-Received-Date: Wed, 12 Feb 2020 22:22:25 -0000

On Mon, Feb 10, 2020 at 01:30:47PM -0800, Eric Joyner wrote:
> On Fri, Jan 31, 2020 at 11:44 AM Eric Joyner wrote:
> > Hi Mark,
> >
> > I applied your patch to a FreeBSD-current VM and reran the test I was
> > running originally, and I still see the same behavior.
> >
> > Your patch doesn't appear to fix the problem.
> >
> > - Eric
>
> *bump*

Sorry, I had missed this.  I think this implies that even the epoch
callback tasks are not getting a chance to run, but I guess that makes
sense: all grouptask threads run at PI_SOFT, a realtime priority, so the
iflib and epoch callback task threads are all running at the same
priority.  Unless iflib voluntarily yields the CPU, it'll keep getting
scheduled.

Here's an updated patch which also allows the priority of group task
threads to be specified.  I don't think it makes much sense to have more
than one per-CPU group task in the system without that parameter.  With
this, epoch callbacks should run at a slightly higher priority than
iflib.

The combined patch also fixes an annoying problem where interface
destruction can appear to hang for a short period (~0.5s).
This is because epoch callbacks are scheduled off hardclock, but
hardclock interrupts run at a very low frequency on an idle CPU.

diff --git a/sys/kern/subr_epoch.c b/sys/kern/subr_epoch.c
index 0a477e1d6f7b..4ed7e7e11a3e 100644
--- a/sys/kern/subr_epoch.c
+++ b/sys/kern/subr_epoch.c
@@ -74,15 +74,19 @@ typedef struct epoch_record {
 	volatile struct epoch_tdlist er_tdlist;
 	volatile uint32_t er_gen;
 	uint32_t er_cpuid;
+	int er_drain_state;
 } __aligned(EPOCH_ALIGN)	*epoch_record_t;
 
+#define	EPOCH_DRAIN_START	2
+#define	EPOCH_DRAIN_RUNNING	1
+#define	EPOCH_DRAIN_DONE	0
+
 struct epoch {
 	struct ck_epoch e_epoch __aligned(EPOCH_ALIGN);
 	epoch_record_t e_pcpu_record;
 	int	e_idx;
 	int	e_flags;
 	struct sx e_drain_sx;
-	struct mtx e_drain_mtx;
 	volatile int e_drain_count;
 	const char *e_name;
 };
@@ -335,7 +339,6 @@ epoch_alloc(const char *name, int flags)
 	epoch->e_idx = epoch_count;
 	epoch->e_name = name;
 	sx_init(&epoch->e_drain_sx, "epoch-drain-sx");
-	mtx_init(&epoch->e_drain_mtx, "epoch-drain-mtx", NULL, MTX_DEF);
 	allepochs[epoch_count++] = epoch;
 	return (epoch);
 }
@@ -348,7 +351,6 @@ epoch_free(epoch_t epoch)
 	allepochs[epoch->e_idx] = NULL;
 	epoch_wait(global_epoch);
 	uma_zfree_pcpu(pcpu_zone_record, epoch->e_pcpu_record);
-	mtx_destroy(&epoch->e_drain_mtx);
 	sx_destroy(&epoch->e_drain_sx);
 	free(epoch, M_EPOCH);
 }
@@ -699,14 +701,24 @@ epoch_call_task(void *arg __unused)
 	epoch_t epoch;
 	ck_stack_t cb_stack;
 	int i, npending, total;
+	bool draining;
+
+	KASSERT(curthread->td_pinned > 0,
+	    ("%s: callback task thread is not pinned", __func__));
 
 	ck_stack_init(&cb_stack);
 	critical_enter();
 	epoch_enter(global_epoch);
-	for (total = i = 0; i < epoch_count; i++) {
+	for (total = i = 0, draining = false; i < epoch_count; i++) {
 		if (__predict_false((epoch = allepochs[i]) == NULL))
 			continue;
 		er = epoch_currecord(epoch);
+		if (atomic_load_int(&er->er_drain_state) == EPOCH_DRAIN_START) {
+			atomic_store_int(&er->er_drain_state,
+			    EPOCH_DRAIN_RUNNING);
+			draining = true;
+		}
 		record = &er->er_record;
 		if ((npending = record->n_pending) == 0)
 			continue;
@@ -728,6 +740,20 @@ epoch_call_task(void *arg __unused)
 		next = CK_STACK_NEXT(cursor);
 		entry->function(entry);
 	}
+
+	if (__predict_false(draining)) {
+		epoch_enter(global_epoch);
+		for (i = 0; i < epoch_count; i++) {
+			if (__predict_false((epoch = allepochs[i]) == NULL))
+				continue;
+			er = epoch_currecord(epoch);
+			if (atomic_load_int(&er->er_drain_state) ==
+			    EPOCH_DRAIN_RUNNING)
+				atomic_store_int(&er->er_drain_state,
+				    EPOCH_DRAIN_DONE);
+		}
+		epoch_exit(global_epoch);
+	}
 }
 
 int
@@ -769,27 +795,18 @@ in_epoch(epoch_t epoch)
 }
 
 static void
-epoch_drain_cb(struct epoch_context *ctx)
+epoch_drain_handler(struct ck_epoch *global __unused,
+    ck_epoch_record_t *cr __unused, void *arg __unused)
 {
-	struct epoch *epoch =
-	    __containerof(ctx, struct epoch_record, er_drain_ctx)->er_parent;
-
-	if (atomic_fetchadd_int(&epoch->e_drain_count, -1) == 1) {
-		mtx_lock(&epoch->e_drain_mtx);
-		wakeup(epoch);
-		mtx_unlock(&epoch->e_drain_mtx);
-	}
+	maybe_yield();
 }
 
 void
 epoch_drain_callbacks(epoch_t epoch)
 {
 	epoch_record_t er;
-	struct thread *td;
-	int was_bound;
-	int old_pinned;
-	int old_cpu;
-	int cpu;
+	int cpu, state;
+	bool pending;
 
 	WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL,
 	    "epoch_drain_callbacks() may sleep!");
@@ -802,45 +819,28 @@ epoch_drain_callbacks(epoch_t epoch)
 		return;
 #endif
 	DROP_GIANT();
-
 	sx_xlock(&epoch->e_drain_sx);
-	mtx_lock(&epoch->e_drain_mtx);
 
-	td = curthread;
-	thread_lock(td);
-	old_cpu = PCPU_GET(cpuid);
-	old_pinned = td->td_pinned;
-	was_bound = sched_is_bound(td);
-	sched_unbind(td);
-	td->td_pinned = 0;
+	/* Make sure that all pending callbacks are available. */
+	ck_epoch_synchronize_wait(&epoch->e_epoch, epoch_drain_handler, NULL);
 
-	CPU_FOREACH(cpu)
-		epoch->e_drain_count++;
 	CPU_FOREACH(cpu) {
 		er = zpcpu_get_cpu(epoch->e_pcpu_record, cpu);
-		sched_bind(td, cpu);
-		epoch_call(epoch, &epoch_drain_cb, &er->er_drain_ctx);
+		atomic_store_int(&er->er_drain_state, EPOCH_DRAIN_START);
+		GROUPTASK_ENQUEUE(DPCPU_ID_PTR(cpu, epoch_cb_task));
 	}
 
-	/* restore CPU binding, if any */
-	if (was_bound != 0) {
-		sched_bind(td, old_cpu);
-	} else {
-		/* get thread back to initial CPU, if any */
-		if (old_pinned != 0)
-			sched_bind(td, old_cpu);
-		sched_unbind(td);
-	}
-	/* restore pinned after bind */
-	td->td_pinned = old_pinned;
-
-	thread_unlock(td);
-
-	while (epoch->e_drain_count != 0)
-		msleep(epoch, &epoch->e_drain_mtx, PZERO, "EDRAIN", 0);
+	do {
+		pending = false;
+		CPU_FOREACH(cpu) {
+			er = zpcpu_get_cpu(epoch->e_pcpu_record, cpu);
+			state = atomic_load_int(&er->er_drain_state);
+			if (state != EPOCH_DRAIN_DONE)
+				pending = true;
+		}
+		pause("edrain", 1);
+	} while (pending);
 
-	mtx_unlock(&epoch->e_drain_mtx);
 	sx_xunlock(&epoch->e_drain_sx);
-
 	PICKUP_GIANT();
 }
diff --git a/sys/kern/subr_gtaskqueue.c b/sys/kern/subr_gtaskqueue.c
index 9bf4b0ca3ad3..f2e7f74325fa 100644
--- a/sys/kern/subr_gtaskqueue.c
+++ b/sys/kern/subr_gtaskqueue.c
@@ -54,8 +54,8 @@ static void	gtaskqueue_thread_loop(void *arg);
 static int	task_is_running(struct gtaskqueue *queue, struct gtask *gtask);
 static void	gtaskqueue_drain_locked(struct gtaskqueue *queue, struct gtask *gtask);
 
-TASKQGROUP_DEFINE(softirq, mp_ncpus, 1);
-TASKQGROUP_DEFINE(config, 1, 1);
+TASKQGROUP_DEFINE(softirq, mp_ncpus, 1, PI_SOFT);
+TASKQGROUP_DEFINE(config, 1, 1, PI_SOFT);
 
 struct gtaskqueue_busy {
 	struct gtask	*tb_running;
@@ -612,7 +612,7 @@ struct taskq_bind_task {
 };
 
 static void
-taskqgroup_cpu_create(struct taskqgroup *qgroup, int idx, int cpu)
+taskqgroup_cpu_create(struct taskqgroup *qgroup, int idx, int cpu, int pri)
 {
 	struct taskqgroup_cpu *qcpu;
 
@@ -620,7 +620,7 @@ taskqgroup_cpu_create(struct taskqgroup *qgroup, int idx, int cpu)
 	LIST_INIT(&qcpu->tgc_tasks);
 	qcpu->tgc_taskq = gtaskqueue_create_fast(NULL, M_WAITOK,
 	    taskqueue_thread_enqueue, &qcpu->tgc_taskq);
-	gtaskqueue_start_threads(&qcpu->tgc_taskq, 1, PI_SOFT,
+	gtaskqueue_start_threads(&qcpu->tgc_taskq, 1, pri,
 	    "%s_%d", qgroup->tqg_name, idx);
 	qcpu->tgc_cpu = cpu;
 }
@@ -900,7 +900,7 @@ taskqgroup_config_init(void *arg)
 	LIST_SWAP(&gtask_head, &qgroup->tqg_queue[0].tgc_tasks,
 	    grouptask, gt_list);
 	qgroup->tqg_queue[0].tgc_cnt = 0;
-	taskqgroup_cpu_create(qgroup, 0, 0);
+	taskqgroup_cpu_create(qgroup, 0, 0, PI_SOFT);
 
 	qgroup->tqg_cnt = 1;
 	qgroup->tqg_stride = 1;
@@ -910,7 +910,7 @@ SYSINIT(taskqgroup_config_init, SI_SUB_TASKQ, SI_ORDER_SECOND,
	taskqgroup_config_init, NULL);
 
 static int
-_taskqgroup_adjust(struct taskqgroup *qgroup, int cnt, int stride)
+_taskqgroup_adjust(struct taskqgroup *qgroup, int cnt, int stride, int pri)
 {
 	LIST_HEAD(, grouptask) gtask_head = LIST_HEAD_INITIALIZER(NULL);
 	struct grouptask *gtask;
@@ -948,7 +948,7 @@ _taskqgroup_adjust(struct taskqgroup *qgroup, int cnt, int stride)
 	 */
 	cpu = old_cpu;
 	for (i = old_cnt; i < cnt; i++) {
-		taskqgroup_cpu_create(qgroup, i, cpu);
+		taskqgroup_cpu_create(qgroup, i, cpu, pri);
 
 		for (k = 0; k < stride; k++)
 			cpu = CPU_NEXT(cpu);
@@ -1001,12 +1001,12 @@ _taskqgroup_adjust(struct taskqgroup *qgroup, int cnt, int stride)
 }
 
 int
-taskqgroup_adjust(struct taskqgroup *qgroup, int cnt, int stride)
+taskqgroup_adjust(struct taskqgroup *qgroup, int cnt, int stride, int pri)
 {
 	int error;
 
 	mtx_lock(&qgroup->tqg_lock);
-	error = _taskqgroup_adjust(qgroup, cnt, stride);
+	error = _taskqgroup_adjust(qgroup, cnt, stride, pri);
 	mtx_unlock(&qgroup->tqg_lock);
 
 	return (error);
diff --git a/sys/net/iflib.c b/sys/net/iflib.c
index a9fc8090c48a..1163425bf15a 100644
--- a/sys/net/iflib.c
+++ b/sys/net/iflib.c
@@ -37,6 +37,7 @@ __FBSDID("$FreeBSD$");
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -560,8 +561,8 @@ MODULE_VERSION(iflib, 1);
 MODULE_DEPEND(iflib, pci, 1, 1, 1);
 MODULE_DEPEND(iflib, ether, 1, 1, 1);
 
-TASKQGROUP_DEFINE(if_io_tqg, mp_ncpus, 1);
-TASKQGROUP_DEFINE(if_config_tqg, 1, 1);
+TASKQGROUP_DEFINE(if_io_tqg, mp_ncpus, 1, PI_SWI(SWI_NET));
+TASKQGROUP_DEFINE(if_config_tqg, 1, 1, PI_SWI(SWI_NET));
 
 #ifndef IFLIB_DEBUG_COUNTERS
 #ifdef INVARIANTS
diff --git a/sys/sys/gtaskqueue.h b/sys/sys/gtaskqueue.h
index 96fd57dfb76d..a57bcfc1881a 100644
--- a/sys/sys/gtaskqueue.h
+++ b/sys/sys/gtaskqueue.h
@@ -79,7 +79,8 @@ int	taskqgroup_attach_cpu(struct taskqgroup *qgroup,
 void	taskqgroup_detach(struct taskqgroup *qgroup, struct grouptask *gtask);
 struct taskqgroup *taskqgroup_create(const char *name);
 void	taskqgroup_destroy(struct taskqgroup *qgroup);
-int	taskqgroup_adjust(struct taskqgroup *qgroup, int cnt, int stride);
+int	taskqgroup_adjust(struct taskqgroup *qgroup, int cnt, int stride,
+	    int pri);
 void	taskqgroup_config_gtask_init(void *ctx, struct grouptask *gtask,
	    gtask_fn_t *fn, const char *name);
 void	taskqgroup_config_gtask_deinit(struct grouptask *gtask);
@@ -100,7 +101,7 @@ void	taskqgroup_config_gtask_deinit(struct grouptask *gtask);
 #define	TASKQGROUP_DECLARE(name)				\
 extern struct taskqgroup *qgroup_##name
 
-#define	TASKQGROUP_DEFINE(name, cnt, stride)			\
+#define	TASKQGROUP_DEFINE(name, cnt, stride, pri)		\
 								\
 struct taskqgroup *qgroup_##name;				\
 								\
@@ -116,7 +117,7 @@ SYSINIT(taskqgroup_##name, SI_SUB_TASKQ, SI_ORDER_FIRST,	\
 static void							\
 taskqgroup_adjust_##name(void *arg)				\
 {								\
-	taskqgroup_adjust(qgroup_##name, (cnt), (stride));	\
+	taskqgroup_adjust(qgroup_##name, (cnt), (stride), (pri)); \
 }								\
								\
 SYSINIT(taskqgroup_adj_##name, SI_SUB_SMP, SI_ORDER_ANY,	\