From: Mark Johnston
Date: Tue, 7 Apr 2020 19:23:47 -0400
To: Eric Joyner
Cc: Hans Petter Selasky, freebsd-net@freebsd.org, shurd, John Baldwin, Drew Gallatin
Subject: Re: Issue with epoch_drain_callbacks and unloading iavf(4) [using iflib]
Message-ID: <20200407232347.GA5605@raichu>
References: <20200212222219.GE83892@raichu> <20200328225150.GA82767@raichu> <20200331192024.GE97238@raichu> <20200406212903.GA55712@raichu>

On Mon, Apr 06, 2020 at 02:34:50PM -0700, Eric Joyner wrote:
> On Mon, Apr 6, 2020 at 2:29 PM Mark Johnston wrote:
>
> > On Mon, Apr 06, 2020 at 02:19:25PM -0700, Eric Joyner wrote:
> > > Mark,
> > >
> > > I think I was mistaken about the backtrace looking the same. I was
> > > looking at it from within ddb, and I think I focused on the
> > > epoch_block_handler_preempt line and didn't notice that it only stopped
> > > there this time. Here's the new one I've got from kgdb:
> >
> > Thanks. Could you try to print "td->td_name" from frame 4? It should
> > also be available as er->er_blockedtd. Basically, I'm trying to verify
> > that the interrupt thread itself isn't the one that we're waiting for,
> > else there is another bug to be fixed.
> >
> > If you can provide kernel symbols and vmcore, I'd be happy to look at it
> > myself.
>
> Here's what I get:
>
> (kgdb) frame 4
> #4  epoch_block_handler_preempt (global=0xfffff80003de0100,
>     cr=0xfffffe00dee85900, arg=0x0) at /usr/src/sys/kern/subr_epoch.c:507
> 507     }
> (kgdb) print td->td_name
> $1 = "if_io_tqg_31\000\000\000\000\000\000\000"
> (kgdb) print er->er_blockedtd
> $2 = (struct thread *) 0x0

I spent some time looking at the core. It looks like we have yet another
problem: the gtaskqueue code won't exit the net epoch if it is
constantly running a net task. Could you please retry with the patches
from before, and this one included?

diff --git a/sys/kern/subr_gtaskqueue.c b/sys/kern/subr_gtaskqueue.c
index f52f32204644..2b1386a612ee 100644
--- a/sys/kern/subr_gtaskqueue.c
+++ b/sys/kern/subr_gtaskqueue.c
@@ -345,7 +345,7 @@ gtaskqueue_run_locked(struct gtaskqueue *queue)
 	struct epoch_tracker et;
 	struct gtaskqueue_busy tb;
 	struct gtask *gtask;
-	bool in_net_epoch;
+	bool in_net_epoch;
 
 	KASSERT(queue != NULL, ("tq is NULL"));
 	TQ_ASSERT_LOCKED(queue);
@@ -361,20 +361,19 @@ gtaskqueue_run_locked(struct gtaskqueue *queue)
 		TQ_UNLOCK(queue);
 
 		KASSERT(gtask->ta_func != NULL, ("task->ta_func is NULL"));
-		if (!in_net_epoch && TASK_IS_NET(gtask)) {
-			in_net_epoch = true;
+		if (TASK_IS_NET(gtask)) {
 			NET_EPOCH_ENTER(et);
-		} else if (in_net_epoch && !TASK_IS_NET(gtask)) {
+			in_net_epoch = true;
+		}
+		gtask->ta_func(gtask->ta_context);
+		if (in_net_epoch) {
 			NET_EPOCH_EXIT(et);
 			in_net_epoch = false;
 		}
-		gtask->ta_func(gtask->ta_context);
 
 		TQ_LOCK(queue);
 		wakeup(gtask);
 	}
-	if (in_net_epoch)
-		NET_EPOCH_EXIT(et);
 	LIST_REMOVE(&tb, tb_link);
 }
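
For readers following the thread, the difference between the old and the patched loop can be modelled in user space. The sketch below is not kernel code: `struct sim_task`, `longest_section_old` and `longest_section_new` are hypothetical names made up for illustration, and the epoch is reduced to a boolean. It only counts how many consecutive tasks run inside one epoch section under each policy, assuming the behaviour described in the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued task; only the "net" flag matters. */
struct sim_task {
	bool is_net;
};

/*
 * Old policy: enter the epoch at the first net task and leave it only when
 * a non-net task shows up or the queue drains. Returns the longest run of
 * tasks executed inside a single epoch section.
 */
static size_t
longest_section_old(const struct sim_task *tasks, size_t n)
{
	size_t cur = 0, longest = 0;
	bool in_epoch = false;

	assert(tasks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!in_epoch && tasks[i].is_net) {
			in_epoch = true;	/* models NET_EPOCH_ENTER */
			cur = 0;
		} else if (in_epoch && !tasks[i].is_net) {
			in_epoch = false;	/* models NET_EPOCH_EXIT */
		}
		/* the task function would run here */
		if (in_epoch && ++cur > longest)
			longest = cur;
	}
	return (longest);
}

/*
 * Patched policy: every net task gets its own epoch section, entered just
 * before the task function and exited right after it returns.
 */
static size_t
longest_section_new(const struct sim_task *tasks, size_t n)
{
	size_t longest = 0;

	assert(tasks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		bool in_epoch = tasks[i].is_net;	/* models NET_EPOCH_ENTER */

		/* the task function would run here */
		if (in_epoch && longest < 1)
			longest = 1;
		/* models NET_EPOCH_EXIT: the section ends with this task */
	}
	return (longest);
}
```

For the sequence net, net, net, non-net, net, the old policy keeps three tasks inside one section while the patched policy never exceeds one. With an unbroken stream of net tasks the old section never ends, which matches the symptom described above.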