From: Hans Petter Selasky
Date: Wed, 8 Apr 2020 03:03:24 +0200
Subject: Re: Issue with epoch_drain_callbacks and unloading iavf(4) [using iflib]
To: Mark Johnston, Eric Joyner
Cc: freebsd-net@freebsd.org, shurd, John Baldwin, Drew Gallatin
Message-ID: <2220ccb5-7c32-bf90-b50c-42f60fb94ace@selasky.org>
In-Reply-To: <20200407232347.GA5605@raichu>

On 2020-04-08 01:23, Mark Johnston wrote:
> On Mon, Apr 06, 2020 at 02:34:50PM -0700, Eric Joyner wrote:
>> On Mon, Apr 6, 2020 at 2:29 PM Mark Johnston wrote:
>>
>>> On Mon, Apr 06, 2020 at 02:19:25PM -0700, Eric Joyner wrote:
>>>> Mark,
>>>>
>>>> I think I was mistaken about the backtrace looking the same. I was looking
>>>> at it from within ddb, and I think I focused on the
>>>> epoch_block_handler_preempt line and didn't notice that it only stopped
>>>> there this time. Here's the new one I've got from kgdb:
>>>
>>> Thanks. Could you try to print "td->td_name" from frame 4? It should
>>> also be available as er->er_blockedtd. Basically, I'm trying to verify
>>> that the interrupt thread itself isn't the one that we're waiting for,
>>> else there is another bug to be fixed.
>>>
>>> If you can provide kernel symbols and vmcore, I'd be happy to look at it
>>> myself.
>>
>> Here's what I get:
>>
>> (kgdb) frame 4
>> #4  epoch_block_handler_preempt (global=0xfffff80003de0100,
>>     cr=0xfffffe00dee85900, arg=0x0) at /usr/src/sys/kern/subr_epoch.c:507
>> 507     }
>> (kgdb) print td->td_name
>> $1 = "if_io_tqg_31\000\000\000\000\000\000\000"
>> (kgdb) print er->er_blockedtd
>> $2 = (struct thread *) 0x0
>
> I spent some time looking at the core. It looks like we have yet
> another problem: the gtaskqueue code won't exit the net epoch if it is
> constantly running a net task. Could you please retry with the patches
> from before, and this one included?
>

There is the same issue in kern_intr.c (FYI).

--HPS