From owner-freebsd-net@freebsd.org Mon Apr 6 21:20:04 2020
From: Eric Joyner
Date: Mon, 6 Apr 2020 14:19:25 -0700
Subject: Re: Issue with epoch_drain_callbacks and unloading iavf(4) [using iflib]
To: Mark Johnston
Cc: Hans Petter Selasky, John Baldwin, Drew Gallatin, freebsd-net@freebsd.org, shurd
References: <20200130030911.GA15281@spy> <20200212222219.GE83892@raichu> <20200328225150.GA82767@raichu> <20200331192024.GE97238@raichu>
In-Reply-To: <20200331192024.GE97238@raichu>
List-Id: Networking and TCP/IP with FreeBSD

On Tue, Mar 31, 2020 at 12:28 PM Mark Johnston wrote:

> On Tue, Mar 31, 2020 at 12:14:20PM -0700, Eric Joyner wrote:
> > Mark,
> >
> > I tried out a kernel with the tip of CURRENT with both D24214 and D24215
> > applied, and I still see the problem. As well, after doing a "sysctl
> > debug.kdb.enter=1" and viewing the stack trace there for kldunload, it
> > appears to be similar to the one I posted in my last post.
>
> Can you show it? I don't see how it could be the same, since with the
> patch we are no longer calling sched_bind() from the epoch scan call
> back.
>
> > - Eric
> >
> > On Mon, Mar 30, 2020 at 1:19 PM Eric Joyner wrote:
> >
> > > On Sat, Mar 28, 2020 at 3:52 PM Mark Johnston wrote:
> > >
> > >> On Wed, Mar 11, 2020 at 04:32:40PM -0700, Eric Joyner wrote:
> > >> > Mark,
> > >> >
> > >> > I did get some time to get back and retry this; however your second patch
> > >> > still doesn't solve the problem. Looking into it a bit, it looks like the
> > >> > kldunload process isn't hitting the code you've changed; it's hanging in
> > >> > epoch_wait_preempt() in if_detach_internal(), which is immediately before
> > >> > epoch_drain_callbacks().
> > >> >
> > >> > I did a kernel dump while it was hanging, and this is the backtrace for the
> > >> > kldunload process:
> > >>
> > >> I see. I think the callback can be made much simpler and avoid the
> > >> problematic sched_bind() calls.
> > >> I wrote a patch that allows waiting
> > >> threads to lend scheduling priority to a preempted thread blocked in an
> > >> epoch section, based on some code I wrote to implement preemptible SMR
> > >> sections. If waiting for a running thread, the callback just spins.
> > >>
> > >> This might be enough to solve your problem, I posted the two lightly
> > >> tested patches here:
> > >> https://reviews.freebsd.org/D24214
> > >> https://reviews.freebsd.org/D24215
> > >>
> > >> If we hit a situation where a reader is preempted and then its CPU is
> > >> hogged by a high-priority kernel thread, this still won't be enough, but
> > >> I suspect it'll solve your case. Would you be able to test?
> > >>
> > >
> > > Yeah, I'll try them out.
> > >
> > > - Eric

Mark,

I think I was mistaken about the backtrace looking the same. I was
looking at it from within ddb, and I think I focused on the
epoch_block_handler_preempt line and didn't notice that it only stopped
there this time.
Here's the new one I've got from kgdb:

#0  cpustop_handler () at /usr/src/sys/x86/x86/mp_x86.c:1448
#1  0xffffffff80ff2f79 in ipi_nmi_handler ()
    at /usr/src/sys/x86/x86/mp_x86.c:1405
#2  0xffffffff810294a6 in trap (frame=0xfffffe003b9b6f30)
    at /usr/src/sys/amd64/amd64/trap.c:201
#3
#4  epoch_block_handler_preempt (global=0xfffff80003de0100,
    cr=0xfffffe00dee85900, arg=0x0) at /usr/src/sys/kern/subr_epoch.c:507
#5  0xffffffff803b576d in epoch_block (global=0xfffff80003de0100,
    cr=0xfffffe00dee85900, cb=0xffffffff80bcf190, ct=0x0)
    at /usr/src/sys/contrib/ck/src/ck_epoch.c:416
#6  ck_epoch_synchronize_wait (global=0xfffff80003de0100, cb=, ct=)
    at /usr/src/sys/contrib/ck/src/ck_epoch.c:465
#7  0xffffffff80bcf03c in epoch_wait_preempt (epoch=0xfffff80003de0100)
    at /usr/src/sys/kern/subr_epoch.c:529
#8  0xffffffff80c9410a in if_detach_internal (ifp=0xfffff80067ed4000,
    vmove=0, ifcp=0x0) at /usr/src/sys/net/if.c:1123
#9  0xffffffff80c93ebd in if_detach (ifp=0xfffff80003de0100)
    at /usr/src/sys/net/if.c:1063
#10 0xffffffff80cafa56 in iflib_device_deregister (ctx=0xfffff80088c91800)
    at /usr/src/sys/net/iflib.c:5104
#11 0xffffffff80bc1e2e in DEVICE_DETACH (dev=0xfffff80004706a00)
    at ./device_if.h:234
#12 device_detach (dev=0xfffff80004706a00)
    at /usr/src/sys/kern/subr_bus.c:3049
#13 0xffffffff80bc13fd in devclass_driver_deleted
    (busclass=0xfffff80004352900, dc=0xfffff80004385a00,
    driver=0xffffffff823329e0) at /usr/src/sys/kern/subr_bus.c:1235
#14 0xffffffff80bc12ef in devclass_delete_driver
    (busclass=0xfffff80004352900, driver=0xffffffff823329e0)
    at /usr/src/sys/kern/subr_bus.c:1310
#15 0xffffffff80bc721c in driver_module_handler (mod=0xfffff80015cd8680,
    what=1, arg=0xffffffff823329b0) at /usr/src/sys/kern/subr_bus.c:5229
#16 0xffffffff80b67b82 in module_unload (mod=0xfffff80015cd8680)
    at /usr/src/sys/kern/kern_module.c:261
#17 0xffffffff80b5895b in linker_file_unload (file=0xfffff8016da69a00,
    flags=0) at /usr/src/sys/kern/kern_linker.c:700
#18 0xffffffff80b59dad in kern_kldunload (td=, fileid=5, flags=0)
    at /usr/src/sys/kern/kern_linker.c:1153
#19 0xffffffff8102aa40 in syscallenter (td=)
    at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:162
#20 amd64_syscall (td=0xfffffe00e839f100, traced=0)
    at /usr/src/sys/amd64/amd64/trap.c:1161
#21
#22 0x00000008002ddcba in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffffffe188

- Eric
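
For anyone following along, frames #4 to #7 above show the waiter polling inside the block callback rather than sleeping. Below is a rough userspace toy model of that shape. It is only a sketch and an assumption about the general pattern: it is not the code in subr_epoch.c or ck_epoch.c, and every name in it (toy_epoch, toy_reader, toy_wait, toy_run_reader, toy_stalled) is made up for illustration. The point it tries to show is that the wait can only finish once each blocking reader leaves its section, so a callback that never lets the reader make progress keeps the waiter polling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; NOT the kernel's or CK's data structures. */
struct toy_reader {
    bool     active;      /* currently inside a read-side section */
    unsigned epoch;       /* global epoch value observed on entry */
    unsigned steps_left;  /* simulated work left before the section exits */
};

struct toy_epoch {
    unsigned           global;    /* global epoch counter */
    struct toy_reader *readers;   /* one record per reader */
    size_t             nreaders;
};

/* Invoked once per poll for every reader that still blocks the waiter. */
typedef void toy_block_cb(struct toy_reader *r, void *arg);

/* Reader enters a section that needs `steps` units of work to finish. */
void toy_enter(struct toy_epoch *e, struct toy_reader *r, unsigned steps)
{
    r->epoch = e->global;
    r->steps_left = steps;
    r->active = true;
}

/* Callback that lets the blocking reader run one step (simulated). */
void toy_run_reader(struct toy_reader *r, void *arg)
{
    unsigned *calls = arg;

    (*calls)++;
    if (r->steps_left > 0 && --r->steps_left == 0)
        r->active = false;        /* reader leaves its section */
}

/* Callback that only counts: the reader never gets to run. */
void toy_stalled(struct toy_reader *r, void *arg)
{
    unsigned *calls = arg;

    (void)r;
    (*calls)++;
}

/*
 * Advance the global epoch, then poll until no reader is still active
 * in an older epoch. Gives up and returns false after max_polls polls
 * so that the model cannot hang the way a real waiter can.
 */
bool toy_wait(struct toy_epoch *e, toy_block_cb *cb, void *arg,
    unsigned max_polls)
{
    unsigned target = ++e->global;

    assert(cb != NULL);
    for (unsigned poll = 0; poll < max_polls; poll++) {
        bool blocked = false;

        for (size_t i = 0; i < e->nreaders; i++) {
            struct toy_reader *r = &e->readers[i];

            if (r->active && r->epoch != target) {
                blocked = true;
                cb(r, arg);       /* waiter reacts to the blocker */
            }
        }
        if (!blocked)
            return true;
    }
    return false;
}
```

With toy_run_reader as the callback the wait returns once the reader has used up its steps; with toy_stalled it never does, and only the max_polls cap in the model stops it.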