From owner-freebsd-current@freebsd.org Mon Sep 16 06:33:01 2019
Date: Mon, 16 Sep 2019 09:32:52 +0300
From: Konstantin Belousov
To: Peter Jeremy
Cc: freebsd-current@FreeBSD.org, freebsd-arm@FreeBSD.org, rmacklem@freebsd.org
Subject: Re: "Sleeping with non-sleepable lock" in NFS on recent -current
Message-ID: <20190916063252.GS2559@kib.kiev.ua>
References: <20190916061205.GE97181@server.rulingia.com>
In-Reply-To: <20190916061205.GE97181@server.rulingia.com>
List-Id: Discussions about the use of FreeBSD-current

On Mon, Sep 16, 2019 at 04:12:05PM +1000, Peter Jeremy wrote:
> I'm consistently seeing panics in the NFS code on recent -current on aarm64.
> The panics are one of the following two:
> Sleeping on "vmopar" with the following non-sleepable locks held:
> exclusive sleep mutex NEWNFSnode lock (NEWNFSnode lock) r = 0 (0xfffffd0078b346f0) locked @ /usr/src/sys/fs/nfsclient/nfs_clport.c:432
>
> Sleeping thread (tid 100077, pid 35) owns a non-sleepable lock
>
> Both panics have nearly identical backtraces (see below).  I'm running
> diskless on a Rock64 with both filesystem and swap over NFS.
> The panics can be fairly reliably triggered by any of:
> * "make -j4 buildworld"
> * linking the kernel (as part of buildkernel)
> * "make installworld"
>
> Has anyone else seen this?
>
> The first panic (sleeping on vmopar) has a backtrace:
> sched_switch() at mi_switch+0x19c
>     pc = 0xffff0000002ab368  lr = 0xffff00000028a9f4
>     sp = 0xffff000061192660  fp = 0xffff000061192680
>
> mi_switch() at sleepq_switch+0x100
>     pc = 0xffff00000028a9f4  lr = 0xffff0000002d56dc
>     sp = 0xffff000061192690  fp = 0xffff0000611926d0
>
> sleepq_switch() at sleepq_wait+0x48
>     pc = 0xffff0000002d56dc  lr = 0xffff0000002d5594
>     sp = 0xffff0000611926e0  fp = 0xffff000061192700
>
> sleepq_wait() at _sleep+0x2c4 [***]
>     pc = 0xffff0000002d5594  lr = 0xffff000000289eec
>     sp = 0xffff000061192710  fp = 0xffff0000611927b0
>
> _sleep() at vm_object_page_remove+0x178 [***]
>     pc = 0xffff000000289eec  lr = 0xffff00000052211c
>     sp = 0xffff0000611927c0  fp = 0xffff000061192820
>
> vm_object_page_remove() at vnode_pager_setsize+0xc0
>     pc = 0xffff00000052211c  lr = 0xffff000000539a70
>     sp = 0xffff000061192830  fp = 0xffff000061192870
>
> vnode_pager_setsize() at nfscl_loadattrcache+0x2e8
>     pc = 0xffff000000539a70  lr = 0xffff0000001ed4b4
>     sp = 0xffff000061192880  fp = 0xffff0000611928e0
>
> nfscl_loadattrcache() at ncl_writerpc+0x104
>     pc = 0xffff0000001ed4b4  lr = 0xffff0000001e2158
>     sp = 0xffff0000611928f0  fp = 0xffff000061192a40
>
> ncl_writerpc() at ncl_doio+0x36c
>     pc = 0xffff0000001e2158  lr = 0xffff0000001f0370
>     sp = 0xffff000061192a50  fp = 0xffff000061192ae0
>
> ncl_doio() at nfssvc_iod+0x228
>     pc = 0xffff0000001f0370  lr = 0xffff0000001f1d88
>     sp = 0xffff000061192af0  fp = 0xffff000061192b50
>
> nfssvc_iod() at fork_exit+0x7c
>     pc = 0xffff0000001f1d88  lr = 0xffff00000023ff5c
>     sp = 0xffff000061192b60  fp = 0xffff000061192b90
>
> fork_exit() at fork_trampoline+0x10
>     pc = 0xffff00000023ff5c  lr = 0xffff000000562c34
>     sp = 0xffff000061192ba0  fp = 0x0000000000000000
>
>
> For the second panic, the [***] change
> to:
> sleepq_wait() at vm_page_sleep_if_busy+0x80
> vm_page_sleep_if_busy() at vm_object_page_remove+0xfc

Weird, since this should have been fixed a long time ago.  Anyway, please try
the following; it should fix the rest of the cases.

diff --git a/sys/fs/nfsclient/nfs_clport.c b/sys/fs/nfsclient/nfs_clport.c
index 471e029a8b5..098de1ced80 100644
--- a/sys/fs/nfsclient/nfs_clport.c
+++ b/sys/fs/nfsclient/nfs_clport.c
@@ -511,10 +511,10 @@ nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr *nap, void *nvaper,
 				 * zero np->n_attrstamp to indicate that
 				 * the attributes are stale.
 				 */
-				vap->va_size = np->n_size;
+				nsize = vap->va_size = np->n_size;
+				setnsize = 1;
 				np->n_attrstamp = 0;
 				KDTRACE_NFS_ATTRCACHE_FLUSH_DONE(vp);
-				vnode_pager_setsize(vp, np->n_size);
 			} else if (np->n_flag & NMODIFIED) {
 				/*
 				 * We've modified the file: Use the larger
@@ -526,7 +526,8 @@ nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr *nap, void *nvaper,
 					np->n_size = vap->va_size;
 					np->n_flag |= NSIZECHANGED;
 				}
-				vnode_pager_setsize(vp, np->n_size);
+				nsize = np->n_size;
+				setnsize = 1;
 			} else if (vap->va_size < np->n_size) {
 				/*
 				 * When shrinking the size, the call to
@@ -540,7 +541,7 @@ nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr *nap, void *nvaper,
 			} else {
 				np->n_size = vap->va_size;
 				np->n_flag |= NSIZECHANGED;
-				vnode_pager_setsize(vp, np->n_size);
+				setnsize = 1;
 			}
 		} else {
 			np->n_size = vap->va_size;
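
For readers following the thread, the pattern the patch applies can be sketched in userspace: while the node mutex is held, only record that a resize is needed (setnsize) and the size to use (nsize); the call that may sleep happens after the mutex is dropped. This is a rough illustration, not the kernel code. It assumes POSIX threads in place of the kernel mutex, and struct node, pager_setsize() and load_size() are hypothetical stand-ins for the NFS node, vnode_pager_setsize() and nfscl_loadattrcache().

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for an NFS node: fields protected by a mutex. */
struct node {
	pthread_mutex_t	lock;
	uint64_t	size;		/* cached file size */
	uint64_t	pager_size;	/* size last passed to the "pager" */
};

/*
 * Hypothetical stand-in for vnode_pager_setsize().  It takes the node
 * lock itself, so it must be called with np->lock NOT held; the
 * assertion plays the role of the kernel's lock-order check.
 */
static void
pager_setsize(struct node *np, uint64_t nsize)
{
	int error = pthread_mutex_trylock(&np->lock);

	assert(error == 0);	/* caller must have dropped np->lock */
	if (error == 0) {
		np->pager_size = nsize;
		pthread_mutex_unlock(&np->lock);
	}
}

/*
 * Update the cached size from a server-reported size.  Under the lock
 * we only note what has to be done; the pager call is deferred until
 * after the unlock.  Returns 1 if the pager was notified, else 0.
 */
static int
load_size(struct node *np, uint64_t server_size)
{
	uint64_t nsize = 0;
	int setnsize = 0;

	pthread_mutex_lock(&np->lock);
	if (server_size != np->size) {
		np->size = server_size;
		nsize = np->size;	/* remember the value to use later */
		setnsize = 1;		/* remember that a call is needed */
	}
	pthread_mutex_unlock(&np->lock);

	if (setnsize)
		pager_setsize(np, nsize);	/* safe: lock is released */
	return (setnsize);
}
```

Calling load_size() on a node whose cached size is 100 with a server size of 4096 updates both fields and returns 1; repeating the call with the same size returns 0 and never reaches pager_setsize().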