Date: Tue, 17 Sep 2019 11:06:58 +0300
From: Konstantin Belousov
To: Masachika ISHIZUKA
Cc: freebsd-current@freebsd.org, Rick Macklem, peterj@freebsd.org
Subject: Re: panic: sleeping thread on r352386
Message-ID: <20190917080658.GW2559@kib.kiev.ua>
In-Reply-To: <20190917.144251.30337396601444833.ish@amail.plala.or.jp>

On Tue, Sep 17, 2019 at 02:42:51PM +0900, Masachika ISHIZUKA wrote:
> >> This panic happens on 1300047 (both r352239 and r352386) with Core
> >> i5-7500 as follows. This panic does not happen on r351728 (1300044).
> >> (The following lines were typed by hand so they might have some
> >> mistyped letters.)
> >>
> >> ==
> >> Sleeping thread (tid 100177, pid 1814) owns a non-sleepable lock
> >> KDB: stack backtrace of thread 100177:
> >
> >
> > https://svnweb.freebsd.org/base?view=revision&revision=352393
>
> Thank you for the reply.
>
> I updated to r352431 and this does not panic. Thank you very much.
> But 'make buildworld' fails with a segmentation fault like below.
> (buildworld is running over the NFS file system.)
>
> --- modules-all ---
> --- ath_hal_ar5211.ko.debug ---
> objcopy --only-keep-debug ath_hal_ar5211.ko.full ath_hal_ar5211.ko.debug
> Segmentation fault (core dumped)
> *** [ath_hal_ar5211.ko.debug] Error code 139
> make[4]: stopped in /usr/altlocal/freebsd-current/src/sys/modules/ath_hal_ar5211
> 1 error
>
> The position of the segmentation fault is different each time.
> Below is the output of another 'make buildworld'.
>
> --- kernel.full ---
> Segmentation fault (core dumped)
> *** [kernel.full] Error code 139
> make[2]: stopped in /usr/altlocal/freebsd-current/obj/usr/altlocal/freebsd-current/src/amd64.amd64/sys/GENERIC
>
> /var/log/messages shows the following.
>
> Sep 17 11:22:56 okra kernel: Failed to fully fault in a core file segment at VA
> 0x800a00000 with size 0x163000 to be written at offset 0x84a000 for process nm
> Sep 17 11:22:56 okra kernel: pid 53593 (nm), jid 0, uid 16220: exited on signal
> 11 (core dumped)
> Sep 17 11:22:57 okra kernel: Failed to fully fault in a core file segment at VA
> 0x800a00000 with size 0x163000 to be written at offset 0x88b000 for process objcopy
> Sep 17 11:22:57 okra kernel: pid 53603 (objcopy), jid 0, uid 16220: exited on signal 11 (core dumped)
>
> Retry 'make buildworld'
>
> Sep 17 12:24:05 okra kernel: Failed to fully fault in a core file segment at VA
> 0x8002f6000 with size 0x93000 to be written at offset 0x239000 for process nm
> Sep 17 12:24:05 okra kernel: pid 96873 (nm), jid 0, uid 16220: exited on signal
> 11 (core dumped)
> Sep 17 12:24:05 okra kernel: Failed to fully fault in a core file segment at VA
> 0x80035f000 with size 0x93000 to be written at offset 0x281000 for process objcopy
> Sep 17 12:24:06 okra kernel: pid 96889 (objcopy), jid 0, uid 16220: exited on signal 11 (core dumped)
>
> Retry 'make buildworld'
>
> Sep 17 14:01:39 okra kernel: Failed to fully fault in a core file segment at VA
> 0x8048da000 with size 0x112000 to be written at offset 0x1a33000 for process ld.lld
> Sep 17 14:01:51 okra kernel: Failed to fully fault in a core file segment at VA
> 0x8117cc000 with size 0x1e7000 to be written at offset 0xe925000 for process ld.lld
> Sep 17 14:01:53 okra kernel: pid 50292 (ld.lld), jid 0, uid 16220: exited on signal 11 (core dumped)
>
> I can 'make buildworld' successfully on r351728 (1300044).

Try the following change, which is more precise about when
vnode_pager_setsize() is called, so that unnecessary calls are avoided.
Fixing the real cause requires much more extensive changes.

diff --git a/sys/fs/nfsclient/nfs_clport.c b/sys/fs/nfsclient/nfs_clport.c
index 63ea4736707..16dc7745c77 100644
--- a/sys/fs/nfsclient/nfs_clport.c
+++ b/sys/fs/nfsclient/nfs_clport.c
@@ -414,12 +414,11 @@ nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr *nap, void *nvaper,
 	struct nfsnode *np;
 	struct nfsmount *nmp;
 	struct timespec mtime_save;
-	u_quad_t nsize;
-	int setnsize, error, force_fid_err;
+	u_quad_t nsize, osize;
+	int error, force_fid_err;
+	bool setnsize;
 
 	error = 0;
-	setnsize = 0;
-	nsize = 0;
 
 	/*
 	 * If v_type == VNON it is a new node, so fill in the v_type,
@@ -439,6 +438,7 @@ nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr *nap, void *nvaper,
 	nmp = VFSTONFS(vp->v_mount);
 	vap = &np->n_vattr.na_vattr;
 	mtime_save = vap->va_mtime;
+	osize = vap->va_size;
 	if (writeattr) {
 		np->n_vattr.na_filerev = nap->na_filerev;
 		np->n_vattr.na_size = nap->na_size;
@@ -511,8 +511,7 @@ nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr *nap, void *nvaper,
 				 * zero np->n_attrstamp to indicate that
 				 * the attributes are stale.
 				 */
-				nsize = vap->va_size = np->n_size;
-				setnsize = 1;
+				vap->va_size = np->n_size;
 				np->n_attrstamp = 0;
 				KDTRACE_NFS_ATTRCACHE_FLUSH_DONE(vp);
 			} else if (np->n_flag & NMODIFIED) {
@@ -526,22 +525,9 @@ nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr *nap, void *nvaper,
 					np->n_size = vap->va_size;
 					np->n_flag |= NSIZECHANGED;
 				}
-				nsize = np->n_size;
-				setnsize = 1;
-			} else if (vap->va_size < np->n_size) {
-				/*
-				 * When shrinking the size, the call to
-				 * vnode_pager_setsize() cannot be done
-				 * with the mutex held, so delay it until
-				 * after the mtx_unlock call.
-				 */
-				nsize = np->n_size = vap->va_size;
-				np->n_flag |= NSIZECHANGED;
-				setnsize = 1;
 			} else {
-				nsize = np->n_size = vap->va_size;
+				np->n_size = vap->va_size;
 				np->n_flag |= NSIZECHANGED;
-				setnsize = 1;
 			}
 		} else {
 			np->n_size = vap->va_size;
@@ -579,6 +565,21 @@ nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr *nap, void *nvaper,
 	if (np->n_attrstamp != 0)
 		KDTRACE_NFS_ATTRCACHE_LOAD_DONE(vp, vap, error);
 #endif
+	nsize = vap->va_size;
+	if (nsize == osize) {
+		setnsize = false;
+	} else if (nsize > osize) {
+		vnode_pager_setsize(vp, nsize);
+		setnsize = false;
+	} else {
+		/*
+		 * When shrinking the size, the call to
+		 * vnode_pager_setsize() cannot be done with the mutex
+		 * held, because we might need to wait for a busy
+		 * page. Delay it until after the node is unlocked.
+		 */
+		setnsize = true;
+	}
 	NFSUNLOCKNODE(np);
 	if (setnsize)
 		vnode_pager_setsize(vp, nsize);
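
For readers who want to see the decision made in the last hunk in isolation, here is a rough userland sketch of it. It is not kernel code. struct node, node_lock(), node_unlock(), pager_setsize() and load_size() are hypothetical stand-ins invented for illustration (for the nfsnode, NFSLOCKNODE/NFSUNLOCKNODE, vnode_pager_setsize() and the tail of nfscl_loadattrcache() respectively), and the "lock" is only a flag, so it shows the ordering and nothing about real concurrency.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the NFS node; not the kernel's struct nfsnode. */
struct node {
	bool		locked;		/* models the node mutex being held */
	uint64_t	cached_size;	/* models vap->va_size */
	uint64_t	pager_size;	/* size last reported to the pager */
	int		pager_calls;	/* total pager_setsize() calls */
	int		calls_under_lock; /* calls made while "locked" */
};

/* Flag-only lock model; a real mutex is assumed in the kernel. */
void
node_lock(struct node *n)
{
	n->locked = true;
}

void
node_unlock(struct node *n)
{
	n->locked = false;
}

/* Hypothetical stand-in for vnode_pager_setsize(); it only records calls. */
void
pager_setsize(struct node *n, uint64_t nsize)
{
	n->pager_size = nsize;
	n->pager_calls++;
	if (n->locked)
		n->calls_under_lock++;
}

/*
 * Compare the size before and after the attribute update:
 *   unchanged -> do not call the pager at all
 *   grown     -> call the pager while still locked
 *   shrunk    -> remember it and call the pager after unlocking
 */
void
load_size(struct node *n, uint64_t server_size)
{
	uint64_t nsize, osize;
	bool setnsize;

	node_lock(n);
	osize = n->cached_size;
	n->cached_size = server_size;	/* the attribute update itself */
	nsize = n->cached_size;
	if (nsize == osize) {
		setnsize = false;
	} else if (nsize > osize) {
		pager_setsize(n, nsize);
		setnsize = false;
	} else {
		/* Shrinking may have to wait, so defer past the unlock. */
		setnsize = true;
	}
	node_unlock(n);
	if (setnsize)
		pager_setsize(n, nsize);
}
```

In this model, calling load_size() with an unchanged size makes no pager call, a larger size makes one call with the lock flag set, and a smaller size makes one call after the flag is cleared.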