From owner-freebsd-current Sun Sep 1 1:25:22 2002 Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.FreeBSD.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 55DC837B400; Sun, 1 Sep 2002 01:25:19 -0700 (PDT) Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id 19D8643E42; Sun, 1 Sep 2002 01:25:18 -0700 (PDT) (envelope-from bde@zeta.org.au) Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102]) by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id IAA18602; Sun, 1 Sep 2002 08:25:02 GMT Date: Sun, 1 Sep 2002 18:32:14 +1000 (EST) From: Bruce Evans X-X-Sender: bde@gamplex.bde.org To: Don Lewis Cc: kris@obsecurity.org, , Subject: Re: Page faults from bento cluster (Re: Problems reading vmcores) In-Reply-To: <200209010726.g817QEwr067908@gw.catspoiler.org> Message-ID: <20020901181130.D8550-100000@gamplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-current@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Sun, 1 Sep 2002, Don Lewis wrote: > This code in vflush() bothers me: > > mtx_lock(&mntvnode_mtx); > loop: > for (vp = TAILQ_FIRST(&mp->mnt_nvnodelist); vp; vp = nvp) { > /* > * Make sure this vnode wasn't reclaimed in getnewvnode(). > * Start over if it has (it won't be on the list anymore). > */ > if (vp->v_mount != mp) > goto loop; > nvp = TAILQ_NEXT(vp, v_nmntvnodes); > > mtx_unlock(&mntvnode_mtx); > vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, td); > /* > * Skip over a vnodes marked VV_SYSTEM. > */ > if ((flags & SKIPSYSTEM) && (vp->v_vflag & VV_SYSTEM)) { > VOP_UNLOCK(vp, 0, td); > mtx_lock(&mntvnode_mtx); > continue; > } > /* > * If WRITECLOSE is set, flush out unlinked but still open > * files (even if open only for reading) and regular file > * vnodes open for writing. > */ > error = VOP_GETATTR(vp, &vattr, td->td_ucred, td); > VI_LOCK(vp); > > As near as I can tell the panic is happening in VOP_GETATTR(). It looks > to me like it would be possible for the vnode to be recycled between the > time when it passes the vp->v_mount test at the top of the loop and the > time when vn_lock() succeeds. Shouldn't we bump the vnode reference > count by calling vref() at the top of the loop and add the appropriate > calls to vrele()? Rev.1.395 made some changes that I didn't like much here. The VOP_GETATTR() is now done unconditionally. This pessimizes vflush() and enlarges any race windows. I think WRITECLOSE is only used for mount -u from rw to ro, so the pessimization exercises code that was rarely used before. Rev.1.394 called VOP_GETATTR() with the interlock held. This was wrong but probably reduced race windows. The window seems to have been opened before rev.1.394 by releasing mntvnode_slock before aquiring the interlock. RELENG_4 doesn't release mntvnode_slock at that point (it holds both locks across the VOP_GETATTR()). Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-current" in the body of the message