From owner-freebsd-current  Sun Sep  1  1:25:22 2002
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.FreeBSD.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 55DC837B400; Sun,  1 Sep 2002 01:25:19 -0700 (PDT)
Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 19D8643E42; Sun,  1 Sep 2002 01:25:18 -0700 (PDT)
	(envelope-from bde@zeta.org.au)
Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102])
	by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id IAA18602;
	Sun, 1 Sep 2002 08:25:02 GMT
Date: Sun, 1 Sep 2002 18:32:14 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@gamplex.bde.org
To: Don Lewis <dl-freebsd@catspoiler.org>
Cc: kris@obsecurity.org, <current@FreeBSD.ORG>, <phk@FreeBSD.ORG>
Subject: Re: Page faults from bento cluster (Re: Problems reading vmcores)
In-Reply-To: <200209010726.g817QEwr067908@gw.catspoiler.org>
Message-ID: <20020901181130.D8550-100000@gamplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-current.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-current>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-current>
X-Loop: FreeBSD.ORG

On Sun, 1 Sep 2002, Don Lewis wrote:

> This code in vflush() bothers me:
>
>         mtx_lock(&mntvnode_mtx);
> loop:
>         for (vp = TAILQ_FIRST(&mp->mnt_nvnodelist); vp; vp = nvp) {
>                 /*
>                  * Make sure this vnode wasn't reclaimed in getnewvnode().
>                  * Start over if it has (it won't be on the list anymore).
>                  */
>                 if (vp->v_mount != mp)
>                         goto loop;
>                 nvp = TAILQ_NEXT(vp, v_nmntvnodes);
>
>                 mtx_unlock(&mntvnode_mtx);
>                 vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, td);
>                 /*
>                  * Skip over a vnodes marked VV_SYSTEM.
>                  */
>                 if ((flags & SKIPSYSTEM) && (vp->v_vflag & VV_SYSTEM)) {
>                         VOP_UNLOCK(vp, 0, td);
>                         mtx_lock(&mntvnode_mtx);
>                         continue;
>                 }
>                 /*
>                  * If WRITECLOSE is set, flush out unlinked but still open
>                  * files (even if open only for reading) and regular file
>                  * vnodes open for writing.
>                  */
>                 error = VOP_GETATTR(vp, &vattr, td->td_ucred, td);
>                 VI_LOCK(vp);
>
> As near as I can tell the panic is happening in VOP_GETATTR().  It looks
> to me like it would be possible for the vnode to be recycled between the
> time when it passes the vp->v_mount test at the top of the loop and the
> time when vn_lock() succeeds.  Shouldn't we bump the vnode reference
> count by calling vref() at the top of the loop and add the appropriate
> calls to vrele()?

Rev.1.395 made some changes that I didn't like much here.  The
VOP_GETATTR() is now done unconditionally.  This pessimizes vflush()
and enlarges any race windows.  I think WRITECLOSE is only used for
mount -u from rw to ro, so the pessimization exercises code that was
rarely used before.

Rev.1.394 called VOP_GETATTR() with the interlock held.  This was wrong
but probably reduced race windows.  The window seems to have been
opened before rev.1.394 by releasing mntvnode_slock before aquiring
the interlock.  RELENG_4 doesn't release mntvnode_slock at that point
(it holds both locks across the VOP_GETATTR()).

Bruce


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message