From owner-freebsd-hackers  Tue Oct 16 20:47:15 2001
Delivered-To: freebsd-hackers@freebsd.org
Received: from flood.ping.uio.no (flood.ping.uio.no [129.240.78.31])
	by hub.freebsd.org (Postfix) with ESMTP
	id 828FA37B40C; Tue, 16 Oct 2001 20:47:07 -0700 (PDT)
Received: by flood.ping.uio.no (Postfix, from userid 2602)
	id 8CB1914C2E; Wed, 17 Oct 2001 05:47:05 +0200 (CEST)
X-URL: http://www.ofug.org/~des/
X-Disclaimer: The views expressed in this message do not necessarily
  coincide with those of any organisation or company with
  which I am or have been affiliated.
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: John Baldwin <jhb@FreeBSD.ORG>, fs@FreeBSD.ORG
Subject: Re: Some questions regarding vfs / ffs
References: <XFMail.011016191649.jhb@FreeBSD.org>
	<xzpk7xvvwbw.fsf@flood.ping.uio.no>
	<200110170229.f9H2Tph84237@apollo.backplane.com>
From: Dag-Erling Smorgrav <des@ofug.org>
Date: 17 Oct 2001 05:47:05 +0200
In-Reply-To: <200110170229.f9H2Tph84237@apollo.backplane.com>
Message-ID: <xzpg08jvsg6.fsf@flood.ping.uio.no>
Lines: 63
User-Agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/20.7
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-hackers.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-hackers>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-hackers>
X-Loop: FreeBSD.ORG

[moving from -hackers to -fs where this *really* belongs]

Matthew Dillon <dillon@apollo.backplane.com> writes:
>     This would be an improvement, but not enough of an improvement.
>     We need to be able to discard most of the vnodes while simply
>     holding the mntvnode mutex, without even having to get the vnode
>     mutex or release the mntvnode mutex.  When we find a vnode that
>     'might' require some action, that's when we do the heavy-weight
>     mutex junk.  At least in the case where it's ok to occasionally
>     not get it right (which is most cases).

I think you're basically repeating with different words what I said in
the mail you replied to.  What I said can be summarized as "don't
acquire or release any locks until we know the vnode is dirty".

The loop checks four pieces of information to see if the vnode is
dirty:

 - vp->v_type: this can't change while the mntvnode lock is held,
   because it can only change if the vnode is reclaimed and reused,
   and you need the mntvnode lock to reclaim a vnode.

 - vp->v_data: I believe the same comment applies as above; vp->v_data
   is only manipulated in ffs_vget() and ufs_reclaim().  The first
   occurs before the vnode is inserted into the list, and the second
   occurs after it has been removed.

 - ip->i_flag (where ip == vp->v_data): may possibly require the vnode
   lock to be held.

 - vp->v_dirtyblkhd: may possibly require the vnode lock to be held.

Actually, I'm willing to take my chances with the latter two; unless
there's a possibility that the value read will be corrupted (rather
than just stale) in case of a race, it doesn't really matter.  There
are four possibilities:

 - vnode is dirty and gets synced: GOOD!

 - vnode is clean and doesn't get synced: GOOD!

 - vnode is dirty and doesn't get synced: means it was dirtied while
   it was being examined by ffs_sync(); the current code already
   allows that, and the only way to prevent it is to grab a
   filesystem-wide lock - I don't really see the point.

 - vnode is clean but gets synced anyway: means it was cleaned while
   it was being examined by ffs_sync(), which is already possible with
   the current code (one process calls sync() while another calls
   fsync()); see above.
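
To make the above concrete, here is a minimal userland sketch of the
"peek first, lock only if it looks dirty" loop.  It is a model under
stated assumptions, not the actual ffs_sync() code: the struct
layouts, the flag values, the counter standing in for the vnode lock,
and the names looks_dirty(), scan_mount(), v_ndirtyblks and v_locks
are all invented for illustration.  The caller is assumed to hold the
mntvnode lock for the whole scan.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
enum vtype { VNON, VREG, VDIR };

/* Model flag values; only the names mirror the inode flags. */
#define IN_ACCESS    0x01
#define IN_CHANGE    0x02
#define IN_UPDATE    0x04
#define IN_MODIFIED  0x08
#define MODEL_DIRTY  (IN_ACCESS | IN_CHANGE | IN_UPDATE | IN_MODIFIED)

struct inode {
	int		 i_flag;
};

struct vnode {
	enum vtype	 v_type;
	struct inode	*v_data;	/* ip == vp->v_data */
	int		 v_ndirtyblks;	/* models a non-empty v_dirtyblkhd */
	int		 v_locks;	/* counts (modelled) vnode lock acquisitions */
	struct vnode	*v_next;	/* models the mount's vnode list */
};

/*
 * Unlocked peek.  The reads of i_flag and v_ndirtyblks may be stale,
 * which is tolerated: a wrong answer only leads to one of the two
 * harmless outcomes listed above.
 */
static int
looks_dirty(const struct vnode *vp)
{
	const struct inode *ip = vp->v_data;

	if (vp->v_type == VNON || ip == NULL)
		return (0);
	return ((ip->i_flag & MODEL_DIRTY) != 0 || vp->v_ndirtyblks > 0);
}

/*
 * Walk the list and take the per-vnode lock only for candidates.
 * Returns the number of vnodes synced.
 */
static int
scan_mount(struct vnode *head)
{
	struct vnode *vp;
	int synced = 0;

	for (vp = head; vp != NULL; vp = vp->v_next) {
		if (!looks_dirty(vp))
			continue;		/* no lock traffic at all */
		assert(vp->v_data != NULL);
		vp->v_locks++;			/* stands in for locking the vnode */
		vp->v_data->i_flag &= ~MODEL_DIRTY;	/* stands in for the fsync */
		vp->v_ndirtyblks = 0;
		synced++;
	}
	return (synced);
}
```

With a list of one clean vnode, two dirty ones and one VNON vnode,
scan_mount() touches the lock counter of the two dirty vnodes only;
the other two are skipped without any locking.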

Somebody on IRC suggested changing ffs_sync() to traverse the synclist
instead of the mountpoint's vnode list, and just comparing v_mount to
mp and ignoring vnodes that aren't "ours".  It would work, but gives
me goosebumps for some reason.
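
For reference, the IRC suggestion amounts to something like the
following.  This is only a rough userland model: the single linked
list standing in for the synclist, the v_syncnext and v_synced
fields, and the function name sync_mount_via_synclist() are
assumptions made for the sketch, not the kernel's actual data
structures.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; not the real kernel definitions. */
struct mount {
	const char	*mnt_name;
};

struct vnode {
	struct mount	*v_mount;	/* filesystem this vnode belongs to */
	int		 v_synced;	/* set when the model "syncs" the vnode */
	struct vnode	*v_syncnext;	/* hypothetical synclist linkage */
};

/*
 * Walk a list of vnodes from all mounts and sync only those that
 * belong to mp.  Returns the number of vnodes synced.
 */
static int
sync_mount_via_synclist(struct vnode *synclist, const struct mount *mp)
{
	struct vnode *vp;
	int synced = 0;

	assert(mp != NULL);
	for (vp = synclist; vp != NULL; vp = vp->v_syncnext) {
		if (vp->v_mount != mp)
			continue;	/* not "ours", ignore */
		vp->v_synced = 1;	/* stands in for the actual sync */
		synced++;
	}
	return (synced);
}
```

The appeal is that the list only holds vnodes with pending work, so
the walk is short; the cost is that it visits other filesystems'
vnodes as well.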

All of this only solves part of the problem, though - the ffs_sync()
part - there's still something screwy with sched_sync(), but I'll need
to acquire more profiling data to figure out just *what*.

DES
-- 
Dag-Erling Smorgrav - des@ofug.org
