Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 22 May 2012 20:55:47 +0100
From:      Attilio Rao <attilio@freebsd.org>
To:        Pawel Jakub Dawidek <pjd@freebsd.org>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Kirk McKusick <mckusick@freebsd.org>
Subject:   Re: svn commit: r234482 - in head/sys: fs/msdosfs fs/nfsserver kern sys
Message-ID:  <CAJ-FndBFhNMboM-iOO=GN0OLXFGBTf4QU-enqcGoGobrBi-soA@mail.gmail.com>
In-Reply-To: <20120422170842.GA18403@garage.freebsd.pl>
References:  <201204200650.q3K6oiqO026673@svn.freebsd.org> <20120422170842.GA18403@garage.freebsd.pl>

next in thread | previous in thread | raw e-mail | index | archive | help
2012/4/22 Pawel Jakub Dawidek <pjd@freebsd.org>:
> On Fri, Apr 20, 2012 at 06:50:44AM +0000, Kirk McKusick wrote:
>> Author: mckusick
>> Date: Fri Apr 20 06:50:44 2012
>> New Revision: 234482
>> URL: http://svn.freebsd.org/changeset/base/234482
>>
>> Log:
>> =C2=A0 This change creates a new list of active vnodes associated with
>> =C2=A0 a mount point. Active vnodes are those with a non-zero use or hol=
d
>> =C2=A0 count, e.g., those vnodes that are not on the free list. Note tha=
t
>> =C2=A0 this list is in addition to the list of all the vnodes associated
>> =C2=A0 with a mount point.
>>
>> =C2=A0 To avoid adding another set of linkage pointers to the vnode
>> =C2=A0 structure, the active list uses the existing linkage pointers
>> =C2=A0 used by the free list (previously named v_freelist, now renamed
>> =C2=A0 v_actfreelist).
>>
>> =C2=A0 This update adds the MNT_VNODE_FOREACH_ACTIVE interface that loop=
s
>> =C2=A0 over just the active vnodes associated with a mount point (typica=
lly
>> =C2=A0 less than 1% of the vnodes associated with the mount point).
> [...]
>> @@ -1099,6 +1128,14 @@ insmntque1(struct vnode *vp, struct moun
>> =C2=A0 =C2=A0 =C2=A0 VNASSERT(mp->mnt_nvnodelistsize >=3D 0, vp,
>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ("neg mount point vnode=
 list size"));
>> =C2=A0 =C2=A0 =C2=A0 mp->mnt_nvnodelistsize++;
>> + =C2=A0 =C2=A0 KASSERT((vp->v_iflag & VI_ACTIVE) =3D=3D 0,
>> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 ("Activating already active vnode"));
>> + =C2=A0 =C2=A0 vp->v_iflag |=3D VI_ACTIVE;
>> + =C2=A0 =C2=A0 mtx_lock(&vnode_free_list_mtx);
>> + =C2=A0 =C2=A0 TAILQ_INSERT_HEAD(&mp->mnt_activevnodelist, vp, v_actfre=
elist);
>> + =C2=A0 =C2=A0 mp->mnt_activevnodelistsize++;
>> + =C2=A0 =C2=A0 mtx_unlock(&vnode_free_list_mtx);
>> + =C2=A0 =C2=A0 VI_UNLOCK(vp);
>> =C2=A0 =C2=A0 =C2=A0 MNT_IUNLOCK(mp);
>> =C2=A0 =C2=A0 =C2=A0 return (0);
>> =C2=A0}
>
>
> Now, for every vnode that is activated, it has to go through global
> mutex, which seems like scalability issue to me. With ZFS it is typical
> to have a lot of file systems and this global mutex was not needed
> before (well, it was needed, but only to get vnode from the free list).
>
> If we require vnode interlock to be held during v_actfreelist
> manipulation then why can't we use interlock+vnode_free_list_mtx when
> operating on the free list and interlock+per-mountpoint-lock when
> operating on mnt_activevnodelist?

I think this is the better idea for this case and it should really be fixed=
.
However, note that a per-mount lock here is far from being ideal as
you would contest a lot on the mountpoint if you have a lot of
activated vnodes.
Anyway the approach you propose doesn't introduce any scalability
difference than what we already had in place for pre-234482.

Also, it would be good to implement things like:
       mtx_assert(MNT_MTX(mp), MA_OWNED);

by providing appropriate wrappers as we do in vnode interface.

Thanks,
Attilio


--=20
Peace can only be achieved by understanding - A. Einstein



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAJ-FndBFhNMboM-iOO=GN0OLXFGBTf4QU-enqcGoGobrBi-soA>