Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 25 Aug 2017 14:29:28 -0700
From:      Bryan Drewery <bdrewery@FreeBSD.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        Mateusz Guzik <mjg@FreeBSD.org>, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r306512 - in head/sys: kern sys
Message-ID:  <c9233211-318b-802b-27e6-862808f9ea77@FreeBSD.org>
In-Reply-To: <20170825103534.GK1700@kib.kiev.ua>
References:  <201609301727.u8UHRIgD051373@repo.freebsd.org> <6ea06065-ea5c-b83b-3854-224626be0d77@FreeBSD.org> <20170825103534.GK1700@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--SMKtgHVDJUU7B7Ljl2KSn0sxmOTpFiBl3
Content-Type: multipart/mixed; boundary="E5JqXpauGdNtMJa8VsIOOAsUlndGXmoRe";
 protected-headers="v1"
From: Bryan Drewery <bdrewery@FreeBSD.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: Mateusz Guzik <mjg@FreeBSD.org>, src-committers@freebsd.org,
 svn-src-all@freebsd.org, svn-src-head@freebsd.org
Message-ID: <c9233211-318b-802b-27e6-862808f9ea77@FreeBSD.org>
Subject: Re: svn commit: r306512 - in head/sys: kern sys
References: <201609301727.u8UHRIgD051373@repo.freebsd.org>
 <6ea06065-ea5c-b83b-3854-224626be0d77@FreeBSD.org>
 <20170825103534.GK1700@kib.kiev.ua>
In-Reply-To: <20170825103534.GK1700@kib.kiev.ua>

--E5JqXpauGdNtMJa8VsIOOAsUlndGXmoRe
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: quoted-printable

On 8/25/2017 3:35 AM, Konstantin Belousov wrote:
> On Thu, Aug 24, 2017 at 08:18:03PM -0700, Bryan Drewery wrote:
>> On 9/30/2016 10:27 AM, Mateusz Guzik wrote:
>>> Author: mjg
>>> Date: Fri Sep 30 17:27:17 2016
>>> New Revision: 306512
>>> URL: https://svnweb.freebsd.org/changeset/base/306512
>>>
>>> Log:
>>>   vfs: batch free vnodes in per-mnt lists
>>>  =20
>>>   Previously free vnodes would always by directly returned to the glo=
bal
>>>   LRU list. With this change up to mnt_free_list_batch vnodes are col=
lected
>>>   first.
>>>  =20
>>>   syncer runs always return the batch regardless of its size.
>>>  =20
>>>   While vnodes on per-mnt lists are not counted as free, they can be
>>>   returned in case of vnode shortage.
>>>  =20
>>>   Reviewed by:	kib
>>>   Tested by:	pho
>>>
>>> Modified:
>>>   head/sys/kern/vfs_mount.c
>>>   head/sys/kern/vfs_subr.c
>>>   head/sys/sys/mount.h
>>>   head/sys/sys/vnode.h
>>>
>> ...
>>> @@ -2753,17 +2824,25 @@ _vhold(struct vnode *vp, bool locked)
>>>  	 * Remove a vnode from the free list, mark it as in use,
>>>  	 * and put it on the active list.
>>>  	 */
>>> -	mtx_lock(&vnode_free_list_mtx);
>>> -	TAILQ_REMOVE(&vnode_free_list, vp, v_actfreelist);
>>> -	freevnodes--;
>>> -	vp->v_iflag &=3D ~VI_FREE;
>>> +	mp =3D vp->v_mount;
>>> +	mtx_lock(&mp->mnt_listmtx);
>>                   ^^
>>
>>> +	if ((vp->v_mflag & VMP_TMPMNTFREELIST) !=3D 0) {
>>> +		TAILQ_REMOVE(&mp->mnt_tmpfreevnodelist, vp, v_actfreelist);
>>> +		mp->mnt_tmpfreevnodelistsize--;
>>> +		vp->v_mflag &=3D ~VMP_TMPMNTFREELIST;
>>> +	} else {
>>> +		mtx_lock(&vnode_free_list_mtx);
>>> +		TAILQ_REMOVE(&vnode_free_list, vp, v_actfreelist);
>>> +		freevnodes--;
>>> +		mtx_unlock(&vnode_free_list_mtx);
>>> +	}
>>>  	KASSERT((vp->v_iflag & VI_ACTIVE) =3D=3D 0,
>>>  	    ("Activating already active vnode"));
>>> +	vp->v_iflag &=3D ~VI_FREE;
>>>  	vp->v_iflag |=3D VI_ACTIVE;
>>> -	mp =3D vp->v_mount;
>>>  	TAILQ_INSERT_HEAD(&mp->mnt_activevnodelist, vp, v_actfreelist);
>>>  	mp->mnt_activevnodelistsize++;
>>> -	mtx_unlock(&vnode_free_list_mtx);
>>> +	mtx_unlock(&mp->mnt_listmtx);
>>>  	refcount_acquire(&vp->v_holdcnt);
>>>  	if (!locked)
>>>  		VI_UNLOCK(vp);
>>> @@ -2819,21 +2898,25 @@ _vdrop(struct vnode *vp, bool locked)
>>>  		if ((vp->v_iflag & VI_OWEINACT) =3D=3D 0) {
>>>  			vp->v_iflag &=3D ~VI_ACTIVE;
>>>  			mp =3D vp->v_mount;
>>> -			mtx_lock(&vnode_free_list_mtx);
>>> +			mtx_lock(&mp->mnt_listmtx);
>>                                   ^^
>>
>> If code runs getnewvnode() and then immediately runs vhold() or vrele(=
),
>> without first running insmntque(vp, mp), then vp->v_mount is NULL here=

>> and the lock/freelist dereferencing just panic.
> getnewvnode() returns vref-ed vnode, i.e. both hold and use counts are
> set to 1.  What is the use case there which requires vhold-ing vnode
> without finishing its construction ?

None that I can think of.  For vhold I was just observing that a NULL
dereference is possible.

>=20
>> Locking the vnode and then using vgone() and vput() on it avoids the
>> issue since it marks the vnode VI_DOOMED and instead frees it in
>> _vdrop() rather than try to re-add it to its NULL per-mount free list.=

> This is the common pattern where insmntque() fails.
>=20
>>
>> I'm not sure what the right thing here is.  Is it a requirement to
>> insmntque() a new vnode immediately, or to vgone() it before releasing=

>> it if was just returned from getnewvnode()?
> These are the only uses of a newly allocated vnode which make sense.
> Do you need this for something that does not happen in the svn tree ?

Outside of the tree in OneFS, we have some code that does getnewvnode(),
then adds the unlocked vnode to a kqueue event list (it's actually very
complicated compared to the basic description), and then does insmntque.
 The event list registration is failing in a stress run and it only does
a vrele in this case.  I need to change it to lock/vgone/vrele (or swap
the event registration and insmntque calls of which I'm unsure on the
impact).  Previously the vrele was enough, and not vgone too, since the
vnode was just added back to the global free list.  But now the v_mount
is required since the free list is per-mount.

It is trivial for me to fix the panic I am running into, but it feels
like there's a bug in _vdrop now due to this commit and I want to be
sure I follow through with it being fixed properly.  My
confusion/question is probably about what the real distinction here is
between using vgone before vrele to cause the vnode to be free'd and not
using vgone and causing it to go back into the free list instead - which
I cannot do now without swapping the calls.

It seems to me that an assert would now make sense in the _vdrop
!VI_DOOMED case since a v_mount is required to add back into the free
list, and probably _vhold, such as:
VNASSERT(mp !=3D NULL, vn, ("%s: not on a mount vnode list"));

Or we just add it back to the global mount list in _vdrop as it was
supported before with something like:

https://people.freebsd.org/~bdrewery/patches/vdrop-global-list.diff

>=20
>>
>> Perhaps these cases should assert that v_mount is not NULL or handle t=
he
>> odd case and just free the vnode instead, or can it still just add to
>> the global vnode_free_list if mp is NULL?
>>
>> The old code handled the case fine since the freelist was global and n=
ot
>> per-mount.  'mp' was only dereferenced below if the vnode had been
>> active, which was not the case for my example.
>=20


--=20
Regards,
Bryan Drewery


--E5JqXpauGdNtMJa8VsIOOAsUlndGXmoRe--

--SMKtgHVDJUU7B7Ljl2KSn0sxmOTpFiBl3
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBAgAGBQJZoJa5AAoJEDXXcbtuRpfPXXgIANkXmvByc7MGNIdvJFyLRxzT
ZSiPuhoshBmaE5lk2ydctVikPPqp7ntFF3k7+8hwEGHw1+AySLIq+/aQSSeb3nbL
zNO+QyNTlDeUeyFY4Ij08hwmEbSF+ltwihivoiNbeMDA5t3a8QaTlW8Rsl2J1tqf
TGjEpiyuukLK6JZY/z2MVfui4viQtM5hfXYDGe6g54jgDSjCkZbI/96wp2AKvIxs
Bi15Iyjqddmhze5mVbfmzLNBjwUaL6/PMntGNpwQ1J9OeillUPYzTb82QuJ43FjP
g0ix4wY+1HzctlnUarTqnlU84cPrMKoi0doghrZwa8iutEsvxxmet/CYLeTIZXQ=
=gpwn
-----END PGP SIGNATURE-----

--SMKtgHVDJUU7B7Ljl2KSn0sxmOTpFiBl3--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?c9233211-318b-802b-27e6-862808f9ea77>