From owner-svn-src-all@freebsd.org Fri Aug 25 21:29:50 2017 Return-Path: Delivered-To: svn-src-all@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 80A1ADE1CFB; Fri, 25 Aug 2017 21:29:50 +0000 (UTC) (envelope-from bdrewery@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2610:1c1:1:6074::16:84]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "freefall.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 3D673713C7; Fri, 25 Aug 2017 21:29:50 +0000 (UTC) (envelope-from bdrewery@FreeBSD.org) Received: from mail.xzibition.com (unknown [127.0.1.132]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by freefall.freebsd.org (Postfix) with ESMTPS id 59CC91700B; Fri, 25 Aug 2017 21:29:49 +0000 (UTC) (envelope-from bdrewery@FreeBSD.org) Received: from mail.xzibition.com (localhost [172.31.3.2]) by mail.xzibition.com (Postfix) with ESMTP id 41F738769; Fri, 25 Aug 2017 21:29:48 +0000 (UTC) X-Virus-Scanned: amavisd-new at mail.xzibition.com Received: from mail.xzibition.com ([172.31.3.2]) by mail.xzibition.com (mail.xzibition.com [172.31.3.2]) (amavisd-new, port 10026) with LMTP id 2kf16NeG8ST4; Fri, 25 Aug 2017 21:29:44 +0000 (UTC) Subject: Re: svn commit: r306512 - in head/sys: kern sys DKIM-Filter: OpenDKIM Filter v2.9.2 mail.xzibition.com 3F2B48763 To: Konstantin Belousov Cc: Mateusz Guzik , src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org References: <201609301727.u8UHRIgD051373@repo.freebsd.org> <6ea06065-ea5c-b83b-3854-224626be0d77@FreeBSD.org> <20170825103534.GK1700@kib.kiev.ua> From: Bryan Drewery Openpgp: id=F9173CB2C3AAEA7A5C8A1F0935D771BB6E4697CF; url=http://www.shatow.net/bryan/bryan2.asc Organization: FreeBSD Message-ID: Date: Fri, 25 Aug 2017 14:29:28 -0700 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 MIME-Version: 1.0 In-Reply-To: <20170825103534.GK1700@kib.kiev.ua> Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="SMKtgHVDJUU7B7Ljl2KSn0sxmOTpFiBl3" X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 25 Aug 2017 21:29:50 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --SMKtgHVDJUU7B7Ljl2KSn0sxmOTpFiBl3 Content-Type: multipart/mixed; boundary="E5JqXpauGdNtMJa8VsIOOAsUlndGXmoRe"; protected-headers="v1" From: Bryan Drewery To: Konstantin Belousov Cc: Mateusz Guzik , src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Message-ID: Subject: Re: svn commit: r306512 - in head/sys: kern sys References: <201609301727.u8UHRIgD051373@repo.freebsd.org> <6ea06065-ea5c-b83b-3854-224626be0d77@FreeBSD.org> <20170825103534.GK1700@kib.kiev.ua> In-Reply-To: <20170825103534.GK1700@kib.kiev.ua> --E5JqXpauGdNtMJa8VsIOOAsUlndGXmoRe Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: quoted-printable On 8/25/2017 3:35 AM, Konstantin Belousov wrote: > On Thu, Aug 24, 2017 at 08:18:03PM -0700, Bryan Drewery wrote: >> On 9/30/2016 10:27 AM, Mateusz Guzik wrote: >>> Author: mjg >>> Date: Fri Sep 30 17:27:17 2016 >>> New Revision: 306512 >>> URL: https://svnweb.freebsd.org/changeset/base/306512 >>> >>> Log: >>> vfs: batch free vnodes in per-mnt lists >>> =20 >>> Previously free vnodes would always by directly returned to the glo= bal >>> LRU list. With this change up to mnt_free_list_batch vnodes are col= lected >>> first. >>> =20 >>> syncer runs always return the batch regardless of its size. >>> =20 >>> While vnodes on per-mnt lists are not counted as free, they can be >>> returned in case of vnode shortage. >>> =20 >>> Reviewed by: kib >>> Tested by: pho >>> >>> Modified: >>> head/sys/kern/vfs_mount.c >>> head/sys/kern/vfs_subr.c >>> head/sys/sys/mount.h >>> head/sys/sys/vnode.h >>> >> ... >>> @@ -2753,17 +2824,25 @@ _vhold(struct vnode *vp, bool locked) >>> * Remove a vnode from the free list, mark it as in use, >>> * and put it on the active list. >>> */ >>> - mtx_lock(&vnode_free_list_mtx); >>> - TAILQ_REMOVE(&vnode_free_list, vp, v_actfreelist); >>> - freevnodes--; >>> - vp->v_iflag &=3D ~VI_FREE; >>> + mp =3D vp->v_mount; >>> + mtx_lock(&mp->mnt_listmtx); >> ^^ >> >>> + if ((vp->v_mflag & VMP_TMPMNTFREELIST) !=3D 0) { >>> + TAILQ_REMOVE(&mp->mnt_tmpfreevnodelist, vp, v_actfreelist); >>> + mp->mnt_tmpfreevnodelistsize--; >>> + vp->v_mflag &=3D ~VMP_TMPMNTFREELIST; >>> + } else { >>> + mtx_lock(&vnode_free_list_mtx); >>> + TAILQ_REMOVE(&vnode_free_list, vp, v_actfreelist); >>> + freevnodes--; >>> + mtx_unlock(&vnode_free_list_mtx); >>> + } >>> KASSERT((vp->v_iflag & VI_ACTIVE) =3D=3D 0, >>> ("Activating already active vnode")); >>> + vp->v_iflag &=3D ~VI_FREE; >>> vp->v_iflag |=3D VI_ACTIVE; >>> - mp =3D vp->v_mount; >>> TAILQ_INSERT_HEAD(&mp->mnt_activevnodelist, vp, v_actfreelist); >>> mp->mnt_activevnodelistsize++; >>> - mtx_unlock(&vnode_free_list_mtx); >>> + mtx_unlock(&mp->mnt_listmtx); >>> refcount_acquire(&vp->v_holdcnt); >>> if (!locked) >>> VI_UNLOCK(vp); >>> @@ -2819,21 +2898,25 @@ _vdrop(struct vnode *vp, bool locked) >>> if ((vp->v_iflag & VI_OWEINACT) =3D=3D 0) { >>> vp->v_iflag &=3D ~VI_ACTIVE; >>> mp =3D vp->v_mount; >>> - mtx_lock(&vnode_free_list_mtx); >>> + mtx_lock(&mp->mnt_listmtx); >> ^^ >> >> If code runs getnewvnode() and then immediately runs vhold() or vrele(= ), >> without first running insmntque(vp, mp), then vp->v_mount is NULL here= >> and the lock/freelist dereferencing just panic. > getnewvnode() returns vref-ed vnode, i.e. both hold and use counts are > set to 1. What is the use case there which requires vhold-ing vnode > without finishing its construction ? None that I can think of. For vhold I was just observing that a NULL dereference is possible. >=20 >> Locking the vnode and then using vgone() and vput() on it avoids the >> issue since it marks the vnode VI_DOOMED and instead frees it in >> _vdrop() rather than try to re-add it to its NULL per-mount free list.= > This is the common pattern where insmntque() fails. >=20 >> >> I'm not sure what the right thing here is. Is it a requirement to >> insmntque() a new vnode immediately, or to vgone() it before releasing= >> it if was just returned from getnewvnode()? > These are the only uses of a newly allocated vnode which make sense. > Do you need this for something that does not happen in the svn tree ? Outside of the tree in OneFS, we have some code that does getnewvnode(), then adds the unlocked vnode to a kqueue event list (it's actually very complicated compared to the basic description), and then does insmntque. The event list registration is failing in a stress run and it only does a vrele in this case. I need to change it to lock/vgone/vrele (or swap the event registration and insmntque calls of which I'm unsure on the impact). Previously the vrele was enough, and not vgone too, since the vnode was just added back to the global free list. But now the v_mount is required since the free list is per-mount. It is trivial for me to fix the panic I am running into, but it feels like there's a bug in _vdrop now due to this commit and I want to be sure I follow through with it being fixed properly. My confusion/question is probably about what the real distinction here is between using vgone before vrele to cause the vnode to be free'd and not using vgone and causing it to go back into the free list instead - which I cannot do now without swapping the calls. It seems to me that an assert would now make sense in the _vdrop !VI_DOOMED case since a v_mount is required to add back into the free list, and probably _vhold, such as: VNASSERT(mp !=3D NULL, vn, ("%s: not on a mount vnode list")); Or we just add it back to the global mount list in _vdrop as it was supported before with something like: https://people.freebsd.org/~bdrewery/patches/vdrop-global-list.diff >=20 >> >> Perhaps these cases should assert that v_mount is not NULL or handle t= he >> odd case and just free the vnode instead, or can it still just add to >> the global vnode_free_list if mp is NULL? >> >> The old code handled the case fine since the freelist was global and n= ot >> per-mount. 'mp' was only dereferenced below if the vnode had been >> active, which was not the case for my example. >=20 --=20 Regards, Bryan Drewery --E5JqXpauGdNtMJa8VsIOOAsUlndGXmoRe-- --SMKtgHVDJUU7B7Ljl2KSn0sxmOTpFiBl3 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAEBAgAGBQJZoJa5AAoJEDXXcbtuRpfPXXgIANkXmvByc7MGNIdvJFyLRxzT ZSiPuhoshBmaE5lk2ydctVikPPqp7ntFF3k7+8hwEGHw1+AySLIq+/aQSSeb3nbL zNO+QyNTlDeUeyFY4Ij08hwmEbSF+ltwihivoiNbeMDA5t3a8QaTlW8Rsl2J1tqf TGjEpiyuukLK6JZY/z2MVfui4viQtM5hfXYDGe6g54jgDSjCkZbI/96wp2AKvIxs Bi15Iyjqddmhze5mVbfmzLNBjwUaL6/PMntGNpwQ1J9OeillUPYzTb82QuJ43FjP g0ix4wY+1HzctlnUarTqnlU84cPrMKoi0doghrZwa8iutEsvxxmet/CYLeTIZXQ= =gpwn -----END PGP SIGNATURE----- --SMKtgHVDJUU7B7Ljl2KSn0sxmOTpFiBl3--