From owner-freebsd-fs@FreeBSD.ORG Wed Nov 16 10:25:31 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D3BE4106566C for ; Wed, 16 Nov 2011 10:25:31 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id DE8B18FC0A for ; Wed, 16 Nov 2011 10:25:30 +0000 (UTC) Received: from alf.home (alf.kiev.zoral.com.ua [10.1.1.177]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id pAG9rmfc056436 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Wed, 16 Nov 2011 11:53:48 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: from alf.home (kostik@localhost [127.0.0.1]) by alf.home (8.14.5/8.14.5) with ESMTP id pAG9rmdb007215 for ; Wed, 16 Nov 2011 11:53:48 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by alf.home (8.14.5/8.14.5/Submit) id pAG9rmjK007214 for fs@freebsd.org; Wed, 16 Nov 2011 11:53:48 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: alf.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 16 Nov 2011 11:53:48 +0200 From: Kostik Belousov To: fs@freebsd.org Message-ID: <20111116095348.GD50300@deviant.kiev.zoral.com.ua> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="YcxAF3djuctrf0Jt" Content-Disposition: inline User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.9 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: Subject: VOP_VPTOCNP() interface change X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Nov 2011 10:25:31 -0000 --YcxAF3djuctrf0Jt Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Apparently, the VOP_VPTOCNP() interface has a fatal flaw that only matter for nullfs vnodes. The problem is that resulting vnode is only required to be held on return from the successfull call to vop, instead of being referenced. I designed the interface this way because dropping the reference might need to take the vnode lock, which is highly inconvenient in the main loop of vn_fullpath(), which holds the namecache rwlock for read. Nullfs VOP_INACTIVE() method reclaims the vnode, which in combination with the VOP_VPTOCNP() interface means that the directory vnode returned from VOP_VPTOCNP() is reclaimed in advance, causing vn_fullpath() to error with EBADF or like. It appeared that my worries about dropping the cache lock were unfounded. Below is the patch that changes the requirements for VOP_VPTOCNP(), now the dvp must be referenced. I converted all in-tree implementations of VOP_VPTOCNP(), it appeared to be trivial, which is expected, because vhold(9) and vref(9) are similar in the locking. So I do not expect much troubles for out-of-tree fs implementation of VOP_VPTOCNP, if any. Below is the patch, it was tested by Peter Holm. Among other issues, I believe that it should fix some JVM errors when JVM is run from the nullfs-mounted fs. Please comment, I will commit the patch in a week. diff --git a/share/man/man9/VOP_VPTOCNP.9 b/share/man/man9/VOP_VPTOCNP.9 index 6bcbd25..671c8df 100644 --- a/share/man/man9/VOP_VPTOCNP.9 +++ b/share/man/man9/VOP_VPTOCNP.9 @@ -65,9 +65,9 @@ is not a directory, then .Nm returns ENOENT. .Sh LOCKS -The vnode should be locked on entry and will still be locked on exit. The -parent directory vnode will be unlocked on a successful exit. However, it -will have its hold count incremented. +The vnode should be locked on entry and will still be locked on exit. +The parent directory vnode will be unlocked on a successful exit. +However, it will have its use count incremented. .Sh RETURN VALUES Zero is returned on success, otherwise an error code is returned. .Sh ERRORS diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c b/= sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c index a880961..65fc902 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c @@ -1594,7 +1594,7 @@ zfsctl_snapshot_vptocnp(struct vop_vptocnp_args *ap) *ap->a_buflen -=3D len; bcopy(sep->se_name, ap->a_buf + *ap->a_buflen, len); mutex_exit(&sdp->sd_lock); - vhold(dvp); + vref(dvp); *ap->a_vpp =3D dvp; } VN_RELE(dvp); diff --git a/sys/fs/devfs/devfs_vnops.c b/sys/fs/devfs/devfs_vnops.c index eb154b1..22908b9 100644 --- a/sys/fs/devfs/devfs_vnops.c +++ b/sys/fs/devfs/devfs_vnops.c @@ -261,7 +261,7 @@ devfs_vptocnp(struct vop_vptocnp_args *ap) } else if (vp->v_type =3D=3D VDIR) { if (dd =3D=3D dmp->dm_rootdir) { *dvp =3D vp; - vhold(*dvp); + vref(*dvp); goto finished; } i -=3D dd->de_dirent->d_namlen; @@ -289,6 +289,8 @@ devfs_vptocnp(struct vop_vptocnp_args *ap) mtx_unlock(&devfs_de_interlock); vholdl(*dvp); VI_UNLOCK(*dvp); + vref(*dvp); + vdrop(*dvp); } else { mtx_unlock(&devfs_de_interlock); error =3D ENOENT; diff --git a/sys/fs/nullfs/null_subr.c b/sys/fs/nullfs/null_subr.c index 5717a01..a45ccf7 100644 --- a/sys/fs/nullfs/null_subr.c +++ b/sys/fs/nullfs/null_subr.c @@ -289,22 +289,12 @@ null_checkvp(vp, fil, lno) #endif if (a->null_lowervp =3D=3D NULLVP) { /* Should never happen */ - int i; u_long *p; - printf("vp =3D %p, ZERO ptr\n", (void *)vp); - for (p =3D (u_long *) a, i =3D 0; i < 8; i++) - printf(" %lx", p[i]); - printf("\n"); - panic("null_checkvp"); + panic("null_checkvp %p", vp); } VI_LOCK_FLAGS(a->null_lowervp, MTX_DUPOK); - if (a->null_lowervp->v_usecount < 1) { - int i; u_long *p; - printf("vp =3D %p, unref'ed lowervp\n", (void *)vp); - for (p =3D (u_long *) a, i =3D 0; i < 8; i++) - printf(" %lx", p[i]); - printf("\n"); - panic ("null with unref'ed lowervp"); - } + if (a->null_lowervp->v_usecount < 1) + panic ("null with unref'ed lowervp, vp %p lvp %p", + vp, a->null_lowervp); VI_UNLOCK(a->null_lowervp); #ifdef notyet printf("null %x/%d -> %x/%d [%s, %d]\n", diff --git a/sys/fs/nullfs/null_vnops.c b/sys/fs/nullfs/null_vnops.c index 30a38da..bcf8750 100644 --- a/sys/fs/nullfs/null_vnops.c +++ b/sys/fs/nullfs/null_vnops.c @@ -729,7 +729,7 @@ null_print(struct vop_print_args *ap) { struct vnode *vp =3D ap->a_vp; =20 - printf("\tvp=3D%p, lowervp=3D%p\n", vp, NULLVPTOLOWERVP(vp)); + printf("\tvp=3D%p, lowervp=3D%p\n", vp, VTONULL(vp)->null_lowervp); return (0); } =20 @@ -784,6 +784,7 @@ null_vptocnp(struct vop_vptocnp_args *ap) vhold(lvp); VOP_UNLOCK(vp, 0); /* vp is held by vn_vptocnp_locked that called us */ ldvp =3D lvp; + vref(lvp); error =3D vn_vptocnp(&ldvp, cred, ap->a_buf, ap->a_buflen); vdrop(lvp); if (error !=3D 0) { @@ -797,19 +798,17 @@ null_vptocnp(struct vop_vptocnp_args *ap) */ error =3D vn_lock(ldvp, LK_EXCLUSIVE); if (error !=3D 0) { + vrele(ldvp); vn_lock(vp, locked | LK_RETRY); - vdrop(ldvp); return (ENOENT); } vref(ldvp); - vdrop(ldvp); error =3D null_nodeget(vp->v_mount, ldvp, dvp); if (error =3D=3D 0) { #ifdef DIAGNOSTIC NULLVPTOLOWERVP(*dvp); #endif - vhold(*dvp); - vput(*dvp); + VOP_UNLOCK(*dvp, 0); /* keep reference on *dvp */ } else vput(ldvp); =20 diff --git a/sys/fs/pseudofs/pseudofs_vnops.c b/sys/fs/pseudofs/pseudofs_vn= ops.c index 27b2605..a89174c 100644 --- a/sys/fs/pseudofs/pseudofs_vnops.c +++ b/sys/fs/pseudofs/pseudofs_vnops.c @@ -410,8 +410,7 @@ pfs_vptocnp(struct vop_vptocnp_args *ap) } =20 *buflen =3D i; - vhold(*dvp); - vput(*dvp); + VOP_UNLOCK(*dvp, 0); vn_lock(vp, locked | LK_RETRY); vfs_unbusy(mp); =20 diff --git a/sys/kern/vfs_cache.c b/sys/kern/vfs_cache.c index 11efcab..177fd56 100644 --- a/sys/kern/vfs_cache.c +++ b/sys/kern/vfs_cache.c @@ -1068,16 +1068,8 @@ vn_vptocnp(struct vnode **vp, struct ucred *cred, ch= ar *buf, u_int *buflen) =20 CACHE_RLOCK(); error =3D vn_vptocnp_locked(vp, cred, buf, buflen); - if (error =3D=3D 0) { - /* - * vn_vptocnp_locked() dropped hold acquired by - * VOP_VPTOCNP immediately after locking the - * cache. Since we are going to drop the cache rlock, - * re-hold the result. - */ - vhold(*vp); + if (error =3D=3D 0) CACHE_RUNLOCK(); - } return (error); } =20 @@ -1096,6 +1088,9 @@ vn_vptocnp_locked(struct vnode **vp, struct ucred *cr= ed, char *buf, if (ncp !=3D NULL) { if (*buflen < ncp->nc_nlen) { CACHE_RUNLOCK(); + vfslocked =3D VFS_LOCK_GIANT((*vp)->v_mount); + vrele(*vp); + VFS_UNLOCK_GIANT(vfslocked); numfullpathfail4++; error =3D ENOMEM; SDT_PROBE(vfs, namecache, fullpath, return, error, @@ -1106,18 +1101,23 @@ vn_vptocnp_locked(struct vnode **vp, struct ucred *= cred, char *buf, memcpy(buf + *buflen, ncp->nc_name, ncp->nc_nlen); SDT_PROBE(vfs, namecache, fullpath, hit, ncp->nc_dvp, ncp->nc_name, vp, 0, 0); + dvp =3D *vp; *vp =3D ncp->nc_dvp; + vref(*vp); + CACHE_RUNLOCK(); + vfslocked =3D VFS_LOCK_GIANT(dvp->v_mount); + vrele(dvp); + VFS_UNLOCK_GIANT(vfslocked); + CACHE_RLOCK(); return (0); } SDT_PROBE(vfs, namecache, fullpath, miss, vp, 0, 0, 0, 0); =20 - vhold(*vp); CACHE_RUNLOCK(); vfslocked =3D VFS_LOCK_GIANT((*vp)->v_mount); vn_lock(*vp, LK_SHARED | LK_RETRY); error =3D VOP_VPTOCNP(*vp, &dvp, cred, buf, buflen); - VOP_UNLOCK(*vp, 0); - vdrop(*vp); + vput(*vp); VFS_UNLOCK_GIANT(vfslocked); if (error) { numfullpathfail2++; @@ -1128,16 +1128,20 @@ vn_vptocnp_locked(struct vnode **vp, struct ucred *= cred, char *buf, =20 *vp =3D dvp; CACHE_RLOCK(); - if ((*vp)->v_iflag & VI_DOOMED) { + if (dvp->v_iflag & VI_DOOMED) { /* forced unmount */ CACHE_RUNLOCK(); - vdrop(*vp); + vfslocked =3D VFS_LOCK_GIANT(dvp->v_mount); + vrele(dvp); + VFS_UNLOCK_GIANT(vfslocked); error =3D ENOENT; SDT_PROBE(vfs, namecache, fullpath, return, error, vp, NULL, 0, 0); return (error); } - vdrop(*vp); + /* + * *vp has its use count incremented still. + */ =20 return (0); } @@ -1149,10 +1153,11 @@ static int vn_fullpath1(struct thread *td, struct vnode *vp, struct vnode *rdir, char *buf, char **retbuf, u_int buflen) { - int error, slash_prefixed; + int error, slash_prefixed, vfslocked; #ifdef KDTRACE_HOOKS struct vnode *startvp =3D vp; #endif + struct vnode *vp1; =20 buflen--; buf[buflen] =3D '\0'; @@ -1161,6 +1166,7 @@ vn_fullpath1(struct thread *td, struct vnode *vp, str= uct vnode *rdir, =20 SDT_PROBE(vfs, namecache, fullpath, entry, vp, 0, 0, 0, 0); numfullpathcalls++; + vref(vp); CACHE_RLOCK(); if (vp->v_type !=3D VDIR) { error =3D vn_vptocnp_locked(&vp, td->td_ucred, buf, &buflen); @@ -1168,6 +1174,9 @@ vn_fullpath1(struct thread *td, struct vnode *vp, str= uct vnode *rdir, return (error); if (buflen =3D=3D 0) { CACHE_RUNLOCK(); + vfslocked =3D VFS_LOCK_GIANT(vp->v_mount); + vrele(vp); + VFS_UNLOCK_GIANT(vfslocked); return (ENOMEM); } buf[--buflen] =3D '/'; @@ -1177,16 +1186,29 @@ vn_fullpath1(struct thread *td, struct vnode *vp, s= truct vnode *rdir, if (vp->v_vflag & VV_ROOT) { if (vp->v_iflag & VI_DOOMED) { /* forced unmount */ CACHE_RUNLOCK(); + vfslocked =3D VFS_LOCK_GIANT(vp->v_mount); + vrele(vp); + VFS_UNLOCK_GIANT(vfslocked); error =3D ENOENT; SDT_PROBE(vfs, namecache, fullpath, return, error, vp, NULL, 0, 0); break; } - vp =3D vp->v_mount->mnt_vnodecovered; + vp1 =3D vp->v_mount->mnt_vnodecovered; + vref(vp1); + CACHE_RUNLOCK(); + vfslocked =3D VFS_LOCK_GIANT(vp->v_mount); + vrele(vp); + VFS_UNLOCK_GIANT(vfslocked); + vp =3D vp1; + CACHE_RLOCK(); continue; } if (vp->v_type !=3D VDIR) { CACHE_RUNLOCK(); + vfslocked =3D VFS_LOCK_GIANT(vp->v_mount); + vrele(vp); + VFS_UNLOCK_GIANT(vfslocked); numfullpathfail1++; error =3D ENOTDIR; SDT_PROBE(vfs, namecache, fullpath, return, @@ -1198,6 +1220,9 @@ vn_fullpath1(struct thread *td, struct vnode *vp, str= uct vnode *rdir, break; if (buflen =3D=3D 0) { CACHE_RUNLOCK(); + vfslocked =3D VFS_LOCK_GIANT(vp->v_mount); + vrele(vp); + VFS_UNLOCK_GIANT(vfslocked); error =3D ENOMEM; SDT_PROBE(vfs, namecache, fullpath, return, error, startvp, NULL, 0, 0); @@ -1211,6 +1236,9 @@ vn_fullpath1(struct thread *td, struct vnode *vp, str= uct vnode *rdir, if (!slash_prefixed) { if (buflen =3D=3D 0) { CACHE_RUNLOCK(); + vfslocked =3D VFS_LOCK_GIANT(vp->v_mount); + vrele(vp); + VFS_UNLOCK_GIANT(vfslocked); numfullpathfail4++; SDT_PROBE(vfs, namecache, fullpath, return, ENOMEM, startvp, NULL, 0, 0); @@ -1220,6 +1248,9 @@ vn_fullpath1(struct thread *td, struct vnode *vp, str= uct vnode *rdir, } numfullpathfound++; CACHE_RUNLOCK(); + vfslocked =3D VFS_LOCK_GIANT(vp->v_mount); + vrele(vp); + VFS_UNLOCK_GIANT(vfslocked); =20 SDT_PROBE(vfs, namecache, fullpath, return, 0, startvp, buf + buflen, 0, 0); diff --git a/sys/kern/vfs_default.c b/sys/kern/vfs_default.c index e9f8151..e47498e 100644 --- a/sys/kern/vfs_default.c +++ b/sys/kern/vfs_default.c @@ -844,7 +844,7 @@ out: free(dirbuf, M_TEMP); if (!error) { *buflen =3D i; - vhold(*dvp); + vref(*dvp); } if (covered) { vput(*dvp); --YcxAF3djuctrf0Jt Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (FreeBSD) iEYEARECAAYFAk7DiCsACgkQC3+MBN1Mb4hPPACgkKcAtEpvrR3P+/sAyofywP8l LFgAoNcgCy7+aROMTO76RWEBn6pbmADe =r5x7 -----END PGP SIGNATURE----- --YcxAF3djuctrf0Jt--