From owner-freebsd-fs@freebsd.org Wed May 18 11:08:41 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E12B9B40B8D for ; Wed, 18 May 2016 11:08:41 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id CA015189E for ; Wed, 18 May 2016 11:08:41 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: by mailman.ysv.freebsd.org (Postfix) id C9536B40B8C; Wed, 18 May 2016 11:08:41 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C6C93B40B8B for ; Wed, 18 May 2016 11:08:41 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6A90A189D for ; Wed, 18 May 2016 11:08:41 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u4IB8ZdA002871 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Wed, 18 May 2016 14:08:35 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u4IB8ZdA002871 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id u4IB8YXl002870; Wed, 18 May 2016 14:08:34 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 18 May 2016 14:08:34 +0300 From: Konstantin Belousov To: Bruce Evans Cc: fs@freebsd.org Subject: Re: fix for per-mount i/o counting in ffs Message-ID: <20160518110834.GJ89104@kib.kiev.ua> References: <20160517072104.I2137@besplex.bde.org> <20160517084241.GY89104@kib.kiev.ua> <20160518061040.D5948@besplex.bde.org> <20160518070252.F6121@besplex.bde.org> <20160517220055.GF89104@kib.kiev.ua> <20160518084931.T6534@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160518084931.T6534@besplex.bde.org> User-Agent: Mutt/1.6.1 (2016-04-27) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 May 2016 11:08:42 -0000 On Wed, May 18, 2016 at 10:00:09AM +1000, Bruce Evans wrote: > On Wed, 18 May 2016, Konstantin Belousov wrote: > > VCHR check ensures that the devvp vnode is not reclaimed. I do not want > > to remove the check and rely on the caller of ffs_mountfs() to always do > > the right thing for it without unlocking devvp, this is too subtle. > > Surely the caller must lock devvp? Otherwise none of the uses of devvp > can be trusted, and there are several others. It must lock, but the interface of ffs_mountfs() would then require that there is no relock between vn_isdisk() check and call. I think I know how to make a good compromise there. I converted the check for VCHR into the assert. > There is also ump->um_devvvp, but this seems to be unusable since it > might go away. Go away as in being reclaimed, yes. The vnode itself is there, since we keep a reference. > > So using the devvp->v_rdev instead of the dev variable is not just a > style bug. Might be. > > devvp->v_bufobj.bo_ops = &ffs_ops; > > if (devvp->v_type == VCHR) > > devvp must be still VCHR since this is now under the vnode lock, and we > depend on dev remaining a character device for the disk described by > devvp at the time of the vn_isdisk() check. Unless relocked. > > > - devvp->v_rdev->si_mountpt = mp; > > + dev->si_mountpt = mp; > > + VOP_UNLOCK(devvp, 0); > > The unlocking could be a little earlier since dev is still for a disk even > if devvp went away and you changed this to not used devvp->v_rdev. > > > > > fs = NULL; > > sblockloc = 0; > > Unlocking and then using devvp sure looks like a race. > > You only needed to move the unlocking to fix. devvp->v_bufobj. How does > that work? The write is now locked, but if devvp goes away, then don't > we lose its bufobj? The buffer queues are flushed, and BO_DEAD flag is set. But the flag does very little. > How does any use of ump->um_devvp work? > > I tried revoke(2) on the devvp of a mounted file system. This worked > to give v_type = VBAD and v_rdev = NULL, but didn't crash. ffs_unmount() > checked for the bad vnode, unlike most places, and failed to clear > si_mountpt. > > Normal use doesn't have revokes, but if the vnode is reclaimed instead > of just becoming bad, then worse things probably happen. I think vnode > cache resizing gives very unstable storage so the pointer becomes very > invalid. But even revoke followed by setting kern.numvnodes to 1 didn't > crash (15 vnodes remained). So devvp must be referenced throughout. > It seems to have reference count 2, since umounting reduced kern.numvnodes > from 15 to 13. (It is surprising how much works with kern.maxvnodes=1. > I was able to run revoke, sysctl and umount.) It is still a mystery > that the VBAD vnode doesn't crash soon. > I believe that bo_ops assignment is the reason why UFS mounts survive the reclamation of the devvp vnode. Take a look at the ffs_geom_strategy(), which is the place where UFS io is tunneled directly into geom. It does not pass io requests through devfs. As result, revocation does not change much except doing unneccessary buf queue flush. It might be telling to try the same experiment, as conducted in your next message, on msdosfs instead of UFS. Below is the simplified patch. diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c index 712fc21..412b000 100644 --- a/sys/ufs/ffs/ffs_vfsops.c +++ b/sys/ufs/ffs/ffs_vfsops.c @@ -764,6 +764,7 @@ ffs_mountfs(devvp, mp, td) cred = td ? td->td_ucred : NOCRED; ronly = (mp->mnt_flag & MNT_RDONLY) != 0; + KASSERT(devvp->v_type == VCHR, ("reclaimed devvp")); dev = devvp->v_rdev; dev_ref(dev); DROP_GIANT(); @@ -771,17 +772,17 @@ ffs_mountfs(devvp, mp, td) error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1); g_topology_unlock(); PICKUP_GIANT(); - VOP_UNLOCK(devvp, 0); - if (error) + if (error != 0) { + VOP_UNLOCK(devvp, 0); goto out; - if (devvp->v_rdev->si_iosize_max != 0) - mp->mnt_iosize_max = devvp->v_rdev->si_iosize_max; + } + if (dev->si_iosize_max != 0) + mp->mnt_iosize_max = dev->si_iosize_max; if (mp->mnt_iosize_max > MAXPHYS) mp->mnt_iosize_max = MAXPHYS; - devvp->v_bufobj.bo_ops = &ffs_ops; - if (devvp->v_type == VCHR) - devvp->v_rdev->si_mountpt = mp; + dev->si_mountpt = mp; + VOP_UNLOCK(devvp, 0); fs = NULL; sblockloc = 0; @@ -1083,8 +1084,7 @@ ffs_mountfs(devvp, mp, td) out: if (bp) brelse(bp); - if (devvp->v_type == VCHR && devvp->v_rdev != NULL) - devvp->v_rdev->si_mountpt = NULL; + dev->si_mountpt = NULL; if (cp != NULL) { DROP_GIANT(); g_topology_lock(); @@ -1287,8 +1287,7 @@ ffs_unmount(mp, mntflags) g_vfs_close(ump->um_cp); g_topology_unlock(); PICKUP_GIANT(); - if (ump->um_devvp->v_type == VCHR && ump->um_devvp->v_rdev != NULL) - ump->um_devvp->v_rdev->si_mountpt = NULL; + ump->um_dev->si_mountpt = NULL; vrele(ump->um_devvp); dev_rel(ump->um_dev); mtx_destroy(UFS_MTX(ump));