From owner-freebsd-fs@freebsd.org Fri May 20 14:27:04 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 57278B426E4 for ; Fri, 20 May 2016 14:27:04 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 3E68F1F34 for ; Fri, 20 May 2016 14:27:04 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: by mailman.ysv.freebsd.org (Postfix) id 3A065B426E2; Fri, 20 May 2016 14:27:04 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 39A6AB426E0 for ; Fri, 20 May 2016 14:27:04 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id BE0F61F33 for ; Fri, 20 May 2016 14:27:03 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u4KEQt3e006945 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Fri, 20 May 2016 17:26:55 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u4KEQt3e006945 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id u4KEQsxK006944; Fri, 20 May 2016 17:26:54 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 20 May 2016 17:26:54 +0300 From: Konstantin Belousov To: Bruce Evans Cc: fs@freebsd.org Subject: Re: fix for per-mount i/o counting in ffs Message-ID: <20160520142654.GW89104@kib.kiev.ua> References: <20160517220055.GF89104@kib.kiev.ua> <20160518084931.T6534@besplex.bde.org> <20160518110834.GJ89104@kib.kiev.ua> <20160519065714.H1393@besplex.bde.org> <20160519094901.O1798@besplex.bde.org> <20160519120557.A2250@besplex.bde.org> <20160519104128.GN89104@kib.kiev.ua> <20160520074427.W1151@besplex.bde.org> <20160520092348.GV89104@kib.kiev.ua> <20160520194427.W1170@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160520194427.W1170@besplex.bde.org> User-Agent: Mutt/1.6.1 (2016-04-27) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 May 2016 14:27:04 -0000 On Fri, May 20, 2016 at 09:22:08PM +1000, Bruce Evans wrote: > On Fri, 20 May 2016, Konstantin Belousov wrote: > > > On Fri, May 20, 2016 at 09:27:38AM +1000, Bruce Evans wrote: > >>> diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c > >>> index 712fc21..21425f5 100644 > >>> --- a/sys/ufs/ffs/ffs_vfsops.c > >>> +++ b/sys/ufs/ffs/ffs_vfsops.c > >>> @@ -764,24 +764,29 @@ ffs_mountfs(devvp, mp, td) > >>> cred = td ? td->td_ucred : NOCRED; > >>> ronly = (mp->mnt_flag & MNT_RDONLY) != 0; > >>> > >>> + KASSERT(devvp->v_type == VCHR, ("reclaimed devvp")); > >>> dev = devvp->v_rdev; > >>> dev_ref(dev); > >>> + if (!atomic_cmpset_ptr(&dev->si_mountpt, 0, mp)) { > >>> + dev_rel(dev); > >>> + VOP_UNLOCK(devvp, 0); > >>> + return (EBUSY); > >>> + } > >> > >> This is cleaner and safer than my version. > >> > >>> DROP_GIANT(); > >>> g_topology_lock(); > >>> error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1); > >> > >> g_vfs_open() already sets devvp->v_bufobj.bo_ops to g_vfs_bufops unless > >> it fails. This clobbered our setting in the buggy multiple-mount case. > >> But with multiple mounts not allowed, this cleans up any garbage in > >> v_bufobj. > > Yes, and this orders things. g_vfs_open() shoudl have devvp locked, > > both fo bo manipulations and for vnode_create_vobject() call. > > We can only assign to bo_ops after g_vfs_open() was done successfully. > > The atomic cmpset now orders things too. Is that enough? It ensures > that an old mount cannot be active. I don't know if v_bufobj is used > for non-mounts. v_bufobj is logically protected against modifications by the vnode lock. > > Except, for zfs there is no g_vfs_open() to order things, and for all > other file systems there is no atomic cmpset yet. > > >> g_vfs_open() has 2 failures for non-exclusive access. It starts by > >> checking v_bufobj.bo_private == devvp (this is after translating its > >> pointers to the ones passed here). This is avg's fix for the multiple- > >> mounts problem (r206130). It doesn't work in all cases. I think this > >> is unecessary now. > > At least it weeds out other devfs mounts. > > Yes, we need it until everything is converted. > > >> ... > >> I now see another cleanup: don't goto out when g_vfs_open() fails. This > >> depends on it setting cp to NULL and leaving nothing to clean when it > >> fails. It has no man page and this detail is documented in its source > >> code. > > Then I would need to add another NULL assignment, VOP_UNLOCK etc. > > g_vfs_open() already sets cp to NULL when it fails, and the cleanup > depends on that now, but it is just as good to depend on no cleanup > being needed on failure. You do need another dev_rel(). > > I thought about moving the dev_ref() later to simplify the early returns. > I thought that this didn't quite work. Now I think it does work, for > obvious reasons: > - the device is attached to a vnode, so it is referenced to prevent it > going away unless the device is revoked. It seems to be referenced > at least 3 times in FreeBSD-9. > - the vnode is locked, so the reference count remains > 0 until we unlock. > So we just need a dev_ref() before the unlock in the non-error case, to > keep the device from going away if it is revoked. Yes, and this is how the current patched code is structured. > > > Updated patch to add acq/rel. > > > > diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c > > index 712fc21..670bb15 100644 > > --- a/sys/ufs/ffs/ffs_vfsops.c > > +++ b/sys/ufs/ffs/ffs_vfsops.c > > @@ -764,24 +764,29 @@ ffs_mountfs(devvp, mp, td) > > cred = td ? td->td_ucred : NOCRED; > > ronly = (mp->mnt_flag & MNT_RDONLY) != 0; > > > > + KASSERT(devvp->v_type == VCHR, ("reclaimed devvp")); > > Hrmph. I want this, it would remove amount of obvious questions. > > > dev = devvp->v_rdev; > > dev_ref(dev); > > Move later... > > > + if (atomic_cmpset_acq_ptr(&dev->si_mountpt, 0, mp) != 0) { > > I changed the first 0 to NULL, and this works on i386, but now I remember > that i386 has bogus casts which break detection of type mismatches -- > the atomic ptr functions take a [u]intptr_t, not a pointer type, so > NULL won't work if it is ((void *)0). At least amd64 is still missing > this bug. cmpset__ptr() on i386 has cast for old and new parameters to u_int. store_rel_ptr() on i386 does not cast value to u_int. As result, NULL is acceptable for cmpset, but not for store. I spelled it 0 in all cases. Hm, I also should add uintptr_t cast for cmpset, otherwise, I suspect, some arch might be broken. > > > + dev_rel(dev); > > ...then this dev_rel() is not needed. > > > + VOP_UNLOCK(devvp, 0); > > + return (EBUSY); > > + } > > DROP_GIANT(); > > g_topology_lock(); > > error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1); > > g_topology_unlock(); > > PICKUP_GIANT(); > > - VOP_UNLOCK(devvp, 0); > > - if (error) > > + if (error != 0) { > > + VOP_UNLOCK(devvp, 0); > > goto out; > > This becomes: > > if (error != 0) { > VOP_UNLOCK(devvp, 0); > return (EBUSY); > } > > Then assign v_bufobj. > > Then dev_ref(), just in time for unlocking. > > Then unlock. Ok. > > > - if (devvp->v_rdev->si_iosize_max != 0) > > - mp->mnt_iosize_max = devvp->v_rdev->si_iosize_max; > > + } > > + if (dev->si_iosize_max != 0) > > + mp->mnt_iosize_max = dev->si_iosize_max; > > if (mp->mnt_iosize_max > MAXPHYS) > > mp->mnt_iosize_max = MAXPHYS; > > - > > devvp->v_bufobj.bo_ops = &ffs_ops; > > - if (devvp->v_type == VCHR) > > - devvp->v_rdev->si_mountpt = mp; > > + VOP_UNLOCK(devvp, 0); > > This belongs earlier. > > > > > fs = NULL; > > sblockloc = 0; > > ... > > We need this in a central function. g_vfs_open/close() can do it for > all cases except zfs. This looks like: I might look at this later. > > DROP_GIANT(); > g_topology_lock(); > // atomic_cmpset and its error = EBUSY moved to top of g_vfs_open() > error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1); > g_topology_unlock(); > PICKUP_GIANT(); > if (error != 0) { > VOP_UNLOCK(devvp, 0); > return (error); > } > devvp->v_bufobj.bo_ops = &ffs_ops; > dev_ref(dev); > VOP_UNLOCK(devvp, 0); > if (dev->si_iosize_max != 0) > mp->mnt_iosize_max = dev->si_iosize_max; > if (mp->mnt_iosize_max > MAXPHYS) > mp->mnt_iosize_max = MAXPHYS; > > where 2 of 2 lines with GIANT and 3 of 4 lines with iosize_max remain to > be cleaned up. > > Resetting si_mountpt in g_vfs_close() is even simpler. Oops, it also has > to be reset in g_vfs_open() on a later failure there. diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c index 712fc21..65b1891 100644 --- a/sys/ufs/ffs/ffs_vfsops.c +++ b/sys/ufs/ffs/ffs_vfsops.c @@ -764,25 +764,30 @@ ffs_mountfs(devvp, mp, td) cred = td ? td->td_ucred : NOCRED; ronly = (mp->mnt_flag & MNT_RDONLY) != 0; + KASSERT(devvp->v_type == VCHR, ("reclaimed devvp")); dev = devvp->v_rdev; - dev_ref(dev); + if (atomic_cmpset_acq_ptr(&dev->si_mountpt, 0, (uintptr_t)mp) != 0) { + VOP_UNLOCK(devvp, 0); + return (EBUSY); + } DROP_GIANT(); g_topology_lock(); error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1); g_topology_unlock(); PICKUP_GIANT(); + if (error != 0) { + VOP_UNLOCK(devvp, 0); + atomic_store_rel_ptr(&dev->si_mountpt, 0); + return (error); + } + dev_ref(dev); + devvp->v_bufobj.bo_ops = &ffs_ops; VOP_UNLOCK(devvp, 0); - if (error) - goto out; - if (devvp->v_rdev->si_iosize_max != 0) - mp->mnt_iosize_max = devvp->v_rdev->si_iosize_max; + if (dev->si_iosize_max != 0) + mp->mnt_iosize_max = dev->si_iosize_max; if (mp->mnt_iosize_max > MAXPHYS) mp->mnt_iosize_max = MAXPHYS; - devvp->v_bufobj.bo_ops = &ffs_ops; - if (devvp->v_type == VCHR) - devvp->v_rdev->si_mountpt = mp; - fs = NULL; sblockloc = 0; /* @@ -1083,8 +1088,6 @@ ffs_mountfs(devvp, mp, td) out: if (bp) brelse(bp); - if (devvp->v_type == VCHR && devvp->v_rdev != NULL) - devvp->v_rdev->si_mountpt = NULL; if (cp != NULL) { DROP_GIANT(); g_topology_lock(); @@ -1102,6 +1105,7 @@ out: free(ump, M_UFSMNT); mp->mnt_data = NULL; } + atomic_store_rel_ptr(&dev->si_mountpt, 0); dev_rel(dev); return (error); } @@ -1287,8 +1291,7 @@ ffs_unmount(mp, mntflags) g_vfs_close(ump->um_cp); g_topology_unlock(); PICKUP_GIANT(); - if (ump->um_devvp->v_type == VCHR && ump->um_devvp->v_rdev != NULL) - ump->um_devvp->v_rdev->si_mountpt = NULL; + atomic_store_rel_ptr(&ump->um_dev->si_mountpt, 0); vrele(ump->um_devvp); dev_rel(ump->um_dev); mtx_destroy(UFS_MTX(ump));