Date: Fri, 20 May 2016 17:26:54 +0300 From: Konstantin Belousov <kostikbel@gmail.com> To: Bruce Evans <brde@optusnet.com.au> Cc: fs@freebsd.org Subject: Re: fix for per-mount i/o counting in ffs Message-ID: <20160520142654.GW89104@kib.kiev.ua> In-Reply-To: <20160520194427.W1170@besplex.bde.org> References: <20160517220055.GF89104@kib.kiev.ua> <20160518084931.T6534@besplex.bde.org> <20160518110834.GJ89104@kib.kiev.ua> <20160519065714.H1393@besplex.bde.org> <20160519094901.O1798@besplex.bde.org> <20160519120557.A2250@besplex.bde.org> <20160519104128.GN89104@kib.kiev.ua> <20160520074427.W1151@besplex.bde.org> <20160520092348.GV89104@kib.kiev.ua> <20160520194427.W1170@besplex.bde.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, May 20, 2016 at 09:22:08PM +1000, Bruce Evans wrote: > On Fri, 20 May 2016, Konstantin Belousov wrote: > > > On Fri, May 20, 2016 at 09:27:38AM +1000, Bruce Evans wrote: > >>> diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c > >>> index 712fc21..21425f5 100644 > >>> --- a/sys/ufs/ffs/ffs_vfsops.c > >>> +++ b/sys/ufs/ffs/ffs_vfsops.c > >>> @@ -764,24 +764,29 @@ ffs_mountfs(devvp, mp, td) > >>> cred = td ? td->td_ucred : NOCRED; > >>> ronly = (mp->mnt_flag & MNT_RDONLY) != 0; > >>> > >>> + KASSERT(devvp->v_type == VCHR, ("reclaimed devvp")); > >>> dev = devvp->v_rdev; > >>> dev_ref(dev); > >>> + if (!atomic_cmpset_ptr(&dev->si_mountpt, 0, mp)) { > >>> + dev_rel(dev); > >>> + VOP_UNLOCK(devvp, 0); > >>> + return (EBUSY); > >>> + } > >> > >> This is cleaner and safer than my version. > >> > >>> DROP_GIANT(); > >>> g_topology_lock(); > >>> error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1); > >> > >> g_vfs_open() already sets devvp->v_bufobj.bo_ops to g_vfs_bufops unless > >> it fails. This clobbered our setting in the buggy multiple-mount case. > >> But with multiple mounts not allowed, this cleans up any garbage in > >> v_bufobj. > > Yes, and this orders things. g_vfs_open() shoudl have devvp locked, > > both fo bo manipulations and for vnode_create_vobject() call. > > We can only assign to bo_ops after g_vfs_open() was done successfully. > > The atomic cmpset now orders things too. Is that enough? It ensures > that an old mount cannot be active. I don't know if v_bufobj is used > for non-mounts. v_bufobj is logically protected against modifications by the vnode lock. > > Except, for zfs there is no g_vfs_open() to order things, and for all > other file systems there is no atomic cmpset yet. > > >> g_vfs_open() has 2 failures for non-exclusive access. It starts by > >> checking v_bufobj.bo_private == devvp (this is after translating its > >> pointers to the ones passed here). This is avg's fix for the multiple- > >> mounts problem (r206130). It doesn't work in all cases. I think this > >> is unecessary now. > > At least it weeds out other devfs mounts. > > Yes, we need it until everything is converted. > > >> ... > >> I now see another cleanup: don't goto out when g_vfs_open() fails. This > >> depends on it setting cp to NULL and leaving nothing to clean when it > >> fails. It has no man page and this detail is documented in its source > >> code. > > Then I would need to add another NULL assignment, VOP_UNLOCK etc. > > g_vfs_open() already sets cp to NULL when it fails, and the cleanup > depends on that now, but it is just as good to depend on no cleanup > being needed on failure. You do need another dev_rel(). > > I thought about moving the dev_ref() later to simplify the early returns. > I thought that this didn't quite work. Now I think it does work, for > obvious reasons: > - the device is attached to a vnode, so it is referenced to prevent it > going away unless the device is revoked. It seems to be referenced > at least 3 times in FreeBSD-9. > - the vnode is locked, so the reference count remains > 0 until we unlock. > So we just need a dev_ref() before the unlock in the non-error case, to > keep the device from going away if it is revoked. Yes, and this is how the current patched code is structured. > > > Updated patch to add acq/rel. > > > > diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c > > index 712fc21..670bb15 100644 > > --- a/sys/ufs/ffs/ffs_vfsops.c > > +++ b/sys/ufs/ffs/ffs_vfsops.c > > @@ -764,24 +764,29 @@ ffs_mountfs(devvp, mp, td) > > cred = td ? td->td_ucred : NOCRED; > > ronly = (mp->mnt_flag & MNT_RDONLY) != 0; > > > > + KASSERT(devvp->v_type == VCHR, ("reclaimed devvp")); > > Hrmph. I want this, it would remove amount of obvious questions. > > > dev = devvp->v_rdev; > > dev_ref(dev); > > Move later... > > > + if (atomic_cmpset_acq_ptr(&dev->si_mountpt, 0, mp) != 0) { > > I changed the first 0 to NULL, and this works on i386, but now I remember > that i386 has bogus casts which break detection of type mismatches -- > the atomic ptr functions take a [u]intptr_t, not a pointer type, so > NULL won't work if it is ((void *)0). At least amd64 is still missing > this bug. cmpset_<bar>_ptr() on i386 has cast for old and new parameters to u_int. store_rel_ptr() on i386 does not cast value to u_int. As result, NULL is acceptable for cmpset, but not for store. I spelled it 0 in all cases. Hm, I also should add uintptr_t cast for cmpset, otherwise, I suspect, some arch might be broken. > > > + dev_rel(dev); > > ...then this dev_rel() is not needed. > > > + VOP_UNLOCK(devvp, 0); > > + return (EBUSY); > > + } > > DROP_GIANT(); > > g_topology_lock(); > > error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1); > > g_topology_unlock(); > > PICKUP_GIANT(); > > - VOP_UNLOCK(devvp, 0); > > - if (error) > > + if (error != 0) { > > + VOP_UNLOCK(devvp, 0); > > goto out; > > This becomes: > > if (error != 0) { > VOP_UNLOCK(devvp, 0); > return (EBUSY); > } > > Then assign v_bufobj. > > Then dev_ref(), just in time for unlocking. > > Then unlock. Ok. > > > - if (devvp->v_rdev->si_iosize_max != 0) > > - mp->mnt_iosize_max = devvp->v_rdev->si_iosize_max; > > + } > > + if (dev->si_iosize_max != 0) > > + mp->mnt_iosize_max = dev->si_iosize_max; > > if (mp->mnt_iosize_max > MAXPHYS) > > mp->mnt_iosize_max = MAXPHYS; > > - > > devvp->v_bufobj.bo_ops = &ffs_ops; > > - if (devvp->v_type == VCHR) > > - devvp->v_rdev->si_mountpt = mp; > > + VOP_UNLOCK(devvp, 0); > > This belongs earlier. > > > > > fs = NULL; > > sblockloc = 0; > > ... > > We need this in a central function. g_vfs_open/close() can do it for > all cases except zfs. This looks like: I might look at this later. > > DROP_GIANT(); > g_topology_lock(); > // atomic_cmpset and its error = EBUSY moved to top of g_vfs_open() > error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1); > g_topology_unlock(); > PICKUP_GIANT(); > if (error != 0) { > VOP_UNLOCK(devvp, 0); > return (error); > } > devvp->v_bufobj.bo_ops = &ffs_ops; > dev_ref(dev); > VOP_UNLOCK(devvp, 0); > if (dev->si_iosize_max != 0) > mp->mnt_iosize_max = dev->si_iosize_max; > if (mp->mnt_iosize_max > MAXPHYS) > mp->mnt_iosize_max = MAXPHYS; > > where 2 of 2 lines with GIANT and 3 of 4 lines with iosize_max remain to > be cleaned up. > > Resetting si_mountpt in g_vfs_close() is even simpler. Oops, it also has > to be reset in g_vfs_open() on a later failure there. diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c index 712fc21..65b1891 100644 --- a/sys/ufs/ffs/ffs_vfsops.c +++ b/sys/ufs/ffs/ffs_vfsops.c @@ -764,25 +764,30 @@ ffs_mountfs(devvp, mp, td) cred = td ? td->td_ucred : NOCRED; ronly = (mp->mnt_flag & MNT_RDONLY) != 0; + KASSERT(devvp->v_type == VCHR, ("reclaimed devvp")); dev = devvp->v_rdev; - dev_ref(dev); + if (atomic_cmpset_acq_ptr(&dev->si_mountpt, 0, (uintptr_t)mp) != 0) { + VOP_UNLOCK(devvp, 0); + return (EBUSY); + } DROP_GIANT(); g_topology_lock(); error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1); g_topology_unlock(); PICKUP_GIANT(); + if (error != 0) { + VOP_UNLOCK(devvp, 0); + atomic_store_rel_ptr(&dev->si_mountpt, 0); + return (error); + } + dev_ref(dev); + devvp->v_bufobj.bo_ops = &ffs_ops; VOP_UNLOCK(devvp, 0); - if (error) - goto out; - if (devvp->v_rdev->si_iosize_max != 0) - mp->mnt_iosize_max = devvp->v_rdev->si_iosize_max; + if (dev->si_iosize_max != 0) + mp->mnt_iosize_max = dev->si_iosize_max; if (mp->mnt_iosize_max > MAXPHYS) mp->mnt_iosize_max = MAXPHYS; - devvp->v_bufobj.bo_ops = &ffs_ops; - if (devvp->v_type == VCHR) - devvp->v_rdev->si_mountpt = mp; - fs = NULL; sblockloc = 0; /* @@ -1083,8 +1088,6 @@ ffs_mountfs(devvp, mp, td) out: if (bp) brelse(bp); - if (devvp->v_type == VCHR && devvp->v_rdev != NULL) - devvp->v_rdev->si_mountpt = NULL; if (cp != NULL) { DROP_GIANT(); g_topology_lock(); @@ -1102,6 +1105,7 @@ out: free(ump, M_UFSMNT); mp->mnt_data = NULL; } + atomic_store_rel_ptr(&dev->si_mountpt, 0); dev_rel(dev); return (error); } @@ -1287,8 +1291,7 @@ ffs_unmount(mp, mntflags) g_vfs_close(ump->um_cp); g_topology_unlock(); PICKUP_GIANT(); - if (ump->um_devvp->v_type == VCHR && ump->um_devvp->v_rdev != NULL) - ump->um_devvp->v_rdev->si_mountpt = NULL; + atomic_store_rel_ptr(&ump->um_dev->si_mountpt, 0); vrele(ump->um_devvp); dev_rel(ump->um_dev); mtx_destroy(UFS_MTX(ump));
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20160520142654.GW89104>