From: Konstantin Belousov
Date: Mon, 14 Jul 2014 09:52:33 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: svn commit: r268617 - head/sys/fs/tmpfs
Message-Id: <201407140952.s6E9qXNf066458@svn.freebsd.org>

Author: kib
Date: Mon Jul 14 09:52:33 2014
New Revision: 268617
URL: http://svnweb.freebsd.org/changeset/base/268617

Log:
  Rework the tmpfs unmount.
  - Suspend filesystem for unmount.  This prevents new tmpfs nodes from
    instantiating, and also ensures that only unmount thread can destroy
    nodes.
  - Do not start tmpfs node deletion until all vnodes are reclaimed,
    which guarantees that no thread can access tmpfs data.  For this,
    call vflush() in a loop until mnt_nvnodelistsize becomes zero.
    Note that after mnt_nvnodelistsize becomes 0, insmntque() blocks
    insertion of a vnode germ into the mount list of vnodes.
  - Fail node allocation when the filesystem is being unmounted.  This
    is race-free due to the vflush() call in the loop.  This is mostly
    cosmetic, avoiding some more work which might be done until
    suspension in unmount is started.

  Note that there is currently no way to prevent new vnode instantiation
  from readers during the unmount.  Due to this, forced unmount might
  live-lock if the vflush() loop cannot get to the zero vnode count due
  to races with readers.  The unmount would proceed after the load is
  lifted.

  Tested by:	pho
  Sponsored by:	The FreeBSD Foundation
  MFC after:	2 weeks

Modified:
  head/sys/fs/tmpfs/tmpfs.h
  head/sys/fs/tmpfs/tmpfs_subr.c
  head/sys/fs/tmpfs/tmpfs_vfsops.c

Modified: head/sys/fs/tmpfs/tmpfs.h
==============================================================================
--- head/sys/fs/tmpfs/tmpfs.h	Mon Jul 14 09:35:14 2014	(r268616)
+++ head/sys/fs/tmpfs/tmpfs.h	Mon Jul 14 09:52:33 2014	(r268617)
@@ -384,7 +384,7 @@ struct tmpfs_fid {
  * Prototypes for tmpfs_subr.c.
  */
 
-int	tmpfs_alloc_node(struct tmpfs_mount *, enum vtype,
+int	tmpfs_alloc_node(struct mount *mp, struct tmpfs_mount *, enum vtype,
 	    uid_t uid, gid_t gid, mode_t mode, struct tmpfs_node *,
 	    char *, dev_t, struct tmpfs_node **);
 void	tmpfs_free_node(struct tmpfs_mount *, struct tmpfs_node *);

Modified: head/sys/fs/tmpfs/tmpfs_subr.c
==============================================================================
--- head/sys/fs/tmpfs/tmpfs_subr.c	Mon Jul 14 09:35:14 2014	(r268616)
+++ head/sys/fs/tmpfs/tmpfs_subr.c	Mon Jul 14 09:52:33 2014	(r268617)
@@ -159,7 +159,7 @@ tmpfs_pages_check_avail(struct tmpfs_mou
  * Returns zero on success or an appropriate error code on failure.
  */
 int
-tmpfs_alloc_node(struct tmpfs_mount *tmp, enum vtype type,
+tmpfs_alloc_node(struct mount *mp, struct tmpfs_mount *tmp, enum vtype type,
     uid_t uid, gid_t gid, mode_t mode, struct tmpfs_node *parent,
     char *target, dev_t rdev, struct tmpfs_node **node)
 {
@@ -169,6 +169,8 @@ tmpfs_alloc_node(struct tmpfs_mount *tmp
 	/* If the root directory of the 'tmp' file system is not yet
 	 * allocated, this must be the request to do it. */
 	MPASS(IMPLIES(tmp->tm_root == NULL, parent == NULL && type == VDIR));
+	KASSERT(tmp->tm_root == NULL || mp->mnt_writeopcount > 0,
+	    ("creating node not under vn_start_write"));
 
 	MPASS(IFF(type == VLNK, target != NULL));
 	MPASS(IFF(type == VBLK || type == VCHR, rdev != VNOVAL));
@@ -178,6 +180,24 @@ tmpfs_alloc_node(struct tmpfs_mount *tmp
 	if (tmpfs_pages_check_avail(tmp, 1) == 0)
 		return (ENOSPC);
 
+	if ((mp->mnt_kern_flag & MNTK_UNMOUNT) != 0) {
+		/*
+		 * When a new tmpfs node is created for fully
+		 * constructed mount point, there must be a parent
+		 * node, which vnode is locked exclusively.  As
+		 * consequence, if the unmount is executing in
+		 * parallel, vflush() cannot reclaim the parent vnode.
+		 * Due to this, the check for MNTK_UNMOUNT flag is not
+		 * racy: if we did not see MNTK_UNMOUNT flag, then tmp
+		 * cannot be destroyed until node construction is
+		 * finished and the parent vnode unlocked.
+		 *
+		 * Tmpfs does not need to instantiate new nodes during
+		 * unmount.
+		 */
+		return (EBUSY);
+	}
+
 	nnode = (struct tmpfs_node *)uma_zalloc_arg(
 				tmp->tm_node_pool, tmp, M_WAITOK);
 
@@ -687,7 +707,8 @@ tmpfs_alloc_file(struct vnode *dvp, stru
 		parent = NULL;
 
 	/* Allocate a node that represents the new file. */
-	error = tmpfs_alloc_node(tmp, vap->va_type, cnp->cn_cred->cr_uid,
+	error = tmpfs_alloc_node(dvp->v_mount, tmp, vap->va_type,
+	    cnp->cn_cred->cr_uid,
 	    dnode->tn_gid, vap->va_mode, parent, target, vap->va_rdev, &node);
 	if (error != 0)
 		return (error);

Modified: head/sys/fs/tmpfs/tmpfs_vfsops.c
==============================================================================
--- head/sys/fs/tmpfs/tmpfs_vfsops.c	Mon Jul 14 09:35:14 2014	(r268616)
+++ head/sys/fs/tmpfs/tmpfs_vfsops.c	Mon Jul 14 09:52:33 2014	(r268617)
@@ -238,7 +238,7 @@ tmpfs_mount(struct mount *mp)
 	tmp->tm_ronly = (mp->mnt_flag & MNT_RDONLY) != 0;
 
 	/* Allocate the root node. */
-	error = tmpfs_alloc_node(tmp, VDIR, root_uid,
+	error = tmpfs_alloc_node(mp, tmp, VDIR, root_uid,
 	    root_gid, root_mode & ALLPERMS, NULL, NULL,
 	    VNOVAL, &root);
 
@@ -269,38 +269,49 @@ tmpfs_mount(struct mount *mp)
 static int
 tmpfs_unmount(struct mount *mp, int mntflags)
 {
-	int error;
-	int flags = 0;
 	struct tmpfs_mount *tmp;
 	struct tmpfs_node *node;
+	int error, flags;
 
-	/* Handle forced unmounts. */
-	if (mntflags & MNT_FORCE)
-		flags |= FORCECLOSE;
-
-	/* Finalize all pending I/O. */
-	error = vflush(mp, 0, flags, curthread);
-	if (error != 0)
-		return error;
-
+	flags = (mntflags & MNT_FORCE) != 0 ? FORCECLOSE : 0;
 	tmp = VFS_TO_TMPFS(mp);
 
-	/* Free all associated data.  The loop iterates over the linked list
-	 * we have containing all used nodes.  For each of them that is
-	 * a directory, we free all its directory entries.  Note that after
-	 * freeing a node, it will automatically go to the available list,
-	 * so we will later have to iterate over it to release its items. */
-	node = LIST_FIRST(&tmp->tm_nodes_used);
-	while (node != NULL) {
-		struct tmpfs_node *next;
+	/* Stop writers */
+	error = vfs_write_suspend_umnt(mp);
+	if (error != 0)
+		return (error);
+	/*
+	 * At this point, nodes cannot be destroyed by any other
+	 * thread because write suspension is started.
+	 */
+
+	for (;;) {
+		error = vflush(mp, 0, flags, curthread);
+		if (error != 0) {
+			vfs_write_resume(mp, VR_START_WRITE);
+			return (error);
+		}
+		MNT_ILOCK(mp);
+		if (mp->mnt_nvnodelistsize == 0) {
+			MNT_IUNLOCK(mp);
+			break;
+		}
+		MNT_IUNLOCK(mp);
+		if ((mntflags & MNT_FORCE) == 0) {
+			vfs_write_resume(mp, VR_START_WRITE);
+			return (EBUSY);
+		}
+	}
 
+	TMPFS_LOCK(tmp);
+	while ((node = LIST_FIRST(&tmp->tm_nodes_used)) != NULL) {
+		TMPFS_UNLOCK(tmp);
 		if (node->tn_type == VDIR)
 			tmpfs_dir_destroy(tmp, node);
-
-		next = LIST_NEXT(node, tn_entries);
 		tmpfs_free_node(tmp, node);
-		node = next;
+		TMPFS_LOCK(tmp);
 	}
+	TMPFS_UNLOCK(tmp);
 
 	uma_zdestroy(tmp->tm_dirent_pool);
 	uma_zdestroy(tmp->tm_node_pool);
@@ -313,11 +324,13 @@ tmpfs_unmount(struct mount *mp, int mntf
 	/* Throw away the tmpfs_mount structure. */
 	free(mp->mnt_data, M_TMPFSMNT);
 	mp->mnt_data = NULL;
+	vfs_write_resume(mp, VR_START_WRITE);
 
 	MNT_ILOCK(mp);
 	mp->mnt_flag &= ~MNT_LOCAL;
 	MNT_IUNLOCK(mp);
-	return 0;
+
+	return (0);
 }
 
 static int
@@ -401,6 +414,18 @@ tmpfs_statfs(struct mount *mp, struct st
 	return 0;
 }
 
+static int
+tmpfs_sync(struct mount *mp, int waitfor)
+{
+
+	if (waitfor == MNT_SUSPEND) {
+		MNT_ILOCK(mp);
+		mp->mnt_kern_flag |= MNTK_SUSPEND2 | MNTK_SUSPENDED;
+		MNT_IUNLOCK(mp);
+	}
+	return (0);
+}
+
 /*
  * tmpfs vfs operations.
  */
@@ -411,5 +436,6 @@ struct vfsops tmpfs_vfsops = {
 	.vfs_root =			tmpfs_root,
 	.vfs_statfs =			tmpfs_statfs,
 	.vfs_fhtovp =			tmpfs_fhtovp,
+	.vfs_sync =			tmpfs_sync,
 };
 VFS_SET(tmpfs_vfsops, tmpfs, VFCF_JAIL);
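
For readers who want to reason about the retry loop in tmpfs_unmount() outside the kernel, the following is a minimal user-space sketch of its control flow. It is not kernel code: struct model_mount, model_vflush() and model_unmount() are hypothetical names invented for illustration, and the "racing reader" is simulated by a simple counter. It only models the behaviour described in the log: suspend writers, flush until the vnode count reaches zero, fail with EBUSY for a non-forced unmount if vnodes remain, and keep retrying for a forced unmount.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct mount; not the kernel type. */
struct model_mount {
	int nvnodes;		/* models mp->mnt_nvnodelistsize */
	int reader_races;	/* times a reader will re-create a vnode */
	bool suspended;		/* models the write-suspended state */
};

/*
 * Models vflush(): reclaims every vnode, after which a racing reader
 * (assumption: simulated by the counter) may instantiate one again.
 */
static int
model_vflush(struct model_mount *mp)
{

	mp->nvnodes = 0;
	if (mp->reader_races > 0) {
		mp->reader_races--;
		mp->nvnodes = 1;
	}
	return (0);
}

/* Models the control flow of the reworked unmount. */
static int
model_unmount(struct model_mount *mp, bool force)
{
	int error;

	mp->suspended = true;		/* stop writers */
	for (;;) {
		error = model_vflush(mp);
		if (error != 0) {
			mp->suspended = false;
			return (error);
		}
		if (mp->nvnodes == 0)
			break;
		if (!force) {
			/* Non-forced unmount gives up instead of retrying. */
			mp->suspended = false;
			return (EBUSY);
		}
	}
	/* Node destruction would start here; no vnodes may remain. */
	assert(mp->nvnodes == 0);
	mp->suspended = false;
	return (0);
}
```

In this model a forced unmount terminates only because reader_races is finite, which mirrors the live-lock caveat in the log: with an unbounded stream of racing readers the loop would not exit until the load is lifted.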