From owner-svn-src-all@freebsd.org Thu Mar 22 23:48:08 2018 Return-Path: Delivered-To: svn-src-all@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3B4F0F54914; Thu, 22 Mar 2018 23:48:08 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id E46A372E27; Thu, 22 Mar 2018 23:48:07 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id C72B718194; Thu, 22 Mar 2018 23:48:07 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id w2MNm7WG095317; Thu, 22 Mar 2018 23:48:07 GMT (envelope-from mav@FreeBSD.org) Received: (from mav@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id w2MNm7C0095316; Thu, 22 Mar 2018 23:48:07 GMT (envelope-from mav@FreeBSD.org) Message-Id: <201803222348.w2MNm7C0095316@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: mav set sender to mav@FreeBSD.org using -f From: Alexander Motin Date: Thu, 22 Mar 2018 23:48:07 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-11@freebsd.org Subject: svn commit: r331389 - stable/11/cddl/contrib/opensolaris/lib/libzfs/common X-SVN-Group: stable-11 X-SVN-Commit-Author: mav X-SVN-Commit-Paths: stable/11/cddl/contrib/opensolaris/lib/libzfs/common X-SVN-Commit-Revision: 331389 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 22 Mar 2018 23:48:08 -0000 Author: mav Date: Thu Mar 22 23:48:07 2018 New Revision: 331389 URL: https://svnweb.freebsd.org/changeset/base/331389 Log: MFC r329661: MFV r316875: 7336 vfork and O_CLOEXEC causes zfs_mount EBUSY illumos/illumos-gate@873c4903a52d089cd8234b79d24f5a3fc3bccc82 https://github.com/illumos/illumos-gate/commit/873c4903a52d089cd8234b79d24f5a3fc3bccc82 https://www.illumos.org/issues/7336 We can run into a problem where we call into zfs_mount, which in turn calls is_dir_empty, which opens the directory to try and make sure it's empty. The issue with the current approach is that it holds the directory open while it traverses it with readdir, which, due to subtle interaction with the Java JVM, vfork, and exec can cause a tricky race condition resulting in zfs_mount failures. The approach to resolving the issue in this patch is to drop the usage of readdir altogether, and instead rely on the fact that ZFS stores the number of entries contained in a directory using the st_size field of the stat structure. Thus, if the directory in question is a ZFS directory, we can check to see if it's empty by calling stat() and inspecting the st_size field of structure returned. =============================================================================== The root cause appears to be an interesting race between vfork, exec, and zfs_mount's usage of O_CLOEXEC when calling openat. Here's what is going on: 1. We call zfs_mount, and this in turn calls openat to check if the directory is empty, which results in opening the directory we're trying to mount onto, and increment v_count. 2. As we're in the middle of reading the directory, vfork is called by the JVM and proceeds to exec the jspawnhelper utility. As a result of the vfork, we take an additional hold on the directory, which increments v_count a second time. The semantics of vfork mean the parent process will wait for the child process to exit or exec before the parent can continue; at this point the parent is in the middle of zfs_mount, reading the directory to determine if it's empty or not. 3. The child process exec-ing jspawnhelper gets to the relvm call within exec_args (which is called by exec_common). relvm is the function that releases the parent process, allowing the parent to proceed. The problem is, at this point of calling relvm, the child hasn't yet called close_exec which is responsible for closing the file descriptors inherited from the parent process Reviewed by: Matt Ahrens Reviewed by: Paul Dagnelie Reviewed by: Robert Mustacchi Approved by: Dan McDonald Author: Prakash Surya Modified: stable/11/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_mount.c Directory Properties: stable/11/ (props changed) Modified: stable/11/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_mount.c ============================================================================== --- stable/11/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_mount.c Thu Mar 22 23:47:04 2018 (r331388) +++ stable/11/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_mount.c Thu Mar 22 23:48:07 2018 (r331389) @@ -78,6 +78,7 @@ #include #include #include +#include #include @@ -178,14 +179,33 @@ is_shared(libzfs_handle_t *hdl, const char *mountpoint } #ifdef illumos -/* - * Returns true if the specified directory is empty. If we can't open the - * directory at all, return true so that the mount can fail with a more - * informative error message. - */ static boolean_t -dir_is_empty(const char *dirname) +dir_is_empty_stat(const char *dirname) { + struct stat st; + + /* + * We only want to return false if the given path is a non empty + * directory, all other errors are handled elsewhere. + */ + if (stat(dirname, &st) < 0 || !S_ISDIR(st.st_mode)) { + return (B_TRUE); + } + + /* + * An empty directory will still have two entries in it, one + * entry for each of "." and "..". + */ + if (st.st_size > 2) { + return (B_FALSE); + } + + return (B_TRUE); +} + +static boolean_t +dir_is_empty_readdir(const char *dirname) +{ DIR *dirp; struct dirent64 *dp; int dirfd; @@ -211,6 +231,42 @@ dir_is_empty(const char *dirname) (void) closedir(dirp); return (B_TRUE); +} + +/* + * Returns true if the specified directory is empty. If we can't open the + * directory at all, return true so that the mount can fail with a more + * informative error message. + */ +static boolean_t +dir_is_empty(const char *dirname) +{ + struct statvfs64 st; + + /* + * If the statvfs call fails or the filesystem is not a ZFS + * filesystem, fall back to the slow path which uses readdir. + */ + if ((statvfs64(dirname, &st) != 0) || + (strcmp(st.f_basetype, "zfs") != 0)) { + return (dir_is_empty_readdir(dirname)); + } + + /* + * At this point, we know the provided path is on a ZFS + * filesystem, so we can use stat instead of readdir to + * determine if the directory is empty or not. We try to avoid + * using readdir because that requires opening "dirname"; this + * open file descriptor can potentially end up in a child + * process if there's a concurrent fork, thus preventing the + * zfs_mount() from otherwise succeeding (the open file + * descriptor inherited by the child process will cause the + * parent's mount to fail with EBUSY). The performance + * implications of replacing the open, read, and close with a + * single stat is nice; but is not the main motivation for the + * added complexity. + */ + return (dir_is_empty_stat(dirname)); } #endif