From owner-freebsd-fs@FreeBSD.ORG Wed Oct 5 10:23:18 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CCCD5106566B; Wed, 5 Oct 2011 10:23:18 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (chez.mckusick.com [70.36.157.235]) by mx1.freebsd.org (Postfix) with ESMTP id ADE6B8FC16; Wed, 5 Oct 2011 10:23:18 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id p95ANJr5005769; Wed, 5 Oct 2011 03:23:19 -0700 (PDT) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201110051023.p95ANJr5005769@chez.mckusick.com> To: Kostik Belousov In-reply-to: <20111003203811.GA1511@deviant.kiev.zoral.com.ua> Date: Wed, 05 Oct 2011 03:23:19 -0700 From: Kirk McKusick X-Spam-Status: No, score=0.0 required=5.0 tests=MISSING_MID, UNPARSEABLE_RELAY autolearn=failed version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on chez.mckusick.com Cc: Attilio Rao , Garrett Cooper , Xin LI , freebsd-fs@freebsd.org Subject: Re: Need to force sync(2) before umounting UFS1 filesystems? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 05 Oct 2011 10:23:19 -0000 > Date: Mon, 3 Oct 2011 23:38:11 +0300 > From: Kostik Belousov > To: Attilio Rao > Cc: Kirk McKusick , > Garrett Cooper , > freebsd-fs@freebsd.org, Xin LI > Subject: Re: Need to force sync(2) before umounting UFS1 filesystems? > > On Sun, Oct 02, 2011 at 02:19:32AM +0200, Attilio Rao wrote: > > I'm sorry if it wasn't clear in kib/my latest message, but we don't > > need the coveredvnode unlocking logic because of the tegge's commit. > > > > I just think we should commit the change in policy Kirk initially > > submitted + a comment on top of vfs_busy() explaining why the deadlock > > with coveredvnode cannot happen. > > Below is my take on the comment. Sorry for going silent. I have just finished my travel to the EuroBSD conference which is no small journey from Berkeley California :-) I believe that your comment below nicely summarizes Tor's commit log entry. It also convinces me that my original change is indeed correct. Looked at another way, if my original fix was not correct then the forced unmount case would also have been incorrect. So, I am glad that we did this exercise. Please go ahead and commit the comment (and MFC it to at least 9 if not 8 and 7 to which I believe it also applies). I am awaiting further testing of my change from Garrett Cooper before proceeding with checking it in to head. Kirk McKusick =-=-= commit 3981acdadcf4313dbdf813ec107f7bfbb4057d09 Author: Konstantin Belousov Date: Mon Oct 3 23:33:06 2011 +0300 Move parts of the commit log for r166167, where Tor explained the interaction between vnode locks and vfs_busy(), into comment. diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c index 7eb619a..3d7735d 100644 --- a/sys/kern/vfs_subr.c +++ b/sys/kern/vfs_subr.c @@ -348,6 +348,38 @@ SYSINIT(vfs, SI_SUB_VFS, SI_ORDER_FIRST, vntblinit, NU= LL); /* * Mark a mount point as busy. Used to synchronize access and to delay * unmounting. Eventually, mountlist_mtx is not released on failure. + * + * vfs_busy() is a custom lock, it can block the caller. + * vfs_busy() only sleeps if the unmount is active on the mount point. + * For a mountpoint mp, vfs_busy-enforced lock is before lock of any + * vnode belonging to mp. + * + * Lookup uses vfs_busy() to traverse mount points. + * root fs var fs + * / vnode lock A / vnode lock (/var) D + * /var vnode lock B /log vnode lock(/var/log) E + * vfs_busy lock C vfs_busy lock F + * + * Within each file system, the lock order is C->A->B and F->D->E. + * + * When traversing across mounts, the system follows that lock order: + * + * C->A->B + * | + * +->F->D->E + * + * The lookup() process for namei("/var") illustrates the process: + * VOP_LOOKUP() obtains B while A is held + * vfs_busy() obtains a shared lock on F while A and B are held + * vput() releases lock on B + * vput() releases lock on A + * VFS_ROOT() obtains lock on D while shared lock on F is held + * vfs_unbusy() releases shared lock on F + * vn_lock() obtains lock on deadfs vnode vp_crossmp instead of A. + * Attempt to lock A (instead of vp_crossmp) while D is held would + * violate the global order, causing deadlocks. + * + * dounmount() locks B while F is drained. */ int vfs_busy(struct mount *mp, int flags)