Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 05 Oct 2011 03:23:19 -0700
From:      Kirk McKusick <mckusick@mckusick.com>
To:        Kostik Belousov <kostikbel@gmail.com>
Cc:        Attilio Rao <attilio@freebsd.org>, Garrett Cooper <yanegomi@gmail.com>, Xin LI <delphij@freebsd.org>, freebsd-fs@freebsd.org
Subject:   Re: Need to force sync(2) before umounting UFS1 filesystems? 
Message-ID:  <201110051023.p95ANJr5005769@chez.mckusick.com>
In-Reply-To: <20111003203811.GA1511@deviant.kiev.zoral.com.ua> 

next in thread | previous in thread | raw e-mail | index | archive | help
> Date: Mon, 3 Oct 2011 23:38:11 +0300
> From: Kostik Belousov <kostikbel@gmail.com>
> To: Attilio Rao <attilio@freebsd.org>
> Cc: Kirk McKusick <mckusick@mckusick.com>,
>     Garrett Cooper <yanegomi@gmail.com>,
>     freebsd-fs@freebsd.org, Xin LI <delphij@freebsd.org>
> Subject: Re: Need to force sync(2) before umounting UFS1 filesystems?
> 
> On Sun, Oct 02, 2011 at 02:19:32AM +0200, Attilio Rao wrote:
> > I'm sorry if it wasn't clear in kib/my latest message, but we don't
> > need the coveredvnode unlocking logic because of the tegge's commit.
> >
> > I just think we should commit the change in policy Kirk initially
> > submitted + a comment on top of vfs_busy() explaining why the deadlock
> > with coveredvnode cannot happen.
> 
> Below is my take on the comment.

Sorry for going silent. I have just finished my travel to the EuroBSD
conference which is no small journey from Berkeley California :-)

I believe that your comment below nicely summarizes Tor's commit log
entry. It also convinces me that my original change is indeed correct.
Looked at another way, if my original fix was not correct then the
forced unmount case would also have been incorrect. So, I am glad
that we did this exercise.

Please go ahead and commit the comment (and MFC it to at least 9
if not 8 and 7 to which I believe it also applies).

I am awaiting further testing of my change from Garrett Cooper
before proceeding with checking it in to head.

	Kirk McKusick

=-=-=

commit 3981acdadcf4313dbdf813ec107f7bfbb4057d09
Author: Konstantin Belousov <kostik@pooma.home>
Date:   Mon Oct 3 23:33:06 2011 +0300

    Move parts of the commit log for r166167, where Tor explained the
    interaction between vnode locks and vfs_busy(), into comment.

diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
index 7eb619a..3d7735d 100644
--- a/sys/kern/vfs_subr.c
+++ b/sys/kern/vfs_subr.c
@@ -348,6 +348,38 @@ SYSINIT(vfs, SI_SUB_VFS, SI_ORDER_FIRST, vntblinit, NU=
LL);
 /*
  * Mark a mount point as busy. Used to synchronize access and to delay
  * unmounting. Eventually, mountlist_mtx is not released on failure.
+ *
+ * vfs_busy() is a custom lock, it can block the caller.
+ * vfs_busy() only sleeps if the unmount is active on the mount point.
+ * For a mountpoint mp, vfs_busy-enforced lock is before lock of any
+ * vnode belonging to mp.
+ *
+ * Lookup uses vfs_busy() to traverse mount points.
+ * root fs			var fs
+ * / vnode lock		A	/ vnode lock (/var)		D
+ * /var vnode lock	B	/log vnode lock(/var/log)	E
+ * vfs_busy lock	C	vfs_busy lock			F
+ *
+ * Within each file system, the lock order is C->A->B and F->D->E.
+ *
+ * When traversing across mounts, the system follows that lock order:
+ *
+ *        C->A->B
+ *		|
+ *	        +->F->D->E
+ *
+ * The lookup() process for namei("/var") illustrates the process:
+ *  VOP_LOOKUP() obtains B while A is held
+ *  vfs_busy() obtains a shared lock on F while A and B are held
+ *  vput() releases lock on B
+ *  vput() releases lock on A
+ *  VFS_ROOT() obtains lock on D while shared lock on F is held
+ *  vfs_unbusy() releases shared lock on F
+ *  vn_lock() obtains lock on deadfs vnode vp_crossmp instead of A.
+ *    Attempt to lock A (instead of vp_crossmp) while D is held would
+ *    violate the global order, causing deadlocks.
+ *
+ * dounmount() locks B while F is drained.
  */
 int
 vfs_busy(struct mount *mp, int flags)



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201110051023.p95ANJr5005769>