From owner-freebsd-fs@FreeBSD.ORG  Thu Aug 29 23:43:41 2013
Date: Thu, 29 Aug 2013 19:43:34 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Konstantin Belousov <kostikbel@gmail.com>
Message-ID: <537646864.15457428.1377819814825.JavaMail.root@uoguelph.ca>
In-Reply-To: <20130829223128.GP4972@kib.kiev.ua>
Subject: Re: fixing "umount -f" for the NFS client
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs <freebsd-fs@freebsd.org>

Kostik wrote:
> On Thu, Aug 29, 2013 at 06:21:41PM -0400, Rick Macklem wrote:
> > Kostik wrote:
> > > On Wed, Aug 28, 2013 at 08:15:27PM -0400, Rick Macklem wrote:
> > > > I've been doing a little more testing of "umount -f" for NFS
> > > > mounts and they seem to be working unless some other
> > > > process/thread has busied the file system via vfs_busy().
> > > > 
> > > > Unfortunately, it is pretty easy to vfs_busy() the file system
> > > > by using a command like "df" that is stuck on the unresponsive
> > > > NFS server.
> > > > 
> > > > The problem seems to be that dounmount() msleep()s while
> > > > mnt_lockref != 0 before calling VFS_UNMOUNT().
> > > > 
> > > > If some call into the NFS client was done before this
> > > > while (mp->mnt_lockref) loop with msleep() in it, it
> > > > could easily kill off RPCs in progress. (It currently
> > > > does this in nfs_unmount() using the newnfs_nmcancelreqs()
> > > > call.)
> > > > 
> > > > In summary:
> > > > - Would it be appropriate to add a new vfs_XXX method that
> > > >   dounmount() would call before the while() loop for the
> > > >   forced dismount case?
> > > >   (The default would be a no-op and I have no idea if any
> > > >    file system other than NFS would have a use for it?)
> > > >   Alternately, there could be a function pointer set non-NULL
> > > >   that would specifically be used by the NFS client for this.
> > > >   This would avoid adding a vfs_XXX() method, but would mean
> > > >   an NFS specific call ends up in the generic dounmount() code.
> > > > 
> > > > Anyone have comments on this?
> > > > 
> > > Yes, I do.  I agree with adding the pre-unmount vfs method.
> > > This seems to be the cleanest solution possible.
> > > 
> > I've attached a patch. It is also at
> >   http://people.freebsd.org/~rmacklem/forced-dism.patch
> > in case the attachment gets lost.
> > I don't really like doing the MNT_IUNLOCK(), MNT_ILOCK()
> > before/after the VFS_KILLIO() call, but I couldn't see any
> > better way to do it and it looks safe to do so, at least for
> > the forced case.
> Maybe call it VFS_PURGE()?
> 
Sure, any name is fine with me.

> I suggest moving the VFS_KILLIO() call to after MNTK_DRAINING is
> set, to avoid getting new references after the current I/O
> transactions are stopped. You would need to set MNTK_DRAINING
> unconditionally. Also, it probably makes sense to replace the
> if (mnt_lockref) with while ().
> 
Hmm. When I look at the code, the only use of MNTK_DRAINING seems to
be to tell vfs_unbusy() to do a wakeup() if mnt_lockref == 0. I don't
see why setting it before VFS_PURGE() would matter?

Let me explain what the NFS client does:
If it sees MNTK_UNMOUNTF set, it fails VOP/VFS calls without attempting
any RPCs. That's why I needed MNTK_UNMOUNTF set before the VFS_PURGE()
call. The VFS_PURGE() call causes any RPC that is already in progress to
fail (by closing the connection to the server). If there is a case where
an RPC attempt can get stuck after this point, it's a bug in the NFS client
I will need to find;-)
--> Once MNTK_UNMOUNTF is set and VFS_PURGE() is called, all VOP/VFS ops
    should return failure without attempting to do RPCs against the server.
If some thread does vfs_busy() before or while VFS_PURGE() is in progress
(that is, any time MNT_ILOCK() isn't held), it should end up doing a vfs_unbusy()
at some point without getting stuck trying to do an RPC against the server.
If that vfs_unbusy() happens before dounmount() re-acquires the MNT_ILOCK(), it
should be ok, since mnt_lockref has already been decremented.
If it happens after the dounmount() thread re-acquires MNT_ILOCK(), dounmount()
should be in the msleep() with MNTK_DRAINING set, so it will get the wakeup
once mnt_lockref has decremented to 0.

Setting MNTK_DRAINING sooner would just result in the odd unnecessary wakeup(),
from what I can see?
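
For illustration only, the ordering argued above can be modelled in userland
with pthreads. This is a rough sketch, not the patch and not kernel code:
struct toy_mount, the toy_*() functions and stuck_df() are all hypothetical
names, a single mutex stands in for MNT_ILOCK()/MNT_IUNLOCK(), condition
variables stand in for msleep()/wakeup(), and the "server" simply never
answers. Only the flag names and the order of steps are taken from the
discussion.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Flag names borrowed from the thread; the values are arbitrary. */
#define MNTK_UNMOUNTF 0x1   /* forced unmount in progress */
#define MNTK_DRAINING 0x2   /* unmounter is asleep waiting for lockref == 0 */

struct toy_mount {
    pthread_mutex_t ilock;     /* models the mount interlock */
    pthread_cond_t  drained;   /* models msleep()/wakeup() on mnt_lockref */
    pthread_cond_t  rpc_done;  /* models an RPC waiting for a reply */
    int  kern_flag;
    int  lockref;              /* models mnt_lockref */
    bool conn_closed;          /* set once the purge step has run */
    int  df_error;             /* result seen by the stuck "df" thread */
};

static void toy_busy(struct toy_mount *mp)
{
    pthread_mutex_lock(&mp->ilock);
    mp->lockref++;
    pthread_mutex_unlock(&mp->ilock);
}

static void toy_unbusy(struct toy_mount *mp)
{
    pthread_mutex_lock(&mp->ilock);
    mp->lockref--;
    /* Wake the unmounter only if it is actually draining. */
    if (mp->lockref == 0 && (mp->kern_flag & MNTK_DRAINING)) {
        mp->kern_flag &= ~MNTK_DRAINING;
        pthread_cond_broadcast(&mp->drained);
    }
    pthread_mutex_unlock(&mp->ilock);
}

/* The server never replies, so an RPC can only ever fail. */
static int toy_rpc(struct toy_mount *mp)
{
    pthread_mutex_lock(&mp->ilock);
    if (!(mp->kern_flag & MNTK_UNMOUNTF)) {
        /* RPC already in flight: blocks until the connection is closed. */
        while (!mp->conn_closed)
            pthread_cond_wait(&mp->rpc_done, &mp->ilock);
    }
    /* With MNTK_UNMOUNTF set, fail without attempting the RPC at all. */
    pthread_mutex_unlock(&mp->ilock);
    return -1;
}

/* Models the purge step: fail every RPC that is in progress. */
static void toy_purge(struct toy_mount *mp)
{
    pthread_mutex_lock(&mp->ilock);
    mp->conn_closed = true;
    pthread_cond_broadcast(&mp->rpc_done);
    pthread_mutex_unlock(&mp->ilock);
}

static void toy_forced_unmount(struct toy_mount *mp)
{
    pthread_mutex_lock(&mp->ilock);
    mp->kern_flag |= MNTK_UNMOUNTF;      /* set before the purge */
    pthread_mutex_unlock(&mp->ilock);    /* drop the lock around the purge */
    toy_purge(mp);
    pthread_mutex_lock(&mp->ilock);      /* re-acquire, then drain */
    while (mp->lockref != 0) {
        mp->kern_flag |= MNTK_DRAINING;
        pthread_cond_wait(&mp->drained, &mp->ilock);
    }
    pthread_mutex_unlock(&mp->ilock);
}

/* A "df"-like thread: busy the mount, get stuck in an RPC, then unbusy. */
static void *stuck_df(void *arg)
{
    struct toy_mount *mp = arg;
    toy_busy(mp);
    mp->df_error = toy_rpc(mp);
    toy_unbusy(mp);
    return NULL;
}

/* Returns 0 if the forced unmount completes despite the stuck thread. */
static int toy_demo(void)
{
    struct toy_mount m = { .kern_flag = 0, .lockref = 0,
                           .conn_closed = false, .df_error = 0 };
    pthread_t df;
    bool busied = false;

    pthread_mutex_init(&m.ilock, NULL);
    pthread_cond_init(&m.drained, NULL);
    pthread_cond_init(&m.rpc_done, NULL);
    if (pthread_create(&df, NULL, stuck_df, &m) != 0)
        return -1;

    /* Wait until the "df" thread holds a busy reference. */
    while (!busied) {
        pthread_mutex_lock(&m.ilock);
        busied = (m.lockref > 0);
        pthread_mutex_unlock(&m.ilock);
        if (!busied)
            sched_yield();
    }

    toy_forced_unmount(&m);
    pthread_join(df, NULL);
    assert(m.lockref == 0);
    return (m.df_error == -1 && !(m.kern_flag & MNTK_DRAINING)) ? 0 : -1;
}
```

Calling toy_demo() from a main() (built with -pthread) should return 0 in
either interleaving: if the stuck thread's unbusy runs before the unmounter
re-takes the lock, the while loop never sleeps; if it runs after, the
unmounter is asleep with MNTK_DRAINING set and is woken by it.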

> > 
> > I assume I would also need to bump __FreeBSD_version (and maybe
> > VFS_VERSION?).
> I think you could avoid it.
> 
Do you mean I don't need to bump __FreeBSD_version or VFS_VERSION or both?

Thanks, rick