Date: Mon, 29 Jun 2009 10:38:43 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
X-X-Sender: rmacklem@muncher.cs.uoguelph.ca
To: Attilio Rao <attilio@freebsd.org>
In-Reply-To: <3bbf2fe10906290256x4bfbe263jccef017a557f9410@mail.gmail.com>
Message-ID: <Pine.GSO.4.63.0906291018050.3493@muncher.cs.uoguelph.ca>
References: <Pine.GSO.4.63.0906281955160.5084@muncher.cs.uoguelph.ca>
	<3bbf2fe10906290256x4bfbe263jccef017a557f9410@mail.gmail.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org
Subject: Re: umount -f implementation



On Mon, 29 Jun 2009, Attilio Rao wrote:

> 2009/6/29 Rick Macklem <rmacklem@uoguelph.ca>:
>> I just noticed that when I do the following:
>> - start a large write to an NFS mounted fs
>> - network partition the server (unplug a net cable)
>> - do a "umount -f <mntpoint>" on the machine
>>
>> that it gets stuck trying to write dirty blocks to the server.
>>
>> I had, in the past, assumed that a "umount -f" of an NFS mount would be
>> used to get rid of an NFS mount on an unresponsive server and that loss
>> of "writes in progress" would be expected to happen.
>>
>> Does that sound correct? (In other words, am I seeing a bug or a feature?)
>
> While that should be real in principle (immediate shutdown of the fs
> operation and unmounting of the partition) it is totally impossible to
> have it completely unsleeping, so it can happen that also umount -f
> sleeps / delays for some times (example: vflush).
> Currently, umount -f is one of the most complicated thing to handle in
> our VFS because it puts as requirement that vnodes can be reclaimed in
> any moment, adding complexity and possibility for races.
>
Yes, agreed. And I like to leave that stuff to more clever chaps than I:-)

> What's the fix for your problem?
>
Well, when I tested it I found that it got stuck in two places, both
calls to VFS_SYNC(). The first was a
 	sync();
right at the beginning of umount.c.
- All I did for that one is move it to after the code that handles
   option processing and change it to
 	if ((fflag & MNT_FORCE) == 0)
 		sync();
   so that it isn't done for the "-f" case. (I believe the sync(); call
   at the beginning of umount is only a performance optimization, so I
   don't think not doing it for "-f" should break anything.)
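
A minimal userland sketch of that reordering, for illustration only. The
names SKETCH_FORCE, parse_fflag(), presync() and counting_sync() are
hypothetical stand-ins (SKETCH_FORCE plays the role of MNT_FORCE, and its
value is arbitrary); the real command would pass sync() from <unistd.h>
instead of the counting stub:

```c
#include <string.h>

/* Hypothetical stand-in for MNT_FORCE; the value is illustrative only. */
#define SKETCH_FORCE 0x1

/* Counts how often the sync hook ran, so the behaviour can be observed. */
static int sync_calls;

/* Stub used in place of sync(2) for this sketch. */
static void counting_sync(void)
{
	sync_calls++;
}

/* Simplified option processing: only "-f" is recognised. */
static int parse_fflag(int argc, char *const argv[])
{
	int i, fflag = 0;

	for (i = 1; i < argc; i++)
		if (strcmp(argv[i], "-f") == 0)
			fflag |= SKETCH_FORCE;
	return fflag;
}

/* Run the sync hook only when the unmount is not forced. */
static void presync(int fflag, void (*do_sync)(void))
{
	if ((fflag & SKETCH_FORCE) == 0)
		do_sync();
}
```

The point of the ordering is that the options have to be parsed before the
decision to sync can be made, which is why the call moves below option
processing.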

- the second happened just before the VFS_UNMOUNT() call in the
   unmount(2) system call. The code looks like:
 	if (((mp->mnt_flag & MNT_RDONLY) ||
 	     (error = VFS_SYNC(mp, MNT_WAIT)) == 0) || (flags & MNT_FORCE) != 0)
   - Although it was tempting to reverse the order of VFS_SYNC() and the
     test for MNT_FORCE, I thought that might have a negative impact on
     other file systems, since it avoided doing the VFS_SYNC(), so...

   - Instead, I just put a check for MNTK_UNMOUNTF at the beginning of
     nfs_sync(), so that it returns EBUSY for this case instead of getting
     stuck trying to flush().
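
To show why the early return is enough without reordering the condition,
here is a small userland model (a sketch, not the kernel code). Everything
prefixed SK_ or sk_ is hypothetical, the flag values are arbitrary, and it
assumes the forced-unmount kernel flag is already set by the time the sync
call is reached:

```c
#include <errno.h>

/* Illustrative flag values for this model only; not the kernel's. */
#define SK_RDONLY	0x1	/* models MNT_RDONLY */
#define SK_FORCE	0x2	/* models MNT_FORCE */
#define SK_UNMOUNTF	0x1	/* models MNTK_UNMOUNTF */

struct sk_mount {
	int flag;		/* models mp->mnt_flag */
	int kern_flag;		/* models mp->mnt_kern_flag */
	int sync_attempts;	/* times the sync tried to reach the server */
	int unmounted;		/* set once the unmount step has run */
};

/* Models nfs_sync() with the proposed check at the top. */
static int sk_sync(struct sk_mount *mp)
{
	if (mp->kern_flag & SK_UNMOUNTF)
		return EBUSY;	/* forced unmount: do not try to flush */
	/* The real call is where the thread would get stuck; only count it. */
	mp->sync_attempts++;
	return 0;
}

/* Models the quoted condition, keeping its evaluation order. */
static int sk_unmount(struct sk_mount *mp, int flags)
{
	int error = 0;

	if (flags & SK_FORCE)
		mp->kern_flag |= SK_UNMOUNTF;
	if (((mp->flag & SK_RDONLY) ||
	    (error = sk_sync(mp)) == 0) || (flags & SK_FORCE) != 0) {
		mp->unmounted = 1;	/* stands in for VFS_UNMOUNT() */
		error = 0;
	}
	return error;
}
```

In the forced case the sync returns EBUSY at once, the last clause of the
condition is still true, and the unmount step runs. Other file systems keep
their sync, since the condition itself is unchanged.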

Assuming that I'm right w.r.t. the "sync();" at the beginning of umount.c,
the patch simply ensures that the umount command thread makes it as far as
VFS_UNMOUNT()->nfs_unmount(), so that the forced dismount proceeds.
nfs_unmount() kills RPCs in progress before doing the vflush() and, since no new RPCs
can be done once MNTK_UNMOUNTF is set (it is checked at the beginning of
a request), the vflush() won't actually flush anything to the server.
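
The effect on dirty buffers can be modelled in a few lines of userland C.
This is only a sketch: sk_nfsmount, sk_request() and sk_flush() are
hypothetical names, SK_UNMOUNTF stands in for MNTK_UNMOUNTF with an
arbitrary value, and the -1 return is a placeholder for whatever error the
real request path reports:

```c
#include <stddef.h>

/* Illustrative value; stands in for MNTK_UNMOUNTF in this model. */
#define SK_UNMOUNTF 0x1

struct sk_nfsmount {
	int kern_flag;		/* models the mount's kernel flags */
	size_t bytes_sent;	/* bytes that actually reached the server */
};

/* Models the start of a request: refuse once a forced unmount is flagged. */
static int sk_request(struct sk_nfsmount *nmp, size_t len)
{
	if (nmp->kern_flag & SK_UNMOUNTF)
		return -1;	/* placeholder error; no RPC is issued */
	nmp->bytes_sent += len;
	return 0;
}

/*
 * Models the flush: try one write per dirty buffer and return how many
 * buffers were dropped because their write was refused.
 */
static size_t sk_flush(struct sk_nfsmount *nmp, const size_t *dirty, size_t n)
{
	size_t i, dropped = 0;

	for (i = 0; i < n; i++)
		if (sk_request(nmp, dirty[i]) != 0)
			dropped++;
	return dropped;
}
```

With the flag set, every buffer is dropped and nothing is sent; without it,
the same buffers all go out.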

As such, "umount -f" is pretty well guaranteed to throw away the dirty
buffers. I believe this is correct behaviour, but it would mean that a
user/sysadmin that uses "umount -f" for cases where the server is still
functioning, but slow, will lose data when they probably don't expect to.

Does this help? rick
ps: During simple testing, it has worked ok. It waits about 1 minute for
     the RPC threads to shut down, but the "umount -f" does complete after
     that happens. If the consensus seems to be that patching this is a
     good idea, I'll get some more testing done.