From: Rick Macklem
To: Attilio Rao
Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org
Date: Mon, 29 Jun 2009 10:38:43 -0400 (EDT)
Subject: Re: umount -f implementation

On Mon, 29 Jun 2009, Attilio Rao wrote:

> 2009/6/29 Rick Macklem:
>> I just noticed that when I do the following:
>> - start a large write to an NFS mounted fs
>> - network partition the server (unplug a net cable)
>> - do a "umount -f " on the machine
>>
>> that it gets stuck trying to write dirty blocks to the server.
>>
>> I had, in the past, assumed that a "umount -f" of an NFS mount would be
>> used to get rid of an NFS mount on an unresponsive server and that loss
>> of "writes in progress" would be expected to happen.
>>
>> Does that sound correct? (In other words, am I seeing a bug or a feature?)
>
> While that should be real in principle (immediate shutdown of the fs
> operation and unmounting of the partition) it is totally impossible to
> have it completely unsleeping, so it can happen that also umount -f
> sleeps / delays for some time (example: vflush).
> Currently, umount -f is one of the most complicated things to handle in
> our VFS because it puts as requirement that vnodes can be reclaimed in
> any moment, adding complexity and possibility for races.
>
Yes, agreed. And I like to leave that stuff to more clever chaps than I:-)

> What's the fix for your problem?
>
Well, when I tested it I found that it got stuck in two places, both
calls to VFS_SYNC().

The first was a sync(); right at the beginning of umount.c.
- All I did for that one is move it to after the code that handles
  option processing and change it to
        if ((fflag & MNT_FORCE) == 0)
                sync();
  so that it isn't done for the "-f" case.
  (I believe the sync(); call at the beginning of umount is only a
  performance optimization, so I don't think not doing it for "-f"
  should break anything.)
- The second happened just before the VFS_UNMOUNT() call in the
  umount(2) system call. The code looks like:
        if (((mp->mnt_flag & MNT_RDONLY) ||
             (error = VFS_SYNC(mp, MNT_WAIT)) == 0) ||
            (flags & MNT_FORCE) != 0)
- Although it was tempting to reverse the order of VFS_SYNC() and the
  test for MNT_FORCE, I thought that might have a negative impact on
  other file systems, since it would avoid doing the VFS_SYNC(), so...
- Instead, I just put a check for MNTK_UNMOUNTF at the beginning of
  nfs_sync(), so that it returns EBUSY for this case instead of getting
  stuck trying to flush().

Assuming that I'm right w.r.t. the "sync();" at the beginning of
umount.c, the net effect is simply that the umount command thread makes
it as far as VFS_UNMOUNT()->nfs_unmount(), so that the forced dismount
proceeds. The forced dismount kills RPCs in progress before doing the
vflush() and, since no new RPCs can be done once MNTK_UNMOUNTF is set
(it is checked at the beginning of a request), the vflush() won't
actually flush anything to the server. As such, "umount -f" is pretty
well guaranteed to throw away the dirty buffers.

I believe this is correct behaviour, but it would mean that a
user/sysadmin who uses "umount -f" for cases where the server is still
functioning, but slow, will lose data when they probably don't expect to.

Does this help? rick

ps: During simple testing, it has worked ok. It waits about 1 minute for
    the RPC threads to shut down, but the "umount -f" does complete after
    that happens. If the consensus seems to be that patching this is a
    good idea, I'll get some more testing done.
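
For anyone who wants to see the short-circuit behaviour of that second test without reading the kernel source, here is a minimal user-space sketch. It is not the actual patch: every sk_* name, the flag values, and the sync_attempts counter are hypothetical stand-ins made up for illustration, and sk_nfs_sync() only models "return EBUSY early when a forced unmount is in progress". The point it tries to show is that the sync call is still evaluated before the MNT_FORCE test, so the early EBUSY return is what keeps the thread from blocking in the flush.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel flags; the values are arbitrary. */
#define SK_MNT_RDONLY    0x1
#define SK_MNT_FORCE     0x2
#define SK_MNTK_UNMOUNTF 0x4

struct sk_mount {
    int mnt_flag;       /* models mp->mnt_flag */
    int mnt_kern_flag;  /* models the flag word that would hold MNTK_UNMOUNTF */
    int sync_attempts;  /* counts how often the (potentially blocking) flush ran */
};

/*
 * Model of the sync routine with the early check described above:
 * when a forced unmount is in progress, give up with EBUSY instead of
 * trying to flush dirty data to an unreachable server.
 */
int sk_nfs_sync(struct sk_mount *mp)
{
    assert(mp != NULL);
    if (mp->mnt_kern_flag & SK_MNTK_UNMOUNTF)
        return EBUSY;
    mp->sync_attempts++;    /* stands in for the real flush */
    return 0;
}

/*
 * Same shape as the condition quoted from the unmount path: the sync is
 * evaluated first, and the force flag is only consulted if the mount is
 * writable and the sync did not return 0.
 * Returns nonzero when the unmount may proceed; *error holds the sync result.
 */
int sk_may_unmount(struct sk_mount *mp, int flags, int *error)
{
    *error = 0;
    return ((mp->mnt_flag & SK_MNT_RDONLY) ||
            (*error = sk_nfs_sync(mp)) == 0) ||
           (flags & SK_MNT_FORCE) != 0;
}
```

With SK_MNTK_UNMOUNTF set and SK_MNT_FORCE passed, sk_may_unmount() lets the unmount proceed while sync_attempts stays at 0, which matches the "dirty buffers get thrown away" behaviour described above.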