Date: Mon, 29 Jun 2009 10:38:43 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Attilio Rao <attilio@freebsd.org>
Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org
Subject: Re: umount -f implementation
Message-ID: <Pine.GSO.4.63.0906291018050.3493@muncher.cs.uoguelph.ca>
In-Reply-To: <3bbf2fe10906290256x4bfbe263jccef017a557f9410@mail.gmail.com>
References: <Pine.GSO.4.63.0906281955160.5084@muncher.cs.uoguelph.ca> <3bbf2fe10906290256x4bfbe263jccef017a557f9410@mail.gmail.com>
On Mon, 29 Jun 2009, Attilio Rao wrote:
> 2009/6/29 Rick Macklem <rmacklem@uoguelph.ca>:
>> I just noticed that when I do the following:
>> - start a large write to an NFS mounted fs
>> - network partition the server (unplug a net cable)
>> - do a "umount -f <mntpoint>" on the machine
>>
>> that it gets stuck trying to write dirty blocks to the server.
>>
>> I had, in the past, assumed that a "umount -f" of an NFS mount would be
>> used to get rid of an NFS mount on an unresponsive server and that loss
>> of "writes in progress" would be expected to happen.
>>
>> Does that sound correct? (In other words, am I seeing a bug or a feature?)
>
> While that should be true in principle (immediate shutdown of the fs
> operations and unmounting of the partition), it is impossible to make
> it completely non-sleeping, so umount -f can also sleep or be delayed
> for some time (example: vflush).
> Currently, umount -f is one of the most complicated things to handle in
> our VFS because it requires that vnodes can be reclaimed at any moment,
> adding complexity and the possibility of races.
>
Yes, agreed. And I'd like to leave that stuff to more clever chaps than I :-)

> What's the fix for your problem?
>
Well, when I tested it, I found that it got stuck in two places, both calls
to VFS_SYNC().

The first was the sync(); call right at the beginning of umount.c.
- All I did for that one was move it to after the code that handles option
  processing and change it to

      if ((fflag & MNT_FORCE) == 0)
          sync();

  so that it isn't done for the "-f" case. (I believe the sync(); call at
  the beginning of umount is only a performance optimization, so I don't
  think skipping it for "-f" should break anything.)
- The second happened just before the VFS_UNMOUNT() call in the unmount(2)
  system call.
  The code looks like:

      if (((mp->mnt_flag & MNT_RDONLY) ||
          (error = VFS_SYNC(mp, MNT_WAIT)) == 0) ||
          (flags & MNT_FORCE) != 0)

  - Although it was tempting to reverse the order of VFS_SYNC() and the
    test for MNT_FORCE, I thought that might have a negative impact on
    other file systems, since it would skip the VFS_SYNC() for them, so...
  - Instead, I just put a check for MNTK_UNMOUNTF at the beginning of
    nfs_sync(), so that it returns EBUSY for this case instead of getting
    stuck trying to flush.

Assuming that I'm right w.r.t. the "sync();" at the beginning of umount.c,
this simply ensures that the umount command thread makes it as far as
VFS_UNMOUNT()->nfs_unmount(), so that the forced dismount proceeds.
nfs_unmount() kills RPCs in progress before doing the vflush() and, since
no new RPCs can be done once MNTK_UNMOUNTF is set (it is checked at the
beginning of a request), the vflush() won't actually flush anything to the
server.

As such, "umount -f" is pretty well guaranteed to throw away the dirty
buffers. I believe this is correct behaviour, but it means that a
user/sysadmin who uses "umount -f" when the server is still functioning,
but slow, will lose data when they probably don't expect to.

Does this help?

rick
ps: During simple testing, it has worked ok. It waits about 1 minute for
    the RPC threads to shut down, but the "umount -f" does complete after
    that happens. If the consensus is that patching this is a good idea,
    I'll get some more testing done.