Date: Tue, 30 Jun 2009 03:53:09 +1000 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Rick Macklem <rmacklem@uoguelph.ca> Cc: Attilio Rao <attilio@freebsd.org>, freebsd-fs@freebsd.org, freebsd-current@freebsd.org Subject: Re: umount -f implementation Message-ID: <20090630010035.E37426@delplex.bde.org> In-Reply-To: <Pine.GSO.4.63.0906291018050.3493@muncher.cs.uoguelph.ca> References: <Pine.GSO.4.63.0906281955160.5084@muncher.cs.uoguelph.ca> <3bbf2fe10906290256x4bfbe263jccef017a557f9410@mail.gmail.com> <Pine.GSO.4.63.0906291018050.3493@muncher.cs.uoguelph.ca>
next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, 29 Jun 2009, Rick Macklem wrote: > On Mon, 29 Jun 2009, Attilio Rao wrote: >> >> While that should be real in principle (immediate shutdown of the fs >> operation and unmounting of the partition) it is totally impossible to >> have it completely unsleeping, so it can happen that also umount -f >> sleeps / delays for some times (example: vflush). >> Currently, umount -f is one of the most complicated thing to handle in >> our VFS because it puts as requirement that vnodes can be reclaimed in >> any moment, adding complexity and possibility for races. >> > Yes, agreed. And I like to leave that stuff to more clever chaps than I:-) > >> What's the fix for your problem? >> > Well, when I tested it I found that it got stuck in two places, both > calls to VFS_SYNC(). The first was a > sync(); > right at the beginning of umount.c. > - All I did for that one is move it to after the code that handles > option processing and change it to > if ((fflag & MNT_FORCE) == 0) > sync(); > so that it isn't done for the "-f" case. (I believe the sync(); call > at the beginning of umount is only a performance optimization, so I > don't think not doing it for "-f" should break anything.) OK. This sync() is probably actually a performance pessimization, since it syncs all file systems while the internal sync in umount(2) only syncs the one being unmounted. > - the second happened just before the VFS_UNMOUNT() call in the > umount(2) system call. The code looks like: > if (((mp->mnt_flag & MNT_RDONLY) || > (error = VFS_SYNC(mp, MNT_WAIT)) == 0) || (flags & MNT_FORCE) != > 0) > - Although it was tempting to reverse the order of VFS_SYNC() and the > test for MNT_FORCE, I thought that might have a negative impact on > other file systems, since it avoided doing the VFS_SYNC(), so... > > - Instead, I just put a check for MNTK_UNMOUNTF at the beginning of > nfs_sync(), so that it returns EBUSY for this case instead of getting > stuck trying to flush(). OK. This sync is probably an optimization for correctness, since it arranges to do as much as possible without forcing. I checked ffs_mount() and found 2 large bugs, one related: - in the only case that tends to cause problems, namely the non-readonly case, ffs_unmount() does a suspend which calls VOP_SYNC(..., MNT_SUSPEND), but after errors from this sync it checks neither MNT_FORCE nor error == ENXIO. I think the usual effect is the same as if the top-level unmount() didn't check MNT_FORCE after suspend failure: in problematic cases, we have an unrecoverable write, due to the device going away or just an i/o error, and this error has probably already occured (only in rare cases will it be triggered by unmount). Then MNT_FORCE is essentially unused, and the ENXIO hack is not reached either, and unmount usually fails. - the UFS_EXTATTR case destroys infrastructure before committing to succeeding. It used to be just broken on failure. Now it uses a hack to recover (call a constructor) on failure, but the recovery code is not reached in the usual case of failure -- when the suspension fails. ffs_unmount() still seems to have no support for handling unrecoverable write errors (short of you converting them to ENXIO by removing the media). MNT_FORCE only meant FORCECLOSE for it. I see that old nfs was similar, and you are now making MNT_FORCE stronger. I thought that umount(8)'s man page documented -f being strongly forceful, but checking it shows that it only documents a weak force like that of FORCECLOSE (but not precisely enough). Perhaps a different flag should be used for strong forcefulness. Weak forcefulness is still useful and used for mount -f -u -- for remount we would never want errors in the file system itself ignored. This use also shows that the generic FORCECLOSE code must not ignore errors. > Assuming that I'm right w.r.t. the "sync();" at the beginning of umount.c, > it simply ensures that the umount command thread makes it as far as > VFS_UNMOUNT()->nfs_unmount(), so that the forced dismount proceeds. It > kills RPCs in progress before doing the vflush() and, since no new RPCs > can be done once MNTK_UNMOUNTF is set (it is checked at the beginning of > a request), the vflush() won't actually flush anything to the server. > > As such, "umount -f" is pretty well guaranteed to throw away the dirty > buffers. I believe this is correct behaviour, This is how I think ffs_mount() should work too -- It should be responsible for throwing away the dirty buffers, while nothing else should discard them. Now the discarding seems to be done by falling through to g_vfs_done(), except g_vfs_done() is not reached in most cases (see above). I don't like this -- at best we lose the opportunity to print ffs-specific details about what was discarded. Falling through only works for ENXIO anyway -- on other errors we should discard the unwritable buffers in an fs-specific manner so as to write as many of the writable buffers as possible. > but it would mean that a > user/sysadmin that uses "umount -f" for cases where the server is still > functioning, but slow, will lose data when they probably don't expect to. A new flag would help for this. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20090630010035.E37426>