Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 31 May 2011 16:15:29 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        src-committers@FreeBSD.org, svn-src-all@FreeBSD.org, svn-src-head@FreeBSD.org, Robert Watson <rwatson@FreeBSD.org>, Kostik Belousov <kostikbel@gmail.com>, Rick Macklem <rmacklem@FreeBSD.org>
Subject:   Re: svn commit: r222466 - head/sbin/umount
Message-ID:  <1326411567.44367.1306872929534.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20110531170853.Y1153@besplex.bde.org>

next in thread | previous in thread | raw e-mail | index | archive | help
> 
> It is only an optimization. Any number of syncs are useless for
> actually syncing the system, since sync(2) only does an async sync (it
> returns without waiting for most writes to complete). As you pointed
> out later in this thread, unmount(2) does a sync that works -- a sync
> sync -- before doing the unmount proper. It does this irrespective of
> the force flag:
> 
> % if (((mp->mnt_flag & MNT_RDONLY) ||
> % (error = VFS_SYNC(mp, MNT_WAIT)) == 0) || (flags & MNT_FORCE) != 0)
> % error = VFS_UNMOUNT(mp, flags);
> 
> The force flag doesn't mean that i/o's are forced to complete. It
> only means that open files are forced to be closed (and a few related
> things). This can be seen here. We wait (potentially forever) for
> any existing writes to complete. Only then do we look at the force
> flag and bale out if the sync failed and the force flag is _not_ set.
> Most i/o problems will cause hangs in this sync, and the force flag
> won't help. But if we somehow get past this sync, and have i/o
> problems, then honoring the force flag and continuing gives even more
> fragility, since we have almost passed the point of being able to give
> up after an error (this point is typically very early in
> VFS_UNMOUNT()).
> 
Well, in vgonel(), which is called from vflush() and seems to be what
gets rid of the vnode even when open for forced dismounts, there is:

    /*
     * Clear out any buffers associated with the vnode.
     * If the flush fails, just toss the buffers.
     */
    - followed shortly by
    if (vinvalbuf(vp, V_SAVE, 0, 0) != 0)
             vinvalbuf(vp, 0, 0, 0);

It seems to me that this code is willing to give up when it can't write
dirty buffers, but???

To me, a forced dismount of an NFS mount that just wedges on an
unresponsive server isn't very useful. The only case I can think
of where a forced dismount that isn't willing to throw away dirty
buffers might be useful would be when an app. (userland process)
is for some reason unkillable and is sitting with an open NFS file.
Is this a problem in practice? (I do know that a mount stuck on an
unresponsive server is a real problem. There have been several PRs,
the most recent being kern/157365.)

I brought up the concept of a "hard forced dismount" at the developer
summit a year ago, but no one seemed to see a difference between that
and what they believed a forced dismount already did. After that, I
thought about it a little and couldn't see much use for a "soft forced
dismount that fails instead of throwing away dirty buffers, but does
close down open files".

- For example, suppose a userland process is writing a file through
  the stdio library:
  A - fopen()s a file for writing
  B - does fwrite()s of small amounts of data
  C - does an fflush()

If the open is closed by a forced dismount before C, the written data
will be lost, since it isn't even in the kernel yet. If the file is on
a responsive NFS server, it will be written to the server shortly after C.
So, there seems to be a narrow window around C where the data will be lost
for a forced dismount that throws away dirty buffers vs one that doesn't
"but only if the server is responsive".

In other words, "Is there much difference between ripping the open files
out from under a app. before it does a write syscall vs throwing the data
away when it's in the kernel buffer but not yet written to an NFS server?"

Anyhow, the way I now have the NFS clients, they do a "hard forced dismount"
where they can throw away dirty buffers not yet written to the server.
Recent patches to nfs_sync() make it return without attempting writes for
the forced dismount case and the vgonel() { see above code snippet } throws
away the data when the first vinvalbuf(.., V_SAVE,.) fails.

> Many file systems have had bugs in this area. Before 1999/01/22
> (vfs_bio.c 1.196), b*write() gave up after an i/o error, and pretended
> to succeed. Both recoverable and unrecoverable errors were mishandled.
> This avoided many hangs in VFS_SYNC(). When this was partially fixed,
> many VFS_SYNC()s couldn't handle the errors, and paniced when they
> should have looped endlessly. Now I think they mostly loop endlessly,
> even for unrecoverable errors when they shouldn't. My version attempts
> to fix vfs_bio.c 1.196 by still giving up after an unrecoverable i/o
> error. This reduced endless loops but gave more panics in other places
> that can't handle the errors, mainly in the VFS_UMOUNT() call in the
> above -- many file systems saw the errors after the point where it is
> posible to either recover from them or loop endlessly on them, and
> then
> a panic occurred soon after the VFS_UMOUNT() returned with a
> half-complete
> unmount(), since neither a success or a failure return can indicate
> this
> state.
> 
> > call to the unmount(2) call, I'm not convinced it makes a
> > significant
> > difference. (I thought of just getting rid of it, but figured it was
> > harmless for the non "-f" case and might matter for a buggy fs that
> > doesn't
> > get the unmount(2) quite right. ie. Same argument as doing the
> > triple-sync,
> > just to be sure.)
> 
> I think you shouldn't have touched umount(8). The sync can still hang
> or
> just be in progress when unmount(2) is called, and unmount(2) still
> does
> its own sync, so nfs_unmount() must still handle hanging syncs in some
> way.
> 
> My reboot script has more details related to this:
> 
> %%%
> sh <<EOF
> sync
> #umount -Af -t nfs
> /sbin/reboot
> EOF
> %%%
> 
> This used to unmount the nfs file systems explicitly because I
> _wanted_
> their unmounts to hang if there is a problem (hopefullly while they
> could
> be aborted by ^C). Otherwise, I wanted the reboot to be fairly
> forceful
> (no messing around with shutdown or even umount). reboot(8) and
> reboot(2)
> do a fairly clean shutdown, including unmounting all file systems with
> the force flag set (in vfs_unmountall(9)), although their man pages
> don't
> mention unmounting. reboot(8) doesn't even mention reboot(2)!,
> although
> it bogusly references sync(8). reboot(8) actually uses sync(2) (unless
> nflag), but this is fairly bogus too, since reboot(2) does another
> sync
> (unless RB_NOSYNC). Perhaps there was a problem with the force flag
> being
> too forceful especially for nfs. vfs_bio.c 1.196 might have fixed
> this.
> 
> Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1326411567.44367.1306872929534.JavaMail.root>