Date:      Wed, 25 Jan 2012 18:49:23 -0500 (EST)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        svn-src-head@FreeBSD.org, Rick Macklem <rmacklem@FreeBSD.org>, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org
Subject:   Re: svn commit: r230516 - in head/sys: fs/nfsclient nfsclient
Message-ID:  <919199278.155166.1327535363109.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20120125152150.M1522@besplex.bde.org>

Bruce Evans wrote:
> On Tue, 24 Jan 2012, Rick Macklem wrote:
> 
> > Bruce Evans wrote:
> >> On Wed, 25 Jan 2012, Rick Macklem wrote:
> >>
> >>> Log:
> >>>  If a mount -u is done to either NFS client that switches it
> >>>  from TCP to UDP and the rsize/wsize/readdirsize is greater
> >>>  than NFS_MAXDGRAMDATA, it is possible for a thread doing an
> >>>  I/O RPC to get stuck repeatedly doing retries. This happens
> >>>  ...
> 
> >> Could it wait for the old i/o to complete (and not start any new
> >> i/o?). This is little different from having to wait when changing
> >> from rw to ro. The latter is not easy, and at least the old nfs
> >> client seems to not even dream of it. ffs has always called a
> >> ...
> 
> > As you said above "not easy ... uses complicated suspension of i/o".
> > I have not tried to code this, but I think it would be non-trivial.
> > The code would need to block new I/O before RPCs are issued and wait
> > for all in-progress I/Os to complete. At this time, the kernel RPC
> > handles the in-progress RPCs and NFS doesn't "know" what is
> > outstanding. Of course, code could be added to keep track of
> > in-progress I/O RPCs, but that would have to be written, as well.
> 
> Hmm, this means that even when the i/o sizes are small, the mode
> switch from tcp to udp may be unsafe since there may still be i/o's
> with higher sizes outstanding. So to switch from tcp to udp, the user
> should first reduce the sizes, then wait a while before switching to
> udp. And what happens with retries after changing sizes up or down?
> Does it retry with the old sizes?
> 
> Bruce
Good point. I think (assuming a TCP mount with large rsize):
# mount -u -o rsize=16384 /mnt
# mount -u -o udp /mnt
- could still result in a wedged thread trying to do a read that
  is too large for UDP.
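
A toy model of that sequence, as a sketch only. It assumes an in-progress
read keeps the size it was issued with, and it assumes a datagram limit of
16384 bytes for NFS_MAXDGRAMDATA. The class names (Mount, ReadRPC) are
made up for illustration and are not the client's data structures:

```python
# Toy model only: the names and the 16384-byte limit are assumptions.
NFS_MAXDGRAMDATA = 16384  # assumed UDP payload limit


class Mount:
    """Hypothetical stand-in for the per-mount settings changed by mount -u."""

    def __init__(self, proto, rsize):
        self.proto = proto
        self.rsize = rsize


class ReadRPC:
    """A read whose size was fixed at the moment it was issued."""

    def __init__(self, mount):
        self.mount = mount
        self.size = mount.rsize  # snapshot; a later mount -u does not change it

    def can_complete(self):
        # Over UDP, a request larger than the datagram limit never succeeds,
        # so the thread doing it would keep retrying.
        if self.mount.proto == "udp":
            return self.size <= NFS_MAXDGRAMDATA
        return True


mnt = Mount("tcp", 65536)
inflight = ReadRPC(mnt)   # issued while rsize is still large
mnt.rsize = 16384         # mount -u -o rsize=16384 /mnt
mnt.proto = "udp"         # mount -u -o udp /mnt
new_read = ReadRPC(mnt)   # issued after both updates
```

In this model the read issued before the two updates can never complete
over UDP, while a read issued afterwards can.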

I'll revert r230516, since it doesn't really fix the problem; it just
reduces its likelihood.

I'll ask on freebsd-fs@ whether anyone finds switching from TCP->UDP via
"mount -u" useful. If no one thinks it's necessary, the patch
could just disallow the switch, no matter what the old rsize/wsize/readdirsize
is. Otherwise, the fix is somewhat involved and difficult for a scenario
like this, where the NFS server is network partitioned or crashed:
- sysadmin notices NFS mount is "hung" and does
  # mount -u -o udp /path
  to try and fix it, but it doesn't help
- sysadmin tries "umount -f /path" to get rid of the "hung" mount.

If "mount -u -o udp /path" is waiting for I/O ops to complete
(which is what the somewhat involved patch would need to do), the
"umount -f /path" will get stuck waiting for the "mount -u",
which will be waiting for I/O RPCs to complete. This could
be partially fixed by making sure that the "mount -u -o udp /path" is
interruptible (via <ctrl>C), but I still don't like the idea that
"umount -f /path" won't work if "mount -u -o udp /path" is sitting in
the kernel waiting for RPCs to complete, which would need to be done
to make a TCP->UDP switch work.
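
A user-space sketch of the kind of drain such a patch would need, not
kernel code. The IoTracker class and its methods are hypothetical (as noted
above, NFS does not currently track in-progress I/O RPCs), and a timeout
stands in for making the wait interruptible:

```python
import threading


class IoTracker:
    """Hypothetical counter of in-progress I/O RPCs for one mount."""

    def __init__(self):
        self.cond = threading.Condition()
        self.inflight = 0
        self.blocked = False  # True while a protocol switch is draining I/O

    def start_rpc(self):
        with self.cond:
            if self.blocked:
                return False  # new I/O is held off during the switch
            self.inflight += 1
            return True

    def finish_rpc(self):
        with self.cond:
            self.inflight -= 1
            self.cond.notify_all()

    def drain(self, timeout):
        """Block new I/O and wait for in-progress RPCs; give up on timeout."""
        with self.cond:
            self.blocked = True
            done = self.cond.wait_for(lambda: self.inflight == 0, timeout)
            if not done:
                self.blocked = False  # abandon the switch, let I/O resume
            return done

    def resume(self):
        """Called after a successful switch to let new I/O start again."""
        with self.cond:
            self.blocked = False


tracker = IoTracker()
tracker.start_rpc()            # RPC to a server that never answers
switched = tracker.drain(0.1)  # the "mount -u -o udp" gives up, no switch
```

With an unreachable server the in-progress count never reaches zero, so
without the timeout the drain would wait forever, and anything queued
behind it (the "umount -f" in the scenario above) would wait with it.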

rick


