From owner-svn-src-all@FreeBSD.ORG Wed Jan 25 03:02:33 2012 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3F614106566B; Wed, 25 Jan 2012 03:02:33 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id A00148FC17; Wed, 25 Jan 2012 03:02:32 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap4EABFwH0+DaFvO/2dsb2JhbABChQmqNIFyAQEEASMEUhsOCgICDRkCWQaID6Z9kTyBL4dbAQYGCQUBAQIBCQIYCoJNBgIFAQUKAQMJCgEGAgJdFAaCH4EWBIg/jF6SbA X-IronPort-AV: E=Sophos;i="4.71,565,1320642000"; d="scan'208";a="153577287" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 24 Jan 2012 22:02:31 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 8E36DB3FC1; Tue, 24 Jan 2012 22:02:31 -0500 (EST) Date: Tue, 24 Jan 2012 22:02:31 -0500 (EST) From: Rick Macklem To: Bruce Evans Message-ID: <2145461054.81036.1327460551572.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20120125121210.A1045@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: svn-src-head@FreeBSD.org, Rick Macklem , svn-src-all@FreeBSD.org, src-committers@FreeBSD.org Subject: Re: svn commit: r230516 - in head/sys: fs/nfsclient nfsclient X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jan 2012 03:02:33 -0000 Bruce Evans wrote: > On Wed, 25 Jan 2012, Rick Macklem wrote: > > > Log: > > If a mount -u is done to either NFS client that switches it > > from TCP to UDP and the rsize/wsize/readdirsize is greater > > than NFS_MAXDGRAMDATA, it is possible for a thread doing an > > I/O RPC to get stuck repeatedly doing retries. This happens > > because the RPC will use a resize/wsize/readdirsize that won't > > work for UDP and, as such, it will keep failing indefinitely. > > This patch returns an error for this case, to avoid the problem. > > A discussion on freebsd-fs@ seemed to indicate that returning > > an error was preferable to silently ignoring the "udp"/"mntudp" > > option. > > Could it wait for the old i/o to complete (and not start any new > i/o?). This is little different from having to wait when changing > from rw to ro. The latter is not easy, and at least the old nfs > client seems to not even dream of it. ffs has always called a > flushfiles() function, but until relatively recently had lots of > bugs in this area. It now uses complicated suspension of i/o > to prevent starting new i/o. It is still the only file system in > FreeBSD that uses MNTK_SUSPEND* or vfs_write_suspend(). > (geom journal also uses vfs_write_suspend(), but I think > it only works with ffs. vfs_write_suspend() uses MNT_SUSPEND > in its call to VFS_SYNC(), but ffs_sync() is the only sync > function that references MNT_SUSPEND.) > I wonder how zfs handles this. > As you said above "not easy ... uses complicated suspension of i/o". I have not tried to code this, but I think it would be non-trivial. The code would need to block new I/O before RPCs are issued and wait for all in-progress I/Os to complete. At this time, the kernel RPC handles the in-progress RPCs and NFS doesn't "know" what is outstanding. Of course, code could be added to keep track of in-progress I/O RPCs, but that would have to be written, as well. One tricky case would be if a forced dismount was invoked while the above is happening. (Forced dismount is hard to get right and easy to break, from my experience with it.) I didn't feel that the above work was justified, since I didn't see a need to switch from TCP -> UDP for a mount. If others feel that returning an error isn't sufficient and the above work is justified, I can revert the patch and try and find time to do the above. > > This problem was discovered while investigating a problem reported > > by pjd@ via email. > > This problem has annoyed me for years. I now know a workaround :-). > It doesn't help that, at least without nmount(), you can't see any > of the udp/tcp, rsize or wsize parameters. The behaviour of mount -u > should probably be to not change anything that is not an explicit > parameter, but this is not traditional (mount -u > traditionally means mount -u -o rw...) and is impossible for at least > mount() to determine the current parameters, so it was passed a > complete set of defaults. So for mount -u, you have to tell it all > the current parameters that are not the default unless you want them > changed back to the default, and it is just as impossible for you as > for mount() to tell what the current parameters are. You have to > remember what you set them to. When you forgot what they were or > mistype type, you sometimes switch tcp back to udp when you meant to > keep tcp, and see the bug. The rsize and wsize parameters give the > additional problem that when switching between udp and tcp, the > defaults > for the size parameters change, so you normally don't want to keep the > old parameters; but when testing which combination works best, you > need to control the new parameters; in particular, it is useful to > know what the defaults are; but the defaults are of course only > documented in the source code, and the source code that sets the > defaults is very large. > Although I haven't coded it yet, I am planning on adding a "-m" option to nfsstat (similar to what Linux has), that would dump out the NFS mount options actually being used. This becomes more important for NFSv4.n, since the newer protocol is negotiating more things. rick