From: "Rick C. Petty"
To: Rick Macklem
Cc: freebsd-stable@freebsd.org
Date: Mon, 28 Jun 2010 09:00:54 -0500
Subject: Re: Why is NFSv4 so slow?
Message-ID: <20100628140054.GA52174@kay.kiwi-computer.com>
References: <20100627221607.GA31646@kay.kiwi-computer.com> <20100628031401.GA45282@kay.kiwi-computer.com>

On Mon, Jun 28, 2010 at 12:30:30AM -0400, Rick Macklem wrote:
>
> I can't explain the corruption, beyond the fact that "soft,intr" can
> cause all sorts of grief. If mounts without "soft,intr" still show
> corruption problems, try disabling delegations (either kill off the
> nfscbd daemons on the client or set vfs.newnfs.issue_delegations=0
> on the server). It is disabled by default because it is the "greenest"
> part of the subsystem.

I tried without soft,intr and "make buildworld" failed with what looks
like file corruption again. I'm trying without delegations now.

> Make sure you don't have multiple entries for the same uid, such as "root"
> and "toor" both for uid 0 in your /etc/passwd. (i.e. get rid of one of
> them, if you have both)

Hmm, that's a strange requirement, since FreeBSD ships with both by
default. That should probably be documented in the nfsv4 man page.

> When you specify "nfs" for an NFSv3 mount, you get the regular client.
> When you specify "newnfs" for an NFSv3 mount, you get the experimental
> client. When you specify "nfsv4" you always get the experimental NFS
> client, and it doesn't matter which FStype you've specified.

OK. So my comparison was between the regular and experimental clients.

> If you are using UFS/FFS on the server, this should work and I don't know
> why the empty directories under /vol on the client confused it. If your
> server is using ZFS, everything from / including /vol need to be exported.

Nope, UFS2 only (on both clients and server).

> > kernel: nfsv4 client/server protocol prob err=10020
>
> This error indicates that there wasn't a valid FH for the server. I
> suspect that the mount failed. (It does a loop of Lookups from "/" in
> the kernel during the mount and it somehow got confused part way through.)

If the mount failed, why would it allow me to "ls /vol/a" and see both "b"
and "c" directories as well as other files and directories on /vol/?

> I don't know why these empty dirs would confuse it. I'll try a test
> here, but I suspect the real problem was that the mount failed and
> then happened to succeed after you deleted the empty dirs.

That doesn't seem likely. I spent an hour mounting and unmounting, and
each mount looked successful in that there were files and directories
besides the two I was trying to descend into.

> It still smells like some sort of transport/net interface/... issue
> is at the bottom of this. (see response to your next post)

It's possible.
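
To see which accounts share a uid before removing one, a few lines of script are enough. This is a minimal sketch, assuming standard passwd(5) colon-separated lines (name:password:uid:gid:...); the function name duplicate_uids is hypothetical and not part of any FreeBSD tool. On a stock FreeBSD install it would be expected to report root and toor under uid 0.

```python
from collections import defaultdict
from typing import Dict, List


def duplicate_uids(passwd_text: str) -> Dict[int, List[str]]:
    """Return {uid: [login, ...]} for every uid used by more than one entry.

    Assumes passwd(5)-style lines "name:password:uid:gid:...".
    Blank lines and '#' comments are skipped; lines whose third field is
    not a plain decimal uid (e.g. NIS "+" entries) are ignored.
    """
    by_uid: Dict[int, List[str]] = defaultdict(list)
    for raw in passwd_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 3 or not fields[2].isdigit():
            continue  # malformed or non-numeric uid: skip rather than guess
        by_uid[int(fields[2])].append(fields[0])
    # Keep only uids that map to more than one login name.
    return {uid: names for uid, names in by_uid.items() if len(names) > 1}
```

Usage would be something like duplicate_uids(open("/etc/passwd").read()); whichever duplicate name is reported is a candidate for removal on the NFSv4 machines.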
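
The err= value in that kernel message is an NFSv4 status code, so decoding it is a lookup in the error table of RFC 3530, where 10020 is NFS4ERR_NOFILEHANDLE. That matches the reading above that there was no valid file handle. A minimal sketch, assuming the message always carries the code as err=<decimal>; the table is deliberately partial and the helper name decode_nfsv4_err is hypothetical.

```python
import re
from typing import Optional

# A few NFSv4 status codes from RFC 3530. Deliberately partial: only
# codes plausibly related to mount or client-state trouble are listed.
NFS4_ERRORS = {
    10008: "NFS4ERR_DELAY",
    10013: "NFS4ERR_GRACE",
    10020: "NFS4ERR_NOFILEHANDLE",
    10022: "NFS4ERR_STALE_CLIENTID",
}

_ERR_RE = re.compile(r"err=(\d+)")


def decode_nfsv4_err(log_line: str) -> Optional[str]:
    """Map the first "err=N" in a log line to its NFSv4 error name.

    Returns None if the line has no err= field, and "unknown (N)" for
    codes not in the partial table above.
    """
    m = _ERR_RE.search(log_line)
    if m is None:
        return None
    code = int(m.group(1))
    return NFS4_ERRORS.get(code, f"unknown ({code})")
```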

I just had another NFSv4 client (with the same server) lock up:

    load: 0.00 cmd: ls 17410 [nfsv4lck] 641.87r 0.00u 0.00s 0% 1512k

and:

    load: 0.00 cmd: make 87546 [wait] 37095.09r 0.01u 0.01s 0% 844k

That make has been hung for hours, and the ls(1) was executed during that
lockup. I wish there were a way to unhang these processes and unmount the
NFS mount without panicking the kernel, but alas even this fails:

    # umount -f /sw
    load: 0.00 cmd: umount 17479 [nfsclumnt] 1.27r 0.00u 0.04s 0% 788k

A "shutdown -p now" resulted in a panic with the speaker beeping
constantly and no console output. It's possible the NICs are all suspect,
but all of this worked fine a couple of days ago when I was only using
NFSv3.

-- 
Rick C. Petty
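
The status lines above are what FreeBSD prints on SIGINFO (Ctrl-T). The bracketed field is the wait channel, which is the useful part when triaging a hang: nfsv4lck and nfsclumnt point into the NFS client, while make's [wait] most likely means it is waiting on a child process. A minimal parser sketch, assuming the field layout exactly as shown in these lines (load, cmd, name, pid, [wchan], then elapsed real time with an "r" suffix). Treating nfsv4lck and nfsclumnt as "stuck in NFS" is an assumption drawn only from this thread, and the names SigInfo, parse_siginfo and blocked_in_nfs are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class SigInfo(NamedTuple):
    cmd: str
    pid: int
    wchan: str
    real_seconds: float


# Matches e.g. "load: 0.00 cmd: ls 17410 [nfsv4lck] 641.87r ..."
_LINE_RE = re.compile(
    r"load:\s*[\d.]+\s+cmd:\s*(?P<cmd>\S+)\s+(?P<pid>\d+)\s+"
    r"\[(?P<wchan>[^\]]*)\]\s+(?P<real>[\d.]+)r"
)

# Wait channels seen in the hung processes in this thread; an
# illustrative, incomplete set rather than an authoritative list.
NFS_WCHANS = {"nfsv4lck", "nfsclumnt"}


def parse_siginfo(line: str) -> Optional[SigInfo]:
    """Extract command, pid, wait channel and real time, or None."""
    m = _LINE_RE.search(line)
    if m is None:
        return None
    return SigInfo(m["cmd"], int(m["pid"]), m["wchan"], float(m["real"]))


def blocked_in_nfs(line: str) -> bool:
    """True if the process is sleeping on one of the NFS wait channels."""
    info = parse_siginfo(line)
    return info is not None and info.wchan in NFS_WCHANS
```

Keying on the wait channel rather than the command name keeps the check independent of which program happened to touch the mount.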