Date:      Mon, 28 Jun 2010 09:00:54 -0500
From:      "Rick C. Petty" <rick-freebsd2009@kiwi-computer.com>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Why is NFSv4 so slow?
Message-ID:  <20100628140054.GA52174@kay.kiwi-computer.com>
In-Reply-To: <Pine.GSO.4.63.1006280017190.2680@muncher.cs.uoguelph.ca>
References:  <20100627221607.GA31646@kay.kiwi-computer.com> <Pine.GSO.4.63.1006271949220.3233@muncher.cs.uoguelph.ca> <20100628031401.GA45282@kay.kiwi-computer.com> <Pine.GSO.4.63.1006280017190.2680@muncher.cs.uoguelph.ca>

On Mon, Jun 28, 2010 at 12:30:30AM -0400, Rick Macklem wrote:
> 
> I can't explain the corruption, beyond the fact that "soft,intr" can
> cause all sorts of grief. If mounts without "soft,intr" still show
> corruption problems, try disabling delegations (either kill off the
> nfscbd daemons on the client or set vfs.newnfs.issue_delegations=0
> on the server). It is disabled by default because it is the "greenest"
> part of the subsystem.

I tried without soft,intr and "make buildworld" failed with what looks like
file corruption again.  I'm trying without delegations now.

> Make sure you don't have multiple entries for the same uid, such as "root"
> and "toor" both for uid 0 in your /etc/passwd. (ie. get rid of one of 
> them, if you have both)

Hmm, that's a strange requirement, since FreeBSD by default comes with
both.  That should probably be documented in the nfsv4 man page.
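
For anyone wanting to check whether their own passwd file has this
situation, here is a minimal sketch in Python. It is only an illustration:
the function name duplicate_uids is made up for this example, and it
assumes the usual colon-separated passwd layout with the login name in the
first field and the uid in the third.

```python
from collections import defaultdict


def duplicate_uids(passwd_text):
    """Return {uid: [login names]} for every uid used by more than one entry.

    Hypothetical helper for illustration; expects passwd(5)-style text.
    """
    names_by_uid = defaultdict(list)
    for line in passwd_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # A usable entry needs at least name, password and uid fields.
        if len(fields) < 3:
            continue
        names_by_uid[fields[2]].append(fields[0])
    return {uid: names for uid, names in names_by_uid.items() if len(names) > 1}


# Sample input modelled on the "root"/"toor" case mentioned above.
sample = (
    "root:*:0:0:Charlie &:/root:/bin/csh\n"
    "toor:*:0:0:Bourne-again Superuser:/root:\n"
    "daemon:*:1:1:Owner of many system processes:/root:/usr/sbin/nologin\n"
)
for uid, names in duplicate_uids(sample).items():
    print("uid", uid, "is shared by:", ", ".join(names))
```

To run it against a real system, pass the contents of /etc/passwd to
duplicate_uids() instead of the sample string.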

> When you specify "nfs" for an NFSv3 mount, you get the regular client.
> When you specify "newnfs" for an NFSv3 mount, you get the experimental
> client. When you specify "nfsv4" you always get the experimental NFS
> client, and it doesn't matter which FStype you've specified.

Ok.  So my comparison was with the regular and experimental clients.

> If you are using UFS/FFS on the server, this should work and I don't know
> why the empty directories under /vol on the client confused it. If your
> server is using ZFS, everything from / including /vol need to be exported.

Nope, UFS2 only (on both clients and server).

> >	kernel: nfsv4 client/server protocol prob err=10020
> 
> This error indicates that there wasn't a valid FH for the server. I
> suspect that the mount failed. (It does a loop of Lookups from "/" in
> the kernel during the mount and it somehow got confused part way through.)

If the mount failed, why would it allow me to "ls /vol/a" and see both "b"
and "c" directories as well as other files/directories on /vol/ ?

> I don't know why these empty dirs would confuse it. I'll try a test
> here, but I suspect the real problem was that the mount failed and
> then happened to succeed after you deleted the empty dirs.

It doesn't seem likely.  I spent an hour mounting and unmounting and each
mount looked successful in that there were files and directories besides
the two I was trying to descend into.

> It still smells like some sort of transport/net interface/... issue
> is at the bottom of this. (see response to your next post)

It's possible.  I just had another NFSv4 client (with the same server) lock
up:

load: 0.00  cmd: ls 17410 [nfsv4lck] 641.87r 0.00u 0.00s 0% 1512k

and:

load: 0.00  cmd: make 87546 [wait] 37095.09r 0.01u 0.01s 0% 844k

That make has been hung for hours, and the ls(1) was executed during that
lockup.  I wish there was a way I could unhang these processes and unmount
the NFS mount without panicking the kernel, but alas even this fails:

# umount -f /sw
load: 0.00  cmd: umount 17479 [nfsclumnt] 1.27r 0.00u 0.04s 0% 788k

A "shutdown -p now" resulted in a panic with the speaker beeping
constantly and no console output.

It's possible the NICs are all suspect, but all of this worked fine a
couple of days ago when I was only using NFSv3.

-- Rick C. Petty


