Date: Mon, 28 Jun 2010 00:30:30 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: "Rick C. Petty" <rick-freebsd2009@kiwi-computer.com>
Cc: freebsd-stable@freebsd.org
Subject: Re: Why is NFSv4 so slow?
Message-ID: <Pine.GSO.4.63.1006280017190.2680@muncher.cs.uoguelph.ca>
In-Reply-To: <20100628031401.GA45282@kay.kiwi-computer.com>
References: <20100627221607.GA31646@kay.kiwi-computer.com> <Pine.GSO.4.63.1006271949220.3233@muncher.cs.uoguelph.ca> <20100628031401.GA45282@kay.kiwi-computer.com>

On Sun, 27 Jun 2010, Rick C. Petty wrote:

>
> Hmm. When I mounted the same filesystem with nfs3 from a different client,
> everything started working at almost normal speed (still a little slower
> though).
>
> Now on that same host I saw a file get corrupted. On the server, I see
> the following:
>
> % hd testfile | tail -4
> 00677fd0 2a 24 cc 43 03 90 ad e2 9a 4a 01 d9 c4 6a f7 14 |*$.C.....J...j..|
> 00677fe0 3f ba 01 77 28 4f 0f 58 1a 21 67 c5 73 1e 4f 54 |?..w(O.X.!g.s.OT|
> 00677ff0 bf 75 59 05 52 54 07 6f db 62 d6 4a 78 e8 3e 2b |.uY.RT.o.b.Jx.>+|
> 00678000
>
> But on the client I see this:
>
> % hd testfile | tail -4
> 00011ff0 1e af dc 8e d6 73 67 a2 cd 93 fe cb 7e a4 dd 83 |.....sg.....~...|
> 00012000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> *
> 00678000
>
> The only thing I could do to fix it was to copy the file on the server,
> delete the original file on the client, and move the copied file back.
>
> Not only is it affecting random file reads, but started breaking src
> and ports builds in random places. In one situation, portmaster failed
> because of a port checksum. It then tried to refetch and failed with the
> same checksum problem. I manually deleted the file, tried again and it
> built just fine. The ports tree and distfiles are nfs4 mounted.
>
I can't explain the corruption, beyond the fact that "soft,intr" can
cause all sorts of grief.

If mounts without "soft,intr" still show corruption problems, try
disabling delegations (either kill off the nfscbd daemons on the
client or set vfs.newnfs.issue_delegations=0 on the server). It
is disabled by default because it is the "greenest" part of the
subsystem.

>> The other thing that can really slow it down is if the uid<->login-name
>> (and/or gid<->group-name) is messed up, but this would normally only
>> show up for things like "ls -l". (Beware having multiple password database
>> entries for the same uid, such as "root" and "toor".)
>
> I use the same UIDs/GIDs on all my boxes, so that can't be it. But thanks
> for the idea.
>
Make sure you don't have multiple entries for the same uid, such as "root"
and "toor" both for uid 0 in your /etc/passwd. (i.e. get rid of one of
them, if you have both)

>
>> When you did the nfs3 mount did you specify "newnfs" or "nfs" for the
>> file system type? (I'm wondering if you still saw the problem with the
>> regular "nfs" client against the server? Others have had good luck using
>> the server for NFSv3 mounts.)
>
> I used "nfs" for FStype. So I should be using "newnfs"? This wasn't very
> clear in the man pages. In fact "newnfs" wasn't mentioned in
> "man mount_newnfs".
>
When you specify "nfs" for an NFSv3 mount, you get the regular client.
When you specify "newnfs" for an NFSv3 mount, you get the experimental
client. When you specify "nfsv4" you always get the experimental NFS
client, and it doesn't matter which FStype you've specified.

>
> One other thing I noticed but I'm not sure if it's a bug or expected
> behavior (unrelated to the delays or corruption), is I have the following
> filesystems on the server:
>
> /vol/a
> /vol/a/b
> /vol/a/c
>
> I export all three volumes and set my NFS V4 root to "/". On the client,
> I'll "mount ... server:vol /vol" and the "b" and "c" directories show up
> but when I try "ls /vol/a/b /vol/a/c", they show up empty. In dmesg I see:
>
If you are using UFS/FFS on the server, this should work and I don't know
why the empty directories under /vol on the client confused it. If your
server is using ZFS, everything from / including /vol needs to be exported.

> kernel: nfsv4 client/server protocol prob err=10020
>
This error indicates that there wasn't a valid FH for the server. I
suspect that the mount failed. (It does a loop of Lookups from "/" in
the kernel during the mount and it somehow got confused part way through.)

> After unmounting /vol, I discovered that my client already had /vol/a/b and
> /vol/a/c directories (because pre-NFSv4, I had to mount each filesystem
> separately). Once I removed those empty dirs and remounted, the problem
> went away. But it did drive me crazy for a few hours.
>
I don't know why these empty dirs would confuse it. I'll try a test
here, but I suspect the real problem was that the mount failed and
then happened to succeed after you deleted the empty dirs.

It still smells like some sort of transport/net interface/...
issue is at the bottom of this. (see response to your next post)

rick
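
The check suggested above for several login names sharing one uid (such
as "root" and "toor" both on uid 0) can be sketched in Python. This is
only a minimal sketch: the function name find_duplicate_uids and the
sample entries are illustrative assumptions, and on a real system the
input would be the contents of /etc/passwd.

```python
from collections import defaultdict


def find_duplicate_uids(passwd_text):
    """Return {uid: [login names]} for every uid used by more than one entry."""
    names_by_uid = defaultdict(list)
    for line in passwd_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # passwd(5) layout: name:password:uid:gid:...
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        names_by_uid[int(fields[2])].append(fields[0])
    return {uid: names for uid, names in names_by_uid.items() if len(names) > 1}


# Hypothetical sample in passwd(5) format, not taken from any real host.
SAMPLE = """\
root:*:0:0:Charlie &:/root:/bin/csh
toor:*:0:0:Bourne-again Superuser:/root:
daemon:*:1:1:Owner of many system processes:/root:/usr/sbin/nologin
"""

if __name__ == "__main__":
    for uid, names in sorted(find_duplicate_uids(SAMPLE).items()):
        print(f"uid {uid} is shared by: {', '.join(names)}")
```

Run against the sample, this reports uid 0 as shared by root and toor.
Any uid it reports is a candidate for the name-mapping confusion
described above.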