Date:      Fri, 20 Jun 2014 09:16:04 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Daniel Mayfield <dan@3geeks.org>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: Debugging newnfs
Message-ID:  <538359689.1860054.1403270164794.JavaMail.root@uoguelph.ca>
In-Reply-To: <0016EC7C-7DCC-47B4-AD12-798525045F89@3geeks.org>

Daniel Mayfield wrote:
> I have a very strange problem between an NFS server running FreeBSD
> 10 w/ ZFS and a number of FreeBSD 10 VMs running on a XenServer 6.2
> SP1 host.  The problem manifests as seemingly random permissions
> issues and/or IO errors on the clients when the ZFS pool is busy.
>  There are no entries in dmesg on either side, and no errors logged
> in nfsstat either.  If I keep the traffic down, the errors subside,
> but not completely.  Other than tcpdump, how can I go about
> debugging this?
> 
Well, you didn't mention what mount options or what network
interfaces you are using, but here are a few things that might be
worth looking at...

The TSO max transmit segments issue:
- Without going into all the details (there have been some recent
  commits, like r264630, to try to alleviate this), if a net device
  driver cannot handle 35 mbufs in a transmit TSO segment, things
  will break.
  - Xen/netfront is a weird exception, which I think is ok so long
    as lagg or a vlan isn't layered on top of it.
--> If you can disable TSO on both the server and the clients, or
    reduce rsize,wsize to 32K on all client mounts, and then see if
    the problem persists, that is probably the best way to check this.
    (Since Xen/netfront is such a weird case, I am not 100% sure that
    doing the above will fix this problem, if netfront is being used.)
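
To make the 35 mbuf figure a bit more concrete, here is a rough
back-of-the-envelope sketch. It assumes the I/O data sits in 2K
(MCLBYTES) mbuf clusters and that about 3 more mbufs carry the
headers. The header count and the function name are assumptions made
up for illustration, not numbers taken from any driver.

```python
import math

MCLBYTES = 2048      # size of a standard FreeBSD mbuf cluster
HEADER_MBUFS = 3     # ASSUMED number of mbufs holding headers (illustrative only)


def tso_mbuf_estimate(io_size, cluster_size=MCLBYTES, header_mbufs=HEADER_MBUFS):
    """Roughly estimate how many mbufs one rsize/wsize-sized transfer chains together."""
    return math.ceil(io_size / cluster_size) + header_mbufs


# Compare a 64K transfer with the reduced 32K setting suggested above.
for size in (65536, 32768):
    print(f"rsize/wsize={size}: ~{tso_mbuf_estimate(size)} mbufs")
```

Under those assumptions a 64K transfer lands right at the limit, while
a 32K one stays well under it, which is why reducing rsize,wsize is a
reasonable test.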

I also don't know if it is possible to have corrupted packets due to
a hardware problem (bad memory or...) where the Xen/netfront world
doesn't catch it.

If you use the "soft" mount option, you could easily get these errors
when the server is slow to respond. I'd strongly recommend using "tcp"
and not "soft" for your mounts. ("nfsstat -m" on the client will
show you which mount options are actually in use. These can differ
somewhat from what was specified on the command line, since servers
limit rsize/wsize, for example.)
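
If there are a lot of clients to check, something like the sketch
below could flag the options mentioned above. It only looks at a
comma-separated option string. The sample string is made up, loosely
modelled on the kind of list "nfsstat -m" prints, so the real output
format may differ; the function name and the 32K threshold are
likewise just assumptions for illustration.

```python
def check_mount_options(option_line, max_io=32768):
    """Return a list of warnings for a comma-separated NFS mount option string."""
    opts = {}
    for item in option_line.strip().split(","):
        # "rsize=65536" -> ("rsize", "65536"); "soft" -> ("soft", "")
        key, _, value = item.partition("=")
        opts[key.strip()] = value.strip()

    warnings = []
    if "soft" in opts:
        warnings.append("soft mount: errors are likely when the server is slow")
    if "udp" in opts:
        warnings.append("udp transport: tcp is recommended")
    for name in ("rsize", "wsize"):
        value = opts.get(name, "")
        if value.isdigit() and int(value) > max_io:
            warnings.append(f"{name}={value} is above {max_io}")
    return warnings


# Hypothetical option string, NOT captured from a real system.
sample = "nfsv3,tcp,resvport,soft,cto,rsize=65536,wsize=65536,timeout=120,retrans=2"
for warning in check_mount_options(sample):
    print(warning)
```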

When you get a "permissions failure" case, check on the server to
see if the permissions for the file appear correct on ZFS. If they
are (or the problem disappears when you retry a command without
changing permissions), you could have a caching issue. Other than
capturing the packets and looking at them in wireshark (which knows
NFS, unlike tcpdump) all you can do is try fiddling with the mount
options related to caching and see if that helps. (Note that NFS
does not have a cache coherency protocol, so if files are concurrently
shared among multiple clients, all bets are off w.r.t. what the
behaviour is. jhb@ is much better at this than I, since he seems
to find lots of these weird cases at his workplace.)

Good luck with it, rick

> Dan


