Date: Fri, 20 Jun 2014 09:16:04 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Daniel Mayfield <dan@3geeks.org>
Cc: freebsd-fs@freebsd.org
Subject: Re: Debugging newnfs
Message-ID: <538359689.1860054.1403270164794.JavaMail.root@uoguelph.ca>
In-Reply-To: <0016EC7C-7DCC-47B4-AD12-798525045F89@3geeks.org>
Daniel Mayfield wrote:
> I have a very strange problem between an NFS server running FreeBSD
> 10 w/ ZFS and a number of FreeBSD 10 VMs running on a XenServer 6.2
> SP1 host. The problem manifests as seemingly random permissions
> issues and/or IO errors on the clients when the ZFS pool is busy.
> There are no entries in dmesg on either side, and no errors logged
> in nfsstat either. If I keep the traffic down, the errors subside,
> but not completely. Other than tcpdump, how can I go about
> debugging this?
>
Well, you didn't mention what mount options or what network
interfaces you are using, but here are a few things that might be
worth looking at...

The TSO max transmit segments issue:
- Without going into all the details (there have been some recent
  commits like r264630 to try and alleviate this), if a net device
  driver cannot handle 35 mbufs in a transmit TSO segment, things
  will get broken.
- Xen/netfront is a weird exception, which I think is ok so long as
  lagg or a vlan isn't layered on top of it.
--> If you can disable TSO on both server and clients, or reduce
    rsize,wsize to 32K on all client mounts, and see if the problem
    persists, that is probably the best way to check this.
    (Since Xen/netfront is such a weird case, I am not 100% sure if
     doing the above will fix this problem, if it is being used.)

I also don't know if it is possible to have corrupted packets due to a
hardware problem (bad memory or...) where the Xen/netfront world doesn't
catch it.

If you use the "soft" mount option, you could easily get this when the
server is slow to respond. I'd strongly recommend using "tcp" and not "soft"
for your mounts. ("nfsstat -m" on the client will show you what the actual
mount options in use are. These can be somewhat different from what is
specified on the command line, since servers limit rsize/wsize, as an example.)

When you get a "permissions failure" case, check on the server to see if
the permissions for the file appear correct on ZFS.
If they are (or the problem disappears when you retry a command without
changing permissions), you could have a caching issue.

Other than capturing the packets and looking at them in wireshark (which
knows NFS, unlike tcpdump), all you can do is try fiddling with the mount
options related to caching and see if that helps.
(Note that NFS does not have a cache coherency protocol, so if files are
 concurrently shared among multiple clients, all bets are off w.r.t. what
 the behaviour is. jhb@ is much better at this than I, since he seems to
 find lots of these weird cases at his workplace.)

Good luck with it, rick

> Dan
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>
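
The mount-option advice in the message (use "tcp", avoid "soft", try
rsize/wsize of 32K) can be sketched as a small check. This is only an
illustration and not part of the original message: the function name
check_mount_options and the sample string are hypothetical, and it assumes
the options are available as a comma-separated string, roughly like what
"nfsstat -m" reports on the client.

```python
# Sketch: flag NFS mount options that the advice above calls out.
# Assumption: `options` is a comma-separated option string; the exact
# "nfsstat -m" output format is not taken from the email.

MAX_IO_SIZE = 32 * 1024  # the 32K rsize/wsize suggested in the message


def check_mount_options(options):
    """Return a list of human-readable warnings for one mount's options."""
    flags = set()
    values = {}
    for item in options.split(","):
        item = item.strip()
        if not item:
            continue
        if "=" in item:
            # key=value options such as rsize=65536
            key, _, value = item.partition("=")
            values[key] = value
        else:
            # bare flags such as tcp, soft, hard
            flags.add(item)

    warnings = []
    if "soft" in flags:
        warnings.append("soft mount: a slow server can surface as I/O errors")
    if "tcp" not in flags:
        warnings.append("tcp not listed: consider mounting with tcp")
    for key in ("rsize", "wsize"):
        raw = values.get(key)
        if raw is not None and raw.isdigit() and int(raw) > MAX_IO_SIZE:
            warnings.append(f"{key}={raw} exceeds 32K: try {key}={MAX_IO_SIZE}")
    return warnings


if __name__ == "__main__":
    sample = "nfsv3,udp,soft,rsize=65536,wsize=65536"  # made-up example
    for line in check_mount_options(sample):
        print(line)
```

Checking the options the client actually reports, and not the ones typed on
the command line, matters here because the server may have limited
rsize/wsize.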
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?538359689.1860054.1403270164794.JavaMail.root>