Date: Fri, 20 Jun 2014 10:58:39 -0400
From: Daniel Mayfield <dan@3geeks.org>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: freebsd-fs@freebsd.org
Subject: Re: Debugging newnfs
Message-ID: <CAE=e2zwZqPoCs17rkKCXt2B4aj4SG7tCEe29Khjf_kV%2BLrM%2BsQ@mail.gmail.com>
In-Reply-To: <538359689.1860054.1403270164794.JavaMail.root@uoguelph.ca>
References: <0016EC7C-7DCC-47B4-AD12-798525045F89@3geeks.org> <538359689.1860054.1403270164794.JavaMail.root@uoguelph.ca>

The server side is a set of vlans on a lagg of 4 igbs. The Xen side is the
same setup, with the VMs in question attached to two different vlans.

Many different mounts, but the mount options all look like this:
nfsv3,tcp,resvport,hard,cto,lockd,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=1,wcommitsize=4048762,timeout=120,retrans=2

The permissions do not change, but repeat operations succeed and fail
randomly. There aren't any clients concurrently accessing the same mount.

On Fri, Jun 20, 2014 at 9:16 AM, Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Daniel Mayfield wrote:
> > I have a very strange problem between an NFS server running FreeBSD
> > 10 w/ ZFS and a number of FreeBSD 10 VMs running on a XenServer 6.2
> > SP1 host. The problem manifests as seemingly random permissions
> > issues and/or IO errors on the clients when the ZFS pool is busy.
> > There are no entries in dmesg on either side, and no errors logged
> > in nfsstat either. If I keep the traffic down, the errors subside,
> > but not completely. Other than tcpdump, how can I go about
> > debugging this?
> >
> Well, you didn't mention what mount options you are using or what
> network interfaces that you are using, but here's a few things that
> might be worth looking at...
>
> The TSO max transmit segments issue:
> - Without going into all the details (there have been some recent
>   commits like r264630 to try and alleviate this), if a net device
>   driver cannot handle 35 mbufs in a transmit TSO segment, things
>   will get broken.
> - Xen/netfront is a weird exception, which I think is ok so long
>   as lagg or a vlan isn't layered on top of it.
> --> If you can disable TSO on both server and clients or reduce rsize,wsize
>     to 32K on all client mounts and see if the problem persists, that
>     is probably the best way to check this. (Since Xen/netfront is
>     such a weird case, I am not 100% sure if doing the above will fix
>     this problem, if it is being used)
>
> I also don't know if it is possible to have corrupted packets due to
> a hardware problem (bad memory or...) where the Xen/netfront world
> doesn't catch it.
>
> If you use the "soft" mount option, you could easily get this when
> the server is slow to respond. I'd strongly recommend using "tcp"
> and not "soft" for your mounts. ("nfsstat -m" on the client will
> show you what the actual mount options in use are. This can be
> somewhat different than what is specified on the command line, since
> servers limit rsize/wsize, as an example.)
>
> When you get a "permissions failure" case, check on the server to
> see if the permissions for the file appear correct on ZFS. If they
> are (or the problem disappears when you retry a command without
> changing permissions), you could have a caching issue. Other than
> capturing the packets and looking at them in wireshark (which knows
> NFS, unlike tcpdump) all you can do is try fiddling with the mount
> options related to caching and see if that helps. (Note that NFS
> does not have a cache coherency protocol, so if files are concurrently
> shared among multiple clients, all bets are off w.r.t. what the
> behaviour is. jhb@ is much better at this than I, since he seems
> to find lots of these weird cases at his workplace.)
>
> Good luck with it, rick
>
> > Dan
> > _______________________________________________
> > freebsd-fs@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
> >
>
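
As a rough illustration of the rsize/wsize check suggested in the quoted reply, here is a minimal Python sketch that parses a mount option string like the one above and reports which of rsize and wsize exceed 32K. The function names and the LIMIT constant are hypothetical helpers made up for this example; they are not part of FreeBSD, mount_nfs, or nfsstat, and the sketch only inspects the string, it does not query the live mount.

```python
# Hypothetical helpers for illustration only; not part of any FreeBSD tool.
LIMIT = 32 * 1024  # 32K, the reduced rsize/wsize suggested in the quoted reply


def parse_mount_options(opts):
    """Split a comma-separated option string into a dict.

    Options of the form key=value keep the value as a string;
    bare flags such as "tcp" or "hard" map to True.
    """
    parsed = {}
    for item in opts.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


def oversized_io_options(opts, limit=LIMIT):
    """Return the rsize/wsize entries whose value is larger than limit."""
    parsed = parse_mount_options(opts)
    return {
        key: int(parsed[key])
        for key in ("rsize", "wsize")
        if key in parsed and int(parsed[key]) > limit
    }


# The option string quoted in the message above.
options = (
    "nfsv3,tcp,resvport,hard,cto,lockd,sec=sys,acdirmin=3,acdirmax=60,"
    "acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,"
    "wsize=65536,readdirsize=65536,readahead=1,wcommitsize=4048762,"
    "timeout=120,retrans=2"
)

print(oversized_io_options(options))  # {'rsize': 65536, 'wsize': 65536}
```

Since the server can limit rsize/wsize, the string to feed in would more usefully be the one reported by "nfsstat -m" on the client than the one given at mount time.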