From: Rick Macklem
To: Daniel Mayfield
Cc: freebsd-fs@freebsd.org
Date: Fri, 20 Jun 2014 17:11:01 -0400 (EDT)
Subject: Re: Debugging newnfs
Message-ID: <373087919.2114818.1403298661172.JavaMail.root@uoguelph.ca>

Daniel Mayfield wrote:
> The server side is a set of vlans on a lagg of 4 igbs.
I think igb net interfaces have a limit of 64 transmit segments
(IGB_MAX_SCATTER), so they should be ok with TSO enabled.

> The Xen side is the same setup, with the VMs in question attached to
> two different vlans.

Well, from what I know, using lagg on top of a Xen/netfront net device
will definitely be a problem, unless you have r265290 and r265412.
(Without these patches, the setting of if_hw_tsomax done by Xen's
netfront is not propagated up to tcp_output(). The same statements
apply to if_vlan.c, with the patch r265291.)

I know nothing about Xen, so I have no idea if you are using the
Xen/netfront virtual net driver, but using lagg and/or vlan on top of
it is definitely broken without the recent patches. If you can disable
TSO, that will be a workaround for this.

> Many different mounts, but the mount options all look like this:
>
> nfsv3,tcp,resvport,hard,cto,lockd,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=1,wcommitsize=4048762,timeout=120,retrans=2
>
> The permissions do not change, but repeat operations succeed and fail
> randomly.
>
> There aren't any clients concurrently accessing the same mount.
>
> On Fri, Jun 20, 2014 at 9:16 AM, Rick Macklem <rmacklem@uoguelph.ca>
> wrote:
>
> > Daniel Mayfield wrote:
> > > I have a very strange problem between an NFS server running
> > > FreeBSD 10 w/ ZFS and a number of FreeBSD 10 VMs running on a
> > > XenServer 6.2 SP1 host. The problem manifests as seemingly random
> > > permissions issues and/or IO errors on the clients when the ZFS
> > > pool is busy. There are no entries in dmesg on either side, and
> > > no errors logged in nfsstat either. If I keep the traffic down,
> > > the errors subside, but not completely. Other than tcpdump, how
> > > can I go about debugging this?
> >
> > Well, you didn't mention what mount options you are using or what
> > network interfaces you are using, but here are a few things that
> > might be worth looking at...
> >
> > The TSO max transmit segments issue:
> > - Without going into all the details (there have been some recent
> >   commits like r264630 to try and alleviate this), if a net device
> >   driver cannot handle 35 mbufs in a transmit TSO segment, things
> >   will get broken.
> > - Xen/netfront is a weird exception, which I think is ok so long
> >   as lagg or a vlan isn't layered on top of it.
> > --> If you can disable TSO on both server and clients, or reduce
> >     rsize,wsize to 32K on all client mounts, and see if the problem
> >     persists, that is probably the best way to check this. (Since
> >     Xen/netfront is such a weird case, I am not 100% sure if doing
> >     the above will fix this problem, if it is being used.)
> >
> > I also don't know if it is possible to have corrupted packets due
> > to a hardware problem (bad memory or...) where the Xen/netfront
> > world doesn't catch it.
> >
> > If you use the "soft" mount option, you could easily get this when
> > the server is slow to respond. I'd strongly recommend using "tcp"
> > and not "soft" for your mounts. ("nfsstat -m" on the client will
> > show you what the actual mount options in use are. These can be
> > somewhat different from what is specified on the command line,
> > since servers limit rsize/wsize, as an example.)
> >
> > When you get a "permissions failure" case, check on the server to
> > see if the permissions for the file appear correct on ZFS. If they
> > are (or the problem disappears when you retry a command without
> > changing permissions), you could have a caching issue. Other than
> > capturing the packets and looking at them in wireshark (which
> > knows NFS, unlike tcpdump), all you can do is try fiddling with the
> > mount options related to caching and see if that helps.
> > (Note that NFS does not have a cache coherency protocol, so if
> > files are concurrently shared among multiple clients, all bets are
> > off w.r.t. what the behaviour is. jhb@ is much better at this than
> > I, since he seems to find lots of these weird cases at his
> > workplace.)
> >
> > Good luck with it, rick
> >
> > > Dan
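
The mount-option checks suggested in this thread (avoid "soft", and keep rsize/wsize small enough that a single read or write fits in the driver's TSO transmit segment limit) can be sketched as a small script. This is only an illustration and not anything from FreeBSD itself: the function names are hypothetical, and the mbuf estimate assumes 2K mbuf clusters plus a rough allowance of 3 mbufs for headers, which is one way to arrive at the figure of 35 for a 64K transfer. The driver limit passed in is whatever the driver in use reports (64 for igb, per the message above).

```python
# Hypothetical helper, not part of FreeBSD: checks an NFS mount option
# string against the suggestions made in this thread.

MCLBYTES = 2048      # assumption: data is carried in 2K mbuf clusters
HEADER_MBUFS = 3     # assumption: rough allowance for RPC/TCP/IP headers


def parse_mount_options(opts):
    """Split 'a,b=1' into {'a': True, 'b': '1'}."""
    parsed = {}
    for item in opts.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


def estimate_tso_mbufs(io_size):
    """Rough mbuf count for one NFS read/write of io_size bytes."""
    data_mbufs = -(-io_size // MCLBYTES)  # ceiling division
    return data_mbufs + HEADER_MBUFS


def check_mount(opts, max_segments):
    """Return a list of warnings for one mount option string."""
    parsed = parse_mount_options(opts)
    warnings = []
    if "soft" in parsed:
        warnings.append("soft mount: a slow server can surface as I/O errors")
    for name in ("rsize", "wsize"):
        size = int(parsed.get(name, 0))
        needed = estimate_tso_mbufs(size)
        if size and needed > max_segments:
            warnings.append(
                f"{name}={size} may need about {needed} mbufs, "
                f"above the driver limit of {max_segments}"
            )
    return warnings


if __name__ == "__main__":
    options = "nfsv3,tcp,hard,rsize=65536,wsize=65536"
    # 32 is a made-up limit for a driver that cannot take 35 mbufs.
    for limit in (64, 32):
        print(limit, check_mount(options, limit))
```

Feeding it the option list reported by "nfsstat -m" rather than the one given on the command line is the more useful check, since the server may have reduced rsize/wsize.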