From: Rick Macklem
To: Daniel Mayfield
Cc: freebsd-fs@freebsd.org
Date: Fri, 20 Jun 2014 09:16:04 -0400 (EDT)
Subject: Re: Debugging newnfs
Message-ID: <538359689.1860054.1403270164794.JavaMail.root@uoguelph.ca>

Daniel Mayfield wrote:
> I have a very strange problem between an NFS server running FreeBSD
> 10 w/ ZFS and a number of
> FreeBSD 10 VMs running on a XenServer 6.2
> SP1 host. The problem manifests as seemingly random permissions
> issues and/or IO errors on the clients when the ZFS pool is busy.
> There are no entries in dmesg on either side, and no errors logged
> in nfsstat either. If I keep the traffic down, the errors subside,
> but not completely. Other than tcpdump, how can I go about
> debugging this?
>
Well, you didn't mention what mount options or what network
interfaces you are using, but here are a few things that might be
worth looking at...

The TSO max transmit segments issue:
- Without going into all the details (there have been some recent
  commits like r264630 to try and alleviate this), if a net device
  driver cannot handle 35 mbufs in a transmit TSO segment, things
  will break.
- Xen/netfront is a weird exception, which I think is ok so long as
  lagg or a vlan isn't layered on top of it.
--> If you can disable TSO on both server and clients, or reduce
    rsize,wsize to 32K on all client mounts, and then see if the
    problem persists, that is probably the best way to check this.
    (Since Xen/netfront is such a weird case, I am not 100% sure that
     doing the above will fix this problem if netfront is in use.)

I also don't know if it is possible to have corrupted packets due to a
hardware problem (bad memory or...) where the Xen/netfront world
doesn't catch it.

If you use the "soft" mount option, you could easily get errors like
these when the server is slow to respond. I'd strongly recommend using
"tcp" and not "soft" for your mounts.
("nfsstat -m" on the client will show you which mount options are
 actually in use. These can be somewhat different from what is
 specified on the command line, since servers limit rsize/wsize, as
 an example.)

When you get a "permissions failure" case, check on the server to see
if the permissions for the file appear correct on ZFS. If they are (or
the problem disappears when you retry a command without changing
permissions), you could have a caching issue.
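
The mount-option checks suggested above (use "tcp", avoid "soft",
keep rsize/wsize at 32K while testing the TSO theory) can be sketched
in Python roughly as follows. This is only an illustration: it assumes
the options in use are available as one comma-separated string, similar
to what "nfsstat -m" reports, and the function name, the sample string
and the max_io_size parameter are made up for the example rather than
taken from any FreeBSD tool.

```python
def check_nfs_mount_options(option_string, max_io_size=32768):
    """Return a list of warnings for a comma-separated NFS option string.

    Hypothetical helper: the input format is an assumption, not a
    documented interface of nfsstat or mount_nfs.
    """
    # Split "key=value" and bare flags (e.g. "soft") into a dict.
    options = {}
    for item in option_string.split(","):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        options[key] = value

    warnings = []
    if "soft" in options:
        warnings.append("'soft' is set: a slow server can surface as I/O errors")
    if "udp" in options or "tcp" not in options:
        warnings.append("mount is not explicitly using 'tcp'")
    # Flag read/write sizes above the 32K suggested for the TSO test.
    for name in ("rsize", "wsize"):
        value = options.get(name, "")
        if value.isdigit() and int(value) > max_io_size:
            warnings.append(f"{name}={value} exceeds {max_io_size}")
    return warnings


if __name__ == "__main__":
    # Hypothetical option string, not captured from a real system.
    sample = "nfsv3,tcp,soft,rsize=65536,wsize=65536"
    for warning in check_nfs_mount_options(sample):
        print("WARNING:", warning)
```

Running it against the options each client actually reports, rather
than the ones given on the mount command line, matters here because the
server may have reduced rsize/wsize.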

Other than capturing the packets and looking at them in wireshark
(which knows NFS, unlike tcpdump), all you can do is try fiddling with
the mount options related to caching and see if that helps.
(Note that NFS does not have a cache coherency protocol, so if files
 are concurrently shared among multiple clients, all bets are off
 w.r.t. what the behaviour is. jhb@ is much better at this than I,
 since he seems to find lots of these weird cases at his workplace.)

Good luck with it, rick

> Dan