From: Rick Macklem
To: Martin Cracauer
Cc: freebsd-current@freebsd.org, Stefan Bethke
Date: Thu, 19 Jan 2012 19:28:59 -0500 (EST)
Subject: Re: Data corruption over NFS in -current
Message-ID: <1065391703.597934.1327019339553.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20120113143711.GA62486@cons.org>

Martin Cracauer wrote:
> More findings.
>
> Reminder, with the original report I found:
> - files for no reason changing ownership and group to root/
> - data corruption as in inserting binary junk obviously from ports
> - data corruption as in malformed ascii text that might be a bug I
>   have in my code that is only exposed in FreeBSD
>
> I ran the script on a Linux machine in the same situation against
> the same NFS server, it worked fine. I haven't looked at blocksizes,
> NFS versions etc in play yet.
>
> I ran with oldnfs (reboot), which showed only the third problem.
>
> I re-ran with newnfs (reboot) which worked (all three problems
> absent).
>
> I then started building ports/lang/gcc47 at the same time as I
> re-started my crazy script and it took only a few seconds for an
> unexpected ownership to root to occur.
>
> My next steps are:
> - trying block sizes and other parameters, maybe use a different NFS
>   version with the Linux client. My NFS server is newly upgraded to
>   Linux kernel 3.1.5
> - running my script on a FreeBSD host with local disk to see whether
>   problem #3 is a general problem that appears or is exposed only on
>   FreeBSD
> - capture tcpdump as mentioned earlier
>
> I will probably have to turn debug off since this script run is
> dominated by system time now and gets 10x slower as it is now.
>

While poking around (partly related to this and partly related to the
NFSv4.1 pNFS client work), I came across an ugly bug in the way the
new NFS client handled "system operations". ("System operations" are
mainly NFSv4 Ops that manage state, such as Renew, which renews a
lease for the open/lock state. Another case of this was the NFSv3
statfs when it did a Getattr because the server did not provide post
operation attributes in the reply.)

It turns out that at least some Linux NFSv3 servers are in this
category and the fact that Martin was doing a large number of StatFS
RPCs was indeed relevant.

Anyhow, the patch to fix the above seems to have resolved Martin's
problem. The patch is needed for the new NFS client if you are using
NFSv4 mounts, or NFSv3 mounts against non-FreeBSD servers that don't
provide post-op attributes in the Statfs RPC reply. (FreeBSD servers
do provide post-op attributes; at least some Linux servers do not, and
I don't know about others. You could check by capturing the packets
for a "df" and then looking at the Statfs RPC reply in wireshark.)

Without the patch, there will be intermittent permission failures,
since the wrong credentials get used for an RPC.

The patch is here and should be in head soon:
http://people.freebsd.org/~rmacklem/authcred.patch

Thanks go to Martin for pursuing this.

rick
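
For anyone who would rather script the wireshark check than eyeball
it, here is a rough sketch of what to look for. It assumes you have
already extracted the XDR-encoded result of an NFSv3 FSSTAT reply
(the bytes after the RPC reply header) from a capture; the function
name is made up for illustration and is not part of any FreeBSD or
wireshark tool. Per the NFSv3 protocol definition, the result starts
with a 32-bit status followed by the post_op_attr discriminant
"attributes_follow", so only the first two big-endian words matter.

```python
import struct


def fsstat_reply_has_post_op_attrs(result: bytes) -> bool:
    """Report whether an NFSv3 FSSTAT result carries post-op attributes.

    `result` is assumed to be the XDR-encoded FSSTAT3res body, i.e. the
    bytes that follow the RPC reply header. Hypothetical helper, for
    illustration only.
    """
    if len(result) < 8:
        raise ValueError("FSSTAT3res needs at least 8 bytes")
    # Word 0: nfsstat3 status. Word 1: attributes_follow (XDR bool).
    # Both the success and failure arms begin with the post_op_attr.
    _status, attributes_follow = struct.unpack_from(">II", result, 0)
    return attributes_follow != 0
```

If this returns False for the replies to a "df" against your server,
the server is one of those that omits the attributes, and the client
falls back to the extra Getattr described above.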