Date: Fri, 12 Apr 2013 23:16:43 -0400 (EDT)
From: Rick Macklem
To: Baptiste Daroussin
Message-ID: <1786547565.799492.1365823003138.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20130412131037.GI95891@ithaqua.etoilebsd.net>
Subject: Re: newnfs pkgng database corruption?
Cc: Lars Eggert, current

Baptiste Daroussin wrote:
> On Fri, Apr 12, 2013 at 12:56:10PM +0000, Eggert, Lars wrote:
> > Hi,
> >
> > On Apr 12, 2013, at 1:10, Rick Macklem wrote:
> > > Well, I have no idea why an NFS server would reply errno 70 if the
> > > file still exists, unless the client has somehow sent a bogus file
> > > handle to the server. (I am not aware of any client bug that might
> > > do that. I am almost suspicious that there might be a memory problem
> > > or something that corrupts bits in the network layer. Do you have
> > > TSO enabled for your network interface by any chance? If so, I'd try
> > > disabling that on the network interface. Same goes for checksum
> > > offload.)
> > >
> > > rick
> > > ps: If you can capture packets between the client and server at the
> > > time this error occurs, looking at them in wireshark might be
> > > useful?
> >
> > I will try all of those things.
> >
You might still try the above suggestions, but since the Error 70 turned
out not to be an errno.h error number (where 70 would be ESTALE), it isn't
a stale file handle problem and, as such, there is no evidence that the
network layers are corrupting bits.

rick

> > But first, a question that someone who understands pkgng will be
> > able to answer: Is this "fake-pkg" process even running on the NFS
> > mount? The WRKDIR is /tmp, which is an mfs mount.
>
> fake-pkg is run in WRKDIR, but it calls pkgng which will open
> /var/db/pkg/local.sqlite aka nfs mount.
>
> The Error 70 is EX_SOFTWARE returned by pkgng.
>
> Can you try the following patch:
> http://people.freebsd.org/~bapt/patch-libpkg__pkgdb.c
>
> Just add that file to /usr/ports/ports-mgmt/pkg/files/
>
> If that works for you, that means POSIX advisory locking is somehow
> failing on nfsv4 files.
>
> Given it is already known to be failing on nfsv3 (because people often
> misconfigure it) I'll probably make unix-dotfile the default locking
> system when local.sqlite is stored on a network filesystem.
>
> regards,
> Bapt
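
For anyone who wants to check the locking theory independently of the patch, here is a minimal sketch, assuming Python is available on the client. It is not taken from pkgng or from the patch above. The function names (posix_lock_works, dotfile_lock, dotfile_unlock) are made up for illustration, and the dotfile part only shows the general idea of lock-by-exclusive-create, not how SQLite's unix-dotfile VFS works internally.

```python
import fcntl
import os
import tempfile


def posix_lock_works(directory):
    """Try to take and release a POSIX advisory lock on a scratch file.

    Returns True if a non-blocking exclusive fcntl() lock succeeds on a
    temporary file created in `directory`, False if the lock call fails.
    """
    fd, path = tempfile.mkstemp(prefix="locktest.", dir=directory)
    try:
        try:
            # lockf() wraps fcntl()-style (POSIX advisory) record locking.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            print("lock failed: errno %d (%s)" % (exc.errno, os.strerror(exc.errno)))
            return False
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
        os.unlink(path)


def dotfile_lock(target):
    """Hypothetical dotfile-style lock: exclusively create target + '.lock'.

    Returns the lock file path on success, or None if the lock is held.
    """
    lock_path = target + ".lock"
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return None
    os.close(fd)
    return lock_path


def dotfile_unlock(lock_path):
    """Release a lock taken with dotfile_lock()."""
    os.unlink(lock_path)
```

Calling posix_lock_works("/var/db/pkg") on the NFS client and comparing the result with a local directory such as /tmp would show whether advisory locks behave differently on the mount. The dotfile variant depends only on exclusive file creation, so it does not need a working lock service on the server.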