From: Mohan Srinivasan
Date: Thu, 15 Nov 2007 10:10:23 -0800 (PST)
To: Timo Sirainen, Robert Watson
Cc: Adam McDougall, freebsd-current@FreeBSD.org, mohans@FreeBSD.org
Subject: Re: link() not increasing link count on NFS server
Message-ID: <698405.85667.qm@web31809.mail.mud.yahoo.com>
In-Reply-To: <20071115135734.O82897@fledge.watson.org>
List-Id: Discussions about the use of FreeBSD-current

Robert,

The code you cite, which launches a lookup on the receipt of an EEXIST in nfs_link(), is a horrible hack that needs to be removed.
I always wanted to remove it but did not want to stir up controversy. The logic predates the NFS/UDP duplicate request cache, which all NFS servers will support. The NFS dupreq cache caches the replies for non-idempotent operations and will replay the cached response if a non-idempotent operation is retransmitted. This works around spurious errors in the event the NFS response was lost, of course. The dupreq cache appeared in most NFS server implementations in late 1989.

There is no justification for the logic that the FreeBSD NFS client has at the end of these ops. In fact it breaks more things than it fixes. At Yahoo!, we had a group that was doing locking by creating lockfiles and checking for the existence of these lockfiles. As you can imagine, that application broke over FreeBSD NFS. I worked around this in FreeBSD's Yahoo! implementation.

I have not looked at the original link bug reported, but I would wholeheartedly endorse ripping out the "launch a lookup on an error in these ops" logic in all of the NFS ops and just returning the error or success returned by the original NFS op.

mohan

--- On Thu, 11/15/07, Robert Watson wrote:

> From: Robert Watson
> Subject: Re: link() not increasing link count on NFS server
> To: "Timo Sirainen"
> Cc: "Adam McDougall", freebsd-current@FreeBSD.org, mohans@FreeBSD.org
> Date: Thursday, November 15, 2007, 6:05 AM
>
> On Thu, 15 Nov 2007, Timo Sirainen wrote:
>
> > On Thu, 2007-11-15 at 12:39 +0000, Robert Watson wrote:
> >
> >>> or Solaris NFS clients. Basically, Timo (cc'ed) came up with a small test
> >>> case that seems to indicate sometimes a link() call can succeed while the
> >>> link count of the file will not increase. If this is ran on two FreeBSD
> >>> clients from the same NFS directory, you will occasionally see "link()
> >>> succeeded, but link count=1". I've tried both a Netapp and a FreeBSD NFS
> > ..
> >> My guess, and this is just a hand-wave, is that the attribute cache in the
> >> NFS client isn't being forced to refresh, and hence you're getting the old
> >> stat data back (and perhaps there's no GETATTR on the wire, which might
> >> hint at this). If you'd like, you can post a link to the pcap capture file
> >> and one of us can take a look, but I've found NFS RPCs to be surprisingly
> >> readable in Wireshark so you might find it sheds quite a bit of light.
> >
> > Actually the point was that link() returns success even though in reality it
> > fails. The fstat() was just a workaround to catch this case and treat link
> > count 1 as if link() had failed with EEXIST. After that I had no more
> > problems with locking.
> >
> > I noticed this first because my dotlocking was failing to lock files
> > properly. I also added fchown() to flush attribute cache after link() and
> > before fstat(), it gives the same link count=1 reply.
>
> Indeed, and inspection of nfs_vnops.c:nfs_link(): finds:
>
>         /*
>          * Kludge: Map EEXIST => 0 assuming that it is a reply to a retry.
>          */
>         if (error == EEXIST)
>                 error = 0;
>         return (error);
>
> Neither Linux nor Solaris appears to have this logic in the client. I assume
> this is, as suggested, to work around UDP retransmissions where the reply is
> lost rather than the request. It appears to exist in revision 1.1 of
> nfs_vnops.c, so came in with 4.4BSD in the initial import, but doesn't appear
> in NetBSD so I'm guessing they've removed it. It could well be we should be
> doing the same. I've added Mohan to the CC line in case he has any input on
> this point.
>
> Robert N M Watson
> Computer Laboratory
> University of Cambridge
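
For readers following the thread, the userland workaround Timo describes above (call link(), then treat a link count of 1 as if link() had failed with EEXIST) might look roughly like the sketch below. This is not Timo's actual test case or dotlock code, which is not shown in the thread. The function name try_dotlock() and its two path arguments are hypothetical, and the sketch assumes the temp file is private to the caller, so a link count of exactly 2 means the new name really exists. It does nothing about NFS attribute caching.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: take a dotlock by hard-linking a private temp file
 * to the lock name. Returns 0 if the lock was acquired, -1 with errno set
 * otherwise. Does not trust a successful link(); it also checks st_nlink.
 */
int
try_dotlock(const char *temp_path, const char *lock_path)
{
	struct stat st;
	int fd, saved;

	/* Create a temp file that only this caller knows about. */
	fd = open(temp_path, O_CREAT | O_EXCL | O_WRONLY, 0644);
	if (fd == -1)
		return (-1);

	if (link(temp_path, lock_path) == -1) {
		saved = errno;		/* usually EEXIST: lock is held */
		goto fail;
	}

	/* link() claimed success; verify that the link count went up. */
	if (fstat(fd, &st) == -1) {
		saved = errno;		/* lock state unknown; give up */
		goto fail;
	}
	if (st.st_nlink != 2) {
		saved = EEXIST;		/* treat as a failed link() */
		goto fail;
	}

	/* Lock acquired: drop the temp name, keep lock_path. */
	close(fd);
	unlink(temp_path);
	return (0);

fail:
	close(fd);
	unlink(temp_path);
	errno = saved;
	return (-1);
}
```

A caller would retry or back off when this returns -1 with errno set to EEXIST, and unlink(lock_path) to release the lock. With the EEXIST-to-0 mapping removed from the client, the st_nlink check should become redundant, though harmless.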