From owner-freebsd-current@FreeBSD.ORG Thu Nov 15 12:39:27 2007
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id EFC5616A473
    for ; Thu, 15 Nov 2007 12:39:27 +0000 (UTC)
    (envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
    by mx1.freebsd.org (Postfix) with ESMTP id A491A13C4C6
    for ; Thu, 15 Nov 2007 12:39:27 +0000 (UTC)
    (envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
    by cyrus.watson.org (Postfix) with ESMTP id 932E546EFD;
    Thu, 15 Nov 2007 07:41:18 -0500 (EST)
Date: Thu, 15 Nov 2007 12:39:22 +0000 (GMT)
From: Robert Watson
X-X-Sender: robert@fledge.watson.org
To: Adam McDougall
In-Reply-To: <20071115074247.GQ37473@egr.msu.edu>
Message-ID: <20071115123543.H82897@fledge.watson.org>
References: <20071115074247.GQ37473@egr.msu.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: tss@iki.fi, freebsd-current@freebsd.org
Subject: Re: link() not increasing link count on NFS server
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
X-List-Received-Date: Thu, 15 Nov 2007 12:39:28 -0000

On Thu, 15 Nov 2007, Adam McDougall wrote:

> Hi, lately I've been trying to work out some NFS multiple access issues
> relating to the Dovecot IMAP server software.  One symptom seems to be an
> unusual behavior of FreeBSD NFS clients that I cannot reproduce with Linux
> or Solaris NFS clients.  Basically, Timo (cc'ed) came up with a small test
> case that seems to indicate sometimes a link() call can succeed while the
> link count of the file will not increase.
> If this is run on two FreeBSD clients from the same NFS directory, you
> will occasionally see "link() succeeded, but link count=1".  I've tried
> both a Netapp and a FreeBSD NFS server.  I've tried FreeBSD 7_RELENG
> clients as well as FreeBSD 6.2-stable from this summer.  I've run it on
> 32bit and 64bit clients.  I've turned rpc.lockd on and off, tried tcp vs.
> udp mounts, nothing so far seems to make a difference, except perhaps
> FreeBSD 7.0 seems to produce the error less often.  If one of the
> processes is run on a non-FreeBSD NFS client, only the FreeBSD NFS client
> gives the link error.  Anyone have any input?  Thanks.

The usual next step in debugging an NFS client problem, if you have managed
to identify a nice test case, is to analyze the wire RPCs to see what's
actually going on.  In this case, using NFS over UDP is actually a bit
easier to deal with.  Wireshark has an excellent NFS RPC decoder, so if you
grab the packets directly with Wireshark, or with tcpdump and then load them
in Wireshark, it may shed some light.

Ideally, we'd get the test case down to maybe four to eight RPCs and their
replies -- a GETATTR at the start (stat the file to check the link count),
LINK and its reply, and a GETATTR at the end (stat the file to check the
link count).  You will probably end up with a smattering of LOOKUP and
possibly ACCESS calls mixed in.

My guess, and this is just a hand-wave, is that the attribute cache in the
NFS client isn't being forced to refresh, and hence you're getting the old
stat data back (and perhaps there's no GETATTR on the wire, which might hint
at this).

If you'd like, you can post a link to the pcap capture file and one of us
can take a look, but I've found NFS RPCs to be surprisingly readable in
Wireshark so you might find it sheds quite a bit of light.  I assume, btw,
that if you stat the file directly on the server, or from another client,
both links show the right link count?

Robert N M Watson
Computer Laboratory
University of Cambridge

>
>
> How to reproduce (local binary is fine too, may be required if different arch):
> ------------------
>
> cp locktest.c /nfsserver
> cd /nfsserver
> gcc locktest.c -o locktest -Wall -g
>
> On host 1:
> cd /nfsserver
> ./locktest temp1
>
> On host 2: (easiest to reproduce when starting just a few seconds after 1)
> cd /nfsserver
> ./locktest temp2
>
>
> Typical output (timing may vary):
> ----------------------------------
>
> Host 1:
>
>> /tmp/locktest temp1
> 5 successes
> 15 successes
> unlink(): No such file or directory   (not a problem indication, happens
> 19 successes                           when second process starts)
> 20 successes
> link() succeeded, but link count=1
> 20 successes
> link() succeeded, but link count=1
> 20 successes
> 33 successes
> 33 successes
> link() succeeded, but link count=1
> 33 successes
> 45 successes
> link() succeeded, but link count=1
> 45 successes
> 45 successes
> link() succeeded, but link count=1
> ^C
>
> Host 2:
>
>> /tmp/locktest temp2
> 6 successes
> 15 successes
> 25 successes
> 38 successes
> 39 successes
> 50 successes
> 59 successes
> link() succeeded, but link count=1
> 59 successes
> 69 successes
> 79 successes
> 91 successes
> 99 successes
> 109 successes
> ^C
>
>
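
The locktest.c referred to above is not reproduced in this message.  Purely
as an illustration of the check under discussion (call link(), then stat()
the original file and look at st_nlink), a minimal C sketch might look like
the following.  The helper name link_and_count and its return convention are
assumptions made for this sketch; it is not Timo's test program.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from locktest.c: hard-link src to dst,
 * then stat() src and report the link count this client observes.
 * Returns the observed st_nlink, or -1 if link() or stat() fails.
 */
static long
link_and_count(const char *src, const char *dst)
{
	struct stat st;

	if (link(src, dst) < 0) {
		fprintf(stderr, "link(): %s\n", strerror(errno));
		return (-1);
	}
	if (stat(src, &st) < 0) {
		fprintf(stderr, "stat(): %s\n", strerror(errno));
		return (-1);
	}
	/* After a successful link() the file should have at least two names. */
	if (st.st_nlink < 2)
		fprintf(stderr, "link() succeeded, but link count=%ld\n",
		    (long)st.st_nlink);
	return ((long)st.st_nlink);
}
```

On a local filesystem a freshly created file should come back with a count
of 2.  A count of 1 on an NFS mount, with no GETATTR on the wire after the
LINK reply, would be consistent with the stale attribute cache guess above,
though only a packet capture can confirm that.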