Date:      Sat, 06 Feb 2010 21:20:42 -0700 (MST)
From:      "M. Warner Losh" <imp@bsdimp.com>
To:        julian@elischer.org
Cc:        net@freebsd.org
Subject:   Re: How does rpc.lockd know where to send a request
Message-ID:  <20100206.212042.925196285631243946.imp@bsdimp.com>
In-Reply-To: <4B6E2B40.1070405@elischer.org>
References:  <20100206.191153.401093655925072575.imp@bsdimp.com> <4B6E2B40.1070405@elischer.org>

In message: <4B6E2B40.1070405@elischer.org>
            Julian Elischer <julian@elischer.org> writes:
: M. Warner Losh wrote:
: > I have a problem.  All systems are running freebsd-current from
: > sometime in the last month, although similar systems running
: > 8.0-RELEASE exhibit exactly the same problem.  rpc.lockd on an NFS
: > client is doing something that baffles my mind entirely, maybe you can
: > help.  Please bear with me, this is a little complicated, but I wanted
: > to include all the details.
: > I have a host, let's call it dune.  dune is at 10.0.0.5.  dune is also
: > the master for the carp interface 10.0.0.99.  It is running rpc.lockd
: > and is an nfs server.  I've told nfs, rpcbind, lockd and statd to only
: > listen on address 10.0.0.99.
: > I have a second host.  maud-dib is 10.0.0.8.  I do "mount
: > 10.0.0.99:/dune /dune" on maud-dib.  Wireshark shows all the traffic
: > going to 10.0.0.99.  All is happy in the world.  When I start, there's
: > no ARP entry for 10.0.0.5 on 10.0.0.8, nor is there after the mount.
: > Until I do the following 'lockf /dune/imp/junk ls' (I have write perms
: > to /dune/imp).  At this point, rpc.lockd hangs.  I get the message
: > "10.0.0.99:/dune: lockd not responding" which seems odd.  lockd is
: > really there.  However, wireshark shows the NLM traffic going to IP
: > address 10.0.0.5.  maud-dib has no carp interfaces.
: > That's odd.  So my question is 'how does lockd know where to go to
: > talk the NLM protocol?'
: > 
: 
: My recollection is that maud-dib will send an initial packet to dune
: and dune will respond, but that the response may come from 10.0.0.5,
: after which maud-dib will redirect all requests there, which will not
: work because dune is not listening there.
: 
: The problem is that dune's daemon is setting a local address of
: INADDR_ANY (0.0.0.0), which tells the stack to use as the source
: address the address of the interface the packets exit from.
: 
: Since 10.0.0.5 is the primary address on that interface, that gets
: selected.
: You may try some trickery where you add the .5 address AFTER the .99
: address so that the .99 is the primary address.

Actually, it looks like this is getting returned as the ASCII string
'10.0.0.5' in frame 68, in response to the GETADDR call.  Since I've
told it specifically '-h 10.0.0.99' I'd have thought it would respect
that.  Since it is supposed to be bound to 10.0.0.99, I'd proffer the
argument this is a bug in rpcbind's implementation of GETADDR.

I never would have thought it would have been returned as an ASCII
string, but you live and learn, eh?

Now, on to fixing the bug.

Warner

P.S. http://people.freebsd.org/~imp/wireshark.dat has the trace I'm
referring to (and I've posted it in another message on this thread).

: > I did a packet capture from before I did the mount on maud-dib.  I can
: > see the NFS mount, the NFS traffic, all to 10.0.0.99.  I then see an
: > ARP for 10.0.0.5, followed by the NLM request from 10.0.0.8 to
: > 10.0.0.5.  This gets an ICMP port unreachable message, since I told
: > nfs, et al, to bind only to 10.0.0.99.
: > So, I thought, 'the answer is obvious; I'll just look for the packet
: > that has the string "dune" in it' (dune is the hostname of 10.0.0.5).
: > No packets have that string in it, other than the mount packet which
: > has /dune in it.  Nor is there any DNS activity doing a lookup.  Nor
: > is there any static mapping in /etc/hosts on 10.0.0.8.
: > Next thought: Oh, somebody like portmapper or the NFS protocol from
: > 10.0.0.99 is telling 10.0.0.8's rpc.lockd (or something else) to do
: > locking requests to 10.0.0.5.  That's trivial to find, I think to
: > myself.  I'll look for the octets 0a 00 00 05 (hex).  The only
: > instances of that are in the ARP packet, the NLM request and the ICMP
: > unreachable packets.  No other packets include these bytes.  Nor do
: > any include the reverse.
: > Right after the mount, there's nothing in the connection table that
: > points to 10.0.0.5, only 10.0.0.99.
: > So I'm having a serious WTF moment.  How the heck is this even
: > possible?  Any ideas on where to look for where this gets set and/or
: > communicated?
: > thanks a bunch for any insight that you can give...
: > Warner