Date: Sat, 06 Feb 2010 21:20:42 -0700 (MST)
From: "M. Warner Losh" <imp@bsdimp.com>
To: julian@elischer.org
Cc: net@freebsd.org
Subject: Re: How does rpc.lockd know where to send a request
Message-ID: <20100206.212042.925196285631243946.imp@bsdimp.com>
In-Reply-To: <4B6E2B40.1070405@elischer.org>
References: <20100206.191153.401093655925072575.imp@bsdimp.com> <4B6E2B40.1070405@elischer.org>
In message: <4B6E2B40.1070405@elischer.org>
            Julian Elischer <julian@elischer.org> writes:
: M. Warner Losh wrote:
: > I have a problem. All systems are running freebsd-current from
: > sometime in the last month, although similar systems running
: > 8.0-RELEASE exhibit exactly the same problem. rpc.lockd on an NFS
: > client is doing something that baffles my mind entirely; maybe you can
: > help. Please bear with me, this is a little complicated, but I wanted
: > to include all the details.
: >
: > I have a host, let's call it dune. dune is at 10.0.0.5. dune is also
: > the master for the carp interface 10.0.0.99. It is running rpc.lockd
: > and is an nfs server. I've told nfs, rpcbind, lockd and statd to only
: > listen on address 10.0.0.99.
: >
: > I have a second host. maud-dib is 10.0.0.8. I do "mount
: > 10.0.0.99:/dune /dune" on maud-dib. Wireshark shows all the traffic
: > going to 10.0.0.99. All is happy in the world. When I start, there's
: > no ARP entry for 10.0.0.5 on 10.0.0.8, nor is there after the mount,
: > until I do the following: 'lockf /dune/imp/junk ls' (I have write perms
: > to /dune/imp). At this point, rpc.lockd hangs. I get the message
: > "10.0.0.99:/dune: lockd not responding", which seems odd. lockd is
: > really there. However, wireshark shows the NLM traffic going to IP
: > address 10.0.0.5. maud-dib has no carp interfaces.
: >
: > That's odd. So my question is 'how does lockd know where to go to
: > talk the NLM protocol?'
: >
:
: my recollection is that maud-dib will send an initial packet to dune
: and dune will respond, but that the response may come from 10.0.0.5,
: after which maud-dib will redirect all requests there, which will not
: work because dune is not listening there.
:
: the problem is that dune's daemon is setting a local address of
: INADDR_ANY (0.0.0.0), which tells the packets to use a source
: address that is the address of the interface they exit from.
:
: Since 10.0.0.5 is the primary address on that interface, that gets
: selected.
: You may try some trickery where you add the .5 address AFTER the .99
: address so that the .99 is the primary address.

Actually, it looks like the address is getting returned, as an ASCII
string '10.0.0.5', in frame 68 in response to the GETADDR call. Since
I've told it specifically '-h 10.0.0.99', I'd have thought it would
respect that. Since it is supposed to be bound to 10.0.0.99, I'd
proffer the argument that this is a bug in rpcbind's implementation
of GETADDR.

I never would have thought it would have been returned as an ASCII
string, but you live and learn, eh?

Now, on to fixing the bug.

Warner

P.S. http://people.freebsd.org/~imp/wireshark.dat has the trace I'm
referring to (and I've posted it in another message on this thread).

: > I did a packet capture from before I did the mount on maud-dib. I can
: > see the NFS mount, the NFS traffic, all to 10.0.0.99. I then see an
: > ARP for 10.0.0.5, followed by the NLM request from 10.0.0.8 to
: > 10.0.0.5. This gets an ICMP port unreachable message, since I told
: > nfs, et al, to bind only to 10.0.0.99.
: >
: > So, I thought, 'the answer is obvious, I'll just look for the packet
: > that has the string dune in it' (dune is the hostname of 10.0.0.5).
: > No packets have that string in them, other than the mount packet, which
: > has /dune in it. Nor is there any DNS activity doing a lookup. Nor
: > is there any static mapping in /etc/hosts on 10.0.0.8.
: >
: > Next thought: Oh, somebody like portmapper or the NFS protocol from
: > 10.0.0.99 is telling 10.0.0.8's rpc.lockd (or something else) to send
: > locking requests to 10.0.0.5. That's trivial to find, I think to
: > myself. I'll look for the octets 0a 00 00 05 (hex). The only
: > instances of that are in the ARP packet, the NLM request and the ICMP
: > unreachable packets. No other packets include these bytes. Nor do
: > any include the reverse.
: >
: > Right after the mount, there's nothing in the connection table that
: > points to 10.0.0.5, only 10.0.0.99.
: >
: > So I'm having a serious WTF moment. How the heck is this even
: > possible? Any ideas on where to look for where this gets set and/or
: > communicated?
: >
: > thanks a bunch for any insight that you can give...
: >
: > Warner
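The GETADDR finding is consistent with the octet search for 0a 00 00 05 coming up empty. In rpcbind versions 3 and 4 (RFC 1833), GETADDR returns the server's address as an XDR string in "universal address" form, h1.h2.h3.h4.p1.p2, where p1 and p2 are the high and low bytes of the port, so the address appears in the packet as ASCII text rather than as four binary octets. The Python sketch below is illustrative only. The helper names (make_uaddr, parse_uaddr, xdr_string) and the port 4045 are hypothetical, and it models just the string field of the reply, not a complete RPC message.

```python
import ipaddress
import struct
from typing import Tuple


def make_uaddr(host: str, port: int) -> str:
    """Build an IPv4 universal address "h1.h2.h3.h4.p1.p2" (RFC 1833),
    where p1 and p2 are the high and low bytes of the port."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError(f"port out of range: {port}")
    addr = ipaddress.IPv4Address(host)  # rejects malformed hosts
    return f"{addr}.{port >> 8}.{port & 0xFF}"


def parse_uaddr(uaddr: str) -> Tuple[str, int]:
    """Split an IPv4 universal address back into (host, port)."""
    parts = uaddr.split(".")
    if len(parts) != 6:
        raise ValueError(f"not an IPv4 universal address: {uaddr!r}")
    host = str(ipaddress.IPv4Address(".".join(parts[:4])))
    hi, lo = int(parts[4]), int(parts[5])
    if not (0 <= hi <= 0xFF and 0 <= lo <= 0xFF):
        raise ValueError(f"bad port bytes in {uaddr!r}")
    return host, (hi << 8) | lo


def xdr_string(s: str) -> bytes:
    """XDR-encode a string (RFC 4506): 4-byte big-endian length,
    the bytes themselves, then zero padding to a 4-byte boundary."""
    data = s.encode("ascii")
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad


if __name__ == "__main__":
    # Hypothetical GETADDR result: dune's primary address plus an
    # illustrative port (4045 is an assumption, not taken from the trace).
    uaddr = make_uaddr("10.0.0.5", 4045)
    body = xdr_string(uaddr)

    # Searching for the binary octets (or their reverse) misses the
    # address, because it travels as ASCII text.
    print(bytes.fromhex("0a000005") in body)   # False
    print(bytes.fromhex("0500000a") in body)   # False
    print(b"10.0.0.5" in body)                 # True
    print(parse_uaddr(uaddr))                  # ('10.0.0.5', 4045)
```

A capture filter or byte search keyed on the ASCII form (for example the text "10.0.0.5") would have located the reply directly.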