Date: Sat, 06 Feb 2010 21:31:45 -0800
From: Julian Elischer <julian@elischer.org>
To: "M. Warner Losh" <imp@bsdimp.com>
Cc: net@freebsd.org
Subject: Re: How does rpc.lockd know where to send a request
Message-ID: <4B6E5041.4050200@elischer.org>
In-Reply-To: <20100206.210455.756168950357586371.imp@bsdimp.com>
References: <20100206.191153.401093655925072575.imp@bsdimp.com>
            <4B6E2B40.1070405@elischer.org>
            <20100206.210455.756168950357586371.imp@bsdimp.com>
M. Warner Losh wrote:
> In message: <4B6E2B40.1070405@elischer.org>
>             Julian Elischer <julian@elischer.org> writes:
> : M. Warner Losh wrote:
> : > I have a problem. All systems are running freebsd-current from
> : > sometime in the last month, although similar systems running
> : > 8.0-RELEASE exhibit exactly the same problem. rpc.lockd on an NFS
> : > client is doing something that baffles my mind entirely, maybe you can
> : > help. Please bear with me, this is a little complicated, but I wanted
> : > to include all the details.
> : > I have a host, let's call it dune. dune is at 10.0.0.5. dune is also
> : > the master for the carp interface 10.0.0.99. It is running rpc.lockd
> : > and is an nfs server. I've told nfs, rpcbind, lockd and statd to only
> : > listen on address 10.0.0.99.
> : > I have a second host. maud-dib is 10.0.0.8. I do "mount
> : > 10.0.0.99:/dune /dune" on maud-dib. Wireshark shows all the traffic
> : > going to 10.0.0.99. All is happy in the world. When I start, there's
> : > no ARP entry for 10.0.0.5 on 10.0.0.8, nor is there after the mount.
> : > Until I do the following 'lockf /dune/imp/junk ls' (I have write perms
> : > to /dune/imp). At this point, rpc.lockd hangs. I get the message
> : > "10.0.0.99:/dune: lockd not responding" which seems odd. lockd is
> : > really there. However, wireshark shows the NLM traffic going to IP
> : > address 10.0.0.5. maud-dib has no carp interfaces.
> : > That's odd. So my question is 'how does lockd know where to go to
> : > talk the NLM protocol?'
> : >
> :
> : my recollection is that maud-dib will send an initial packet to dune
> : and dune will respond but that the response may come from 10.0.0.5,
> : after which maud-dib will redirect all requests there, which will not
> : work because dune is not listening there.
>
> But wouldn't the response from 10.0.0.5 mean I could search for the
> hex string and see 0a000005 in the packet header?

Probably, but it may also be somewhere in the protocol, and not in the
header. It's quite common (I saw it at Cisco) for protocols to have
fields within them that say "who is responding", where a common address
is used instead of where the packet comes from, but where that address
is selected by some mechanism that determines the address that would be
used to send a packet to the recipient.

The usual mechanism to do this is to open a UDP socket, connect() it to
the destination and then do a getsockname(), which will return the
primary address of the interface that would be used to send the packet.
That address is then used in some "who I am" field within the protocol.

Anyhow, make sure the primary address on the interface is .99 and the
secondary is .5 before launching your next experiment. If it doesn't
change things then that's one theory down the drain.

>
> : the problem is that dune's daemon is setting a local address of
> : INADDR_ANY (0.0.0.0) which tells the packets to use a from
> : address that is the address of the interface that they exit from.
>
> No, dune's daemon is sitting on 10.0.0.99.
>
> : Since 10.0.0.5 is the primary address on that interface, that gets
> : selected.
> : you may try some trickery where you add the .5 address AFTER the .99
> : address so that the .99 is the primary address.
>
> Normally, I'd believe you. But since there's nothing listening on the
> * address, and also nothing listening on the 10.0.0.5 address, I'm
> less sure. After looking at the wireshark dump, I don't see any
> 10.0.0.5 packets until the ARP for it near the end of the trace.
>
> http://people.freebsd.org/~imp/wireshark.dat if you are interested.
>
> This is a good theory, and I'll have to look into it deeper.
>
> Warner
>
>
> : > I did a packet capture from before I did the mount on maud-dib. I can
> : > see the NFS mount, the NFS traffic, all to 10.0.0.99. I then see an
> : > ARP for 10.0.0.5, followed by the NLM request from 10.0.0.8 to
> : > 10.0.0.5.
> : > This gets an ICMP port unreachable message, since I told
> : > nfs, et al, to bind only to 10.0.0.99.
> : > So, I thought, 'the answer is obvious, I'll just look for the packet
> : > that has the string 'dune' in it (which is the hostname of 10.0.0.5).
> : > No packets have that string in it, other than the mount packet which
> : > has /dune in it. Nor is there any DNS activity doing a lookup. Nor
> : > is there any static mapping in /etc/hosts on 10.0.0.8.
> : > Next thought: Oh, somebody like portmapper or the NFS protocol from
> : > 10.0.0.99 is telling 10.0.0.8's rpc.lockd (or something else) to do
> : > locking requests to 10.0.0.5. That's trivial to find, I think to
> : > myself. I'll look for the octets 0a 00 00 05 (hex). The only
> : > instances of that are in the ARP packet, the NLM request and the ICMP
> : > unreachable packets. No other packets include these bytes. Nor do
> : > any include the reverse.
> : > Right after the mount, there's nothing in the connection table that
> : > points to 10.0.0.5, only 10.0.0.99.
> : > So I'm having a serious WTF moment. How the heck is this even
> : > possible? Any ideas on where to look for where this gets set and/or
> : > communicated?
> : > thanks a bunch for any insight that you can give...
> : > Warner
> : > _______________________________________________
> : > freebsd-net@freebsd.org mailing list
> : > http://lists.freebsd.org/mailman/listinfo/freebsd-net
> : > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
> : >
> :
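
The source-address lookup described in the reply (open a UDP socket, point
it at the destination, ask the kernel which local address it picked) can be
sketched roughly as below. This is only an illustration, assuming IPv4; the
function name local_address_for and the port number 9 are made up for the
example, and nothing here is taken from rpc.lockd or rpcbind. Whether those
daemons do anything like this is exactly the untested theory in this thread.

```python
import socket


def local_address_for(destination: str, port: int = 9) -> str:
    """Return the local IPv4 address the kernel would use to reach destination.

    Hypothetical helper for illustration only. The port is arbitrary:
    connect() on a UDP socket sends no packet, it just records the peer
    and makes the kernel choose a route and a local (source) address.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((destination, port))
        # getsockname() reports the (address, port) pair the kernel selected.
        address, _local_port = sock.getsockname()
        return address


if __name__ == "__main__":
    # Loopback is used here so the example runs without any network setup.
    print(local_address_for("127.0.0.1"))
```

If the theory holds, running something like local_address_for("10.0.0.8")
on dune would be expected to report whichever address is currently primary
on the outgoing interface, which is why swapping the order of .99 and .5 is
a cheap experiment.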