Date: Sat, 06 Feb 2010 19:11:53 -0700 (MST)
From: "M. Warner Losh" <imp@bsdimp.com>
To: net@FreeBSD.org
Subject: How does rpc.lockd know where to send a request
Message-ID: <20100206.191153.401093655925072575.imp@bsdimp.com>
I have a problem. All systems are running FreeBSD-current from sometime in the last month, although similar systems running 8.0-RELEASE exhibit exactly the same problem. rpc.lockd on an NFS client is doing something that baffles my mind entirely; maybe you can help. Please bear with me. This is a little complicated, but I wanted to include all the details.

I have a host, let's call it dune. dune is at 10.0.0.5. dune is also the master for the carp interface 10.0.0.99. It is running rpc.lockd and is an NFS server. I've told nfs, rpcbind, lockd and statd to listen only on address 10.0.0.99.

I have a second host. maud-dib is 10.0.0.8. I do "mount 10.0.0.99:/dune /dune" on maud-dib. Wireshark shows all the traffic going to 10.0.0.99. All is happy in the world. When I start, there's no ARP entry for 10.0.0.5 on 10.0.0.8, nor is there one after the mount.

That holds until I run 'lockf /dune/imp/junk ls' (I have write perms to /dune/imp). At this point, rpc.lockd hangs. I get the message "10.0.0.99:/dune: lockd not responding", which seems odd: lockd is really there. However, Wireshark shows the NLM traffic going to IP address 10.0.0.5. maud-dib has no carp interfaces. That's odd.

So my question is: how does lockd know where to go to talk the NLM protocol?

I did a packet capture starting before I did the mount on maud-dib. I can see the NFS mount and the NFS traffic, all to 10.0.0.99. I then see an ARP for 10.0.0.5, followed by the NLM request from 10.0.0.8 to 10.0.0.5. This gets an ICMP port unreachable message, since I told nfs, et al., to bind only to 10.0.0.99.

So, I thought, "the answer is obvious: I'll just look for the packet that has the string 'dune' in it" (which is the hostname of 10.0.0.5). No packets have that string in them, other than the mount packet, which has /dune in it. Nor is there any DNS activity doing a lookup. Nor is there any static mapping in /etc/hosts on 10.0.0.8.
Next thought: oh, something like portmapper or the NFS protocol from 10.0.0.99 is telling 10.0.0.8's rpc.lockd (or something else) to send locking requests to 10.0.0.5. That's trivial to find, I think to myself. I'll look for the octets 0a 00 00 05 (hex). The only instances of those are in the ARP packet, the NLM request and the ICMP unreachable packets. No other packets include these bytes. Nor do any include the reverse. Right after the mount, there's nothing in the connection table that points to 10.0.0.5, only 10.0.0.99.

So I'm having a serious WTF moment. How the heck is this even possible? Any ideas on where to look for where this gets set and/or communicated?

Thanks a bunch for any insight that you can give...

Warner
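The octet search described in the message (looking for 0a 00 00 05 and its reverse in every captured packet) can be sketched roughly as follows. This is only an illustration, not the procedure actually used in the message: it assumes the capture was saved in the classic pcap format with microsecond timestamps (not pcapng), and the function names `ip_needles`, `read_pcap_packets` and `packets_containing` are hypothetical helpers invented for this sketch.

```python
import socket
import struct


def ip_needles(addr):
    """Return the IPv4 address as 4 raw octets, plus the same octets reversed."""
    octets = socket.inet_aton(addr)  # 10.0.0.5 -> 0a 00 00 05
    return octets, octets[::-1]


def read_pcap_packets(data):
    """Yield (number, raw_bytes) for each record in a classic pcap byte string.

    Numbers start at 1. Assumes the microsecond-resolution classic pcap
    magic; anything else (for example pcapng) is rejected.
    """
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # capture written on a little-endian host
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # capture written on a big-endian host
    else:
        raise ValueError("not a classic pcap capture")

    offset = 24  # skip the 24-byte global header
    number = 1
    while offset + 16 <= len(data):
        # Per-record header: ts_sec, ts_usec, captured length, original length
        _sec, _usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", data, offset
        )
        offset += 16
        yield number, data[offset:offset + incl_len]
        offset += incl_len
        number += 1


def packets_containing(data, needles):
    """Return the numbers of the packets whose raw bytes contain any needle."""
    return [
        number
        for number, packet in read_pcap_packets(data)
        if any(needle in packet for needle in needles)
    ]
```

Assuming the capture had been saved to a file (the name `capture.pcap` is hypothetical), `packets_containing(open("capture.pcap", "rb").read(), ip_needles("10.0.0.5"))` would list the matching packets. Passing `[b"dune"]` as the needles would perform the hostname string search in the same way.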