From owner-freebsd-hackers@FreeBSD.ORG Fri Jan 27 19:35:10 2006
X-Original-To: freebsd-hackers@freebsd.org
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
        by hub.freebsd.org (Postfix) with ESMTP id 49BC116A422;
        Fri, 27 Jan 2006 19:35:10 +0000 (GMT)
        (envelope-from lists@intricatesoftware.com)
Received: from mta1.srv.hcvlny.cv.net (mta1.srv.hcvlny.cv.net [167.206.4.196])
        by mx1.FreeBSD.org (Postfix) with ESMTP id 5B1C943D80;
        Fri, 27 Jan 2006 19:34:56 +0000 (GMT)
        (envelope-from lists@intricatesoftware.com)
Received: from [172.16.1.72] (ool-457a77e8.dyn.optonline.net [69.122.119.232])
        by mta1.srv.hcvlny.cv.net
        (Sun Java System Messaging Server 6.2-4.03 (built Sep 22 2005))
        with ESMTP id <0ITR00H21NQ7KX20@mta1.srv.hcvlny.cv.net>;
        Fri, 27 Jan 2006 14:34:55 -0500 (EST)
Date: Fri, 27 Jan 2006 14:34:54 -0500
From: Kurt Miller
In-reply-to: <200601271042.04315.lists@intricatesoftware.com>
To: freebsd-hackers@freebsd.org
Message-id: <200601271434.54776.lists@intricatesoftware.com>
MIME-version: 1.0
Content-type: text/plain; charset=iso-8859-1
Content-transfer-encoding: 7BIT
Content-disposition: inline
References: <200601271042.04315.lists@intricatesoftware.com>
User-Agent: KMail/1.9
Cc: Daniel Eischen
Subject: Re: read hang on datagram socket
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: kurt@intricatesoftware.com
List-Id: Technical Discussions relating to FreeBSD
X-List-Received-Date: Fri, 27 Jan 2006 19:35:10 -0000

On Friday 27 January 2006 10:42 am, Kurt Miller wrote:
> On Friday 27 January 2006 9:16 am, Daniel Eischen wrote:
> > On Thu, 26 Jan 2006, Kurt Miller wrote:
> >
> > > On Thursday 26 January 2006 7:26 pm, Daniel Eischen wrote:
> > > >
> > > > The modified version does not hang on 5.2. Do you have multiple
> > > > interfaces on your 5.4 box?
> > >
> > > No, the 5.4 box is virtually identical to the 6.0 box. I set them both
> > > up at the same time from initial installs for the project.
> > >
> > > truk@freebsd5-4$ ifconfig
> > > lnc0: flags=108843 mtu 1500
> > >         inet6 fe80::250:56ff:fe40:451a%lnc0 prefixlen 64 scopeid 0x1
> > >         inet 172.16.1.36 netmask 0xffffff00 broadcast 172.16.1.255
> > >         ether 00:50:56:40:45:1a
> > > lo0: flags=8049 mtu 16384
> > >         inet 127.0.0.1 netmask 0xff000000
> > >         inet6 ::1 prefixlen 128
> > >         inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
> >
> > [ ... ]
> >
> > > > What happens when you try using non-zero IP addresses and ports?
> > > >
> > >
> > > Setting the ports doesn't affect the problem; however, setting the
> > > addresses does. It really seems like binding to INADDR_ANY only binds
> > > to the loopback address 127.0.0.1 and not to all the interfaces.
> > >
> > > If sock1 is bound to the hostAddress and sock2 connects to sock1 at
> > > the hostAddress, it works OK. If sock1 is bound to INADDR_ANY and sock2
> > > connects to sock1 using INADDR_ANY, it works. But any mixture of
> > > INADDR_ANY with the hostAddress fails.
> >
> > According to Stevens' UNIX Network Programming, when binding to
> > INADDR_ANY, the operating system doesn't assign an address
> > until the first write. This is unlike the port: with port 0,
> > an ephemeral port is assigned right away. I don't have the
> > book handy right now, so I forget whether the INADDR_ANY
> > behavior applies only when you have multiple interfaces.
>
> The book I'm using (Advanced Programming in the UNIX Environment)
> is not that clear about it. It does say that a datagram socket on
> which connect has been called will receive datagrams only from the
> address specified, which seems related to the problem.
>
> > > Unfortunately, I don't have control over the addresses; the Java
> > > programs do.
> > > This particular JCK test binds the first socket with
> > > INADDR_ANY (InetAddress.getByName("0.0.0.0")) and connects the second
> > > socket to the first using the hostAddress (InetAddress.getLocalHost()).
> >
> > You can try sending a byte before getting the address for the
> > port and see if that works. Do you have anything weird, like
> > not having a default route (gateway)?
>
> Yes, sending a byte before doing the connect(sock2, &sock1Addres
> does work. However, calling connect/send/read after that still fails.
> The problem appears to be related to sock1's selection of its source
> address, or perhaps the connect call is ignoring the hostAddress and
> using the loopback address. The netstat output leads me to believe
> it is the latter. It behaves like a mismatch between the source
> address of the message and the address enforced by the connect call.
>
> I've confirmed that the sock1Addr struct is filled in correctly with
> the hostAddress and port of sock1, and sock2Addr is filled in correctly
> with the hostAddress and port of sock2.
>
> I've got a pretty standard setup. All three machines are using DHCP
> to get their addresses, default route and name servers. I've set
> the DHCP server to give them the same IP addresses each time.
> Here's the routing table for each:
>
> truk@freebsd6-0$ netstat -r -f inet
> Routing tables
>
> Internet:
> Destination        Gateway            Flags    Refs      Use  Netif Expire
> default            172.16.1.1         UGS         0       34   lnc0
> localhost          localhost          UH          0        0    lo0
> 172.16.1/24        link#1             UC          0        0   lnc0
> 172.16.1.1         00:00:24:c2:47:b5  UHLW        2        0   lnc0    671
> 172.16.1.10        00:13:46:c9:0a:5c  UHLW        1        0   lnc0   1103
> 172.16.1.72        00:12:f0:b5:f4:6c  UHLW        1      118   lnc0    961
>
> truk@freebsd5-4$ netstat -r -f inet
> Routing tables
>
> Internet:
> Destination        Gateway            Flags    Refs      Use  Netif Expire
> default            172.16.1.1         UGS         0      112   lnc0
> localhost          localhost          UH          1       19    lo0
> 172.16.1/24        link#1             UC          0        0   lnc0
> 172.16.1.1         00:00:24:c2:47:b5  UHLW        1        0   lnc0     40
> 172.16.1.10        00:13:46:c9:0a:5c  UHLW        0        0   lnc0   1106
> 172.16.1.36        localhost          UGHS        0        7    lo0
> 172.16.1.72        00:12:f0:b5:f4:6c  UHLW        0     3151   lnc0    749
>
> $ netstat -r -f inet #freebsd4-11
> Routing tables
>
> Internet:
> Destination        Gateway            Flags    Refs      Use  Netif Expire
> default            172.16.1.1         UGSc        1        0   lnc0
> localhost          localhost          UH          1        0    lo0
> 172.16.1/24        link#1             UC          2        0   lnc0
> 172.16.1.1         00:00:24:c2:47:b5  UHLW        2        0   lnc0   1200
> 172.16.1.30        localhost          UGHS        0        0    lo0
> 172.16.1.72        00:12:f0:b5:f4:6c  UHLW        1      112   lnc0   1195
>
> Thanks for the ideas and suggestions.

The problem turned out to be related to how DHCP sets up the routing
table. Switching to a fixed-address setup changed the routing table,
and now the program works. Go figure. Here's the routing table when
using a fixed address:

Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use  Netif Expire
default            172.16.1.1         UGS         0       48   lnc0
localhost          localhost          UH          0        0    lo0
172.16.1/24        link#1             UC          0        0   lnc0
172.16.1.1         00:00:24:c2:47:b5  UHLW        1        0   lnc0   1200
172.16.1.20        00:07:e9:47:0f:f9  UHLW        0        2   lnc0    429
172.16.1.21        00:40:96:39:b6:f9  UHLW        0        3   lnc0   1197
172.16.1.36        00:50:56:40:45:1a  UHLW        0        1    lo0
172.16.1.72        00:12:f0:b5:f4:6c  UHLW        0      637   lnc0   1187

Notice the difference in the gateway for 172.16.1.36: under DHCP it is
localhost (flags UGHS), while with the fixed address it is lnc0's own
MAC address, 00:50:56:40:45:1a (flags UHLW).

Thanks for all the help and suggestions.

-Kurt
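A minimal C sketch (not part of the original thread) of how one could check the address-assignment points discussed above with getsockname(). It relies on standard BSD sockets behavior: bind() with port 0 assigns an ephemeral port immediately while an INADDR_ANY address still reads back as 0.0.0.0, and connect() on a UDP socket transmits nothing but makes the kernel fix a source address chosen from the routing table. The function names bind_any_ephemeral(), udp_source_for() and print_sockaddr() are hypothetical.

```c
/* Sketch (not from the thread): inspect the local address/port the kernel
 * assigns to a UDP socket. All function names here are hypothetical. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* bind() a UDP socket to INADDR_ANY with port 0 and report what
 * getsockname() sees: the port is assigned immediately, the address
 * stays 0.0.0.0. Returns the open socket (caller closes it) or -1. */
int bind_any_ephemeral(struct sockaddr_in *local)
{
    struct sockaddr_in any;
    socklen_t len = sizeof(*local);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(&any, 0, sizeof(any));
    any.sin_family = AF_INET;
    any.sin_addr.s_addr = htonl(INADDR_ANY);
    any.sin_port = htons(0);            /* let the kernel pick a port */
    if (bind(fd, (struct sockaddr *)&any, sizeof(any)) != 0 ||
        getsockname(fd, (struct sockaddr *)local, &len) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* connect() a fresh UDP socket to dst_ip:dst_port and report the local
 * address the kernel selected. For UDP, connect() sends no packet; it
 * records the peer and fixes the source address (and an ephemeral port).
 * Returns 0 on success, -1 on error. */
int udp_source_for(const char *dst_ip, unsigned short dst_port,
                   struct sockaddr_in *local)
{
    struct sockaddr_in dst;
    socklen_t len = sizeof(*local);
    int fd, rc = -1;

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(dst_port);
    if (inet_pton(AF_INET, dst_ip, &dst.sin_addr) != 1)
        return -1;
    if ((fd = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) == 0 &&
        getsockname(fd, (struct sockaddr *)local, &len) == 0)
        rc = 0;
    close(fd);
    return rc;
}

/* Print an IPv4 socket address as a.b.c.d:port. */
void print_sockaddr(const char *label, const struct sockaddr_in *sa)
{
    char ip[INET_ADDRSTRLEN];

    if (inet_ntop(AF_INET, &sa->sin_addr, ip, sizeof(ip)) == NULL)
        strcpy(ip, "?");
    printf("%s %s:%u\n", label, ip, (unsigned)ntohs(sa->sin_port));
}
```

Calling udp_source_for() with the host's own address (172.16.1.36 on the 5.4 box) under the DHCP and the fixed-address routing tables would show whether the two tables lead the kernel to pick different source addresses. The thread does not report that measurement, so no particular result is implied.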
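The connect() behavior Kurt cites from Advanced Programming in the UNIX Environment also explains why the test hung rather than failed: a datagram whose source address and port do not match a connected UDP socket's peer is discarded by the kernel, so a blocking read simply waits. The C sketch below is an illustration, not the JCK test itself; the helpers udp_loopback_socket() and recv_timeout() are hypothetical, built on the standard POSIX poll(). It reproduces the filtering on loopback with three sockets and bounds the wait, so the filtered case surfaces as ETIMEDOUT instead of a hang.

```c
/* Sketch (not from the thread): a connected UDP socket only accepts
 * datagrams from its connected peer; poll() with a timeout turns a
 * filtered datagram into ETIMEDOUT instead of a hung read.
 * Helper names are hypothetical. */
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a UDP socket bound to 127.0.0.1 with a kernel-chosen port and
 * store the bound address in *addr. Returns the socket or -1. */
int udp_loopback_socket(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = htons(0);
    if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)) != 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Wait up to timeout_ms for a datagram, then recv() it. Returns the byte
 * count, or -1 with errno set (ETIMEDOUT if nothing arrived in time).
 * An interrupted poll() is retried with the full timeout. */
ssize_t recv_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd;
    int n;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);
    if (n == 0) {
        errno = ETIMEDOUT;
        return -1;
    }
    if (n < 0)
        return -1;
    return recv(fd, buf, len, 0);
}
```

The JCK scenario maps onto this loosely: sock2 plays the connected socket, and a reply arriving from an unexpected source address plays the role of the third socket. The sketch reproduces only the filtering, not the DHCP routing condition that triggered it.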