Date: Wed, 6 Jun 2001 05:40:01 -0700 (PDT)
Message-Id: <200106061240.f56Ce1J14186@freefall.freebsd.org>
To: freebsd-bugs@FreeBSD.org
From: Ruslan Ermilov
Subject: Re: kern/27890: FreeBSD not always seems to take the best route

The following reply was made to PR kern/27890; it has been noted by GNATS.

From: Ruslan Ermilov
To: Andre Albsmeier
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/27890: FreeBSD not always seems to take the best route
Date: Wed, 6 Jun 2001 15:32:05 +0300

On Wed, Jun 06, 2001 at 12:29:04PM +0200, Andre Albsmeier wrote:
> Thanks for helping...
>
> On Wed, 06-Jun-2001 at 11:24:19 +0300, Ruslan Ermilov wrote:
> >
> > ...
> >
> > I can't reproduce this problem on my 4.3-STABLE box.
> >
> > Yes, the UDP socket has the reference to the protocol-cloned
> > route to the destination host S through the router 1 initially,
> > and UDP packets go through that router.
> >
> > In my tests, router 1 (192.168.1.1) was the host *not* configured
> > to act as the router, so all "foreign" packets sent to it got
>
> OK, I have blocked packets coming from C on router 1. So
> I think I got the same config as you.
> >
> > silently ignored.  I used the ports/net/netcat utility to connect
> > to the UDP `echo' port of the destination S (192.168.2.1):
> >
> > Fig.1: Initial state, before UDP socket is open.
> >
> > : # netstat -arn
> > : Destination        Gateway            Flags     Refs     Use    Netif Expire
> > : default            192.168.1.1        UGSc         0       2      rl0
> > : 127.0.0.1          127.0.0.1          UH           1       6      lo0
> > : 192.168.1          link#1             UC           3       0      rl0 =>
> >
> >
> > Fig.2: We connect(2) UDP socket to the "echo" port on S (192.168.2.1).
> >
> > : # nc -u 192.168.2.1 echo
> > : ping1
> > : ping2
> > : ping3
> > [...]
> >
> > As you can see, we receive no echos back.
>
> OK, same here.
> >
> > Fig.3: Routing table after UDP socket is open.
> >
> > : # netstat -arn
> > : Destination        Gateway            Flags     Refs     Use    Netif Expire
> > : default            192.168.1.1        UGSc         1       2      rl0
> > : 127.0.0.1          127.0.0.1          UH           1       6      lo0
> > : 192.168.1          link#1             UC           4       0      rl0 =>
> > : 192.168.2.1        192.168.1.1        UGHW         1      14      rl0
> >
> > The route to S (192.168.2.1) was cloned (W) from the `default' route.
> > refcnt=1 on the 192.168.2.1 route indicates that the UDP socket holds
> > a reference to this route.
>
> Same here:
>
> 192.168.2.1        192.168.1.1        UGHW         1     425     fxp0
> >
> > Fig.4: I manually add the route to the 192.168.2 network.
> >
> > : # route add -net 192.168.2 192.168.1.2
> > : add net 192.168.2: gateway 192.168.1.2
>
> OK, I don't add it manually but wait until routed messages from
> 192.168.1.2 brings it back.
> >
> >
> > Fig.5: Routing table after the route to the 192.168.2 network was added.
> >
> > : # netstat -arn
> > : Destination        Gateway            Flags     Refs     Use    Netif Expire
> > : default            192.168.1.1        UGSc         1       2      rl0
> > : 127.0.0.1          127.0.0.1          UH           1       6      lo0
> > : 192.168.1          link#1             UC           4       0      rl0 =>
> > : 192.168.2          192.168.1.2        UGSc         0       0      rl0
>
> Yup, same here
> >
> > As you can see, the route to the 192.168.2.1 host is deleted from the routing
> > table.  It actually doesn't get freed completely, as it had non-zero reference
> > count (UDP socket still holds on it), but instead it gets marked as DOWN, and
> > will be freed and reallocated in ip_output() on the next use.
> >
> > Fig.6: We continue to send UDP datagrams.
> >
> > : # nc -u 192.168.2.1 echo (continued)
> > : ping4
> > : ping4
> > : ping5
> > : ping5
> > : ping6
> > : ping6
> >
> > As you can see, this time we get the echos back.
>
> Yes, same here :-(
> >
> > Fig.7: Routing table after we sent more UDP datagrams.
> >
> > : # netstat -arn -finet
> > : Destination        Gateway            Flags     Refs     Use    Netif Expire
> > : default            192.168.1.1        UGSc         0       2      rl0
> > : 127.0.0.1          127.0.0.1          UH           1       6      lo0
> > : 192.168.1          link#1             UC           4       0      rl0 =>
> > : 192.168.2          192.168.1.2        UGSc         1       3      rl0
> >
> > The refcount on 192.168.2 route has grown to 1, indicating that the
> > UDP socket now holds on this route.  The `Use' count of 3 corresponds
> > to our three UDP datagrams (ping4, ping5, and ping6).
> >
> > Could you please repeat these steps in your environment, and try to
> > detect where it behaved differently in your case.
>
> It doesn't behave differently, that's interesting. May I ask you to
> try it using syslogd?
>
> - Let host C log to host S (with the route installed).
> - Watch C's messages appear on S.
> - Delete C's route to S (via router 2)
> - Let host C log again (run tcpdump on router 1 to see the packets come in)
> - Install the route to S (via router 2) again on C
> - Log more stuff. If you don't see the packets go into router 1 anymore
>   I am really lost...
>

Yes, I have reproduced the problem here.  My test missed one step.

OK, now about what happens here.

Initially, there is the route (cloned from the network route) to
S (192.168.2.1) through router 2 (192.168.1.2).  The UDP socket
uses this route initially.

When this route (and the 192.168.2 network route) disappears, on the
next write (!), ip_output() detects that the S route is DOWN, and
"allocates" (caches) another route, which happens to be the "default"
route pointing to router 1 (192.168.1.1).

Later, when the route to the 192.168.2 network gets installed again,
it's not taken into account, as the cached ("default") route is still
UP.

Unfortunately, there is no easy way to fix this.
Checking for the best-match route on every write may be too
time consuming.

As a workaround, you can delete and re-add your "default" route.
This worked for me here.  `route delete default' removes the
"default" route from the routing table, but because the route has
refcnt>0 it is not freed immediately; it is only marked as DOWN.
ip_output() for this UDP socket's write will detect that the cached
route is DOWN, will free it, and allocate a new route, which will be
the route to the 192.168.2 network through router 2 (192.168.1.2)
this time.

The actual fix would be to notify the protocol (from within the
routing code) whenever its routing table is modified.  This
notification could then be saved in a variable as a timestamp, and
every PCB-cached route could have a similar timestamp as well,
indicating when this "caching" took place.  Having that, ip_output()
would "invalidate" a cached route if it was cached before the last
routing table modification was done.

I could probably try to implement this, if no one else can come up
with a better idea.


Cheers,
-- 
Ruslan Ermilov          Oracle Developer/DBA,
ru@sunbay.com           Sunbay Software AG,
ru@FreeBSD.org          FreeBSD committer,
+380.652.512.251        Simferopol, Ukraine

http://www.FreeBSD.org  The Power To Serve
http://www.oracle.com   Enabling The Information Age
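
To illustrate the stale cached route and the invalidation idea
described above, here is a minimal sketch as a toy Python model.  It is
not FreeBSD kernel code: the names Route, RoutingTable, UdpSocket,
generation and check_generation are all hypothetical, a modification
counter stands in for the timestamp, and protocol-cloned host routes
are not modeled.

```python
import ipaddress


class Route:
    def __init__(self, dest, gateway):
        self.net = ipaddress.ip_network(dest)
        self.gateway = gateway
        self.up = True  # cleared on delete; holders then see the route as DOWN


class RoutingTable:
    def __init__(self):
        self.routes = []
        self.generation = 0  # hypothetical stand-in for the "last modified" timestamp

    def add(self, dest, gateway):
        self.routes.append(Route(dest, gateway))
        self.generation += 1

    def delete(self, dest):
        net = ipaddress.ip_network(dest)
        for route in [r for r in self.routes if r.net == net]:
            route.up = False  # a socket may still hold a reference to it
            self.routes.remove(route)
        self.generation += 1

    def lookup(self, addr):
        # Best match = longest prefix among the routes containing addr.
        ip = ipaddress.ip_address(addr)
        matches = [r for r in self.routes if ip in r.net]
        return max(matches, key=lambda r: r.net.prefixlen, default=None)


class UdpSocket:
    def __init__(self, table, dst, check_generation=False):
        self.table = table
        self.dst = dst
        self.check_generation = check_generation
        self.cached = None
        self.cached_at = -1

    def send(self):
        # Behavior as described: look up again only if the cached route is DOWN.
        stale = self.cached is None or not self.cached.up
        # Sketch of the proposed fix: also drop a route cached before the last change.
        if self.check_generation and self.cached_at != self.table.generation:
            stale = True
        if stale:
            self.cached = self.table.lookup(self.dst)
            self.cached_at = self.table.generation
        return self.cached.gateway if self.cached else None


def scenario(check_generation):
    table = RoutingTable()
    table.add("0.0.0.0/0", "192.168.1.1")        # default via router 1
    table.add("192.168.2.0/24", "192.168.1.2")   # network route via router 2
    sock = UdpSocket(table, "192.168.2.1", check_generation)
    gateways = [sock.send()]
    table.delete("192.168.2.0/24")               # network route disappears
    gateways.append(sock.send())
    table.add("192.168.2.0/24", "192.168.1.2")   # network route comes back
    gateways.append(sock.send())
    return gateways


current = scenario(check_generation=False)
proposed = scenario(check_generation=True)
print(current)   # ['192.168.1.2', '192.168.1.1', '192.168.1.1']
print(proposed)  # ['192.168.1.2', '192.168.1.1', '192.168.1.2']
```

In this model the third write keeps going to router 1 unless the
socket also compares its cache stamp with the table's; comparing one
counter per write is much cheaper than a full best-match lookup on
every write.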