From owner-svn-src-all@FreeBSD.ORG Tue Apr 21 10:45:08 2009 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4173B106566C; Tue, 21 Apr 2009 10:45:08 +0000 (UTC) (envelope-from bz@FreeBSD.org) Received: from mail.cksoft.de (mail.cksoft.de [195.88.108.3]) by mx1.freebsd.org (Postfix) with ESMTP id 6E7228FC18; Tue, 21 Apr 2009 10:45:07 +0000 (UTC) (envelope-from bz@FreeBSD.org) Received: from localhost (amavis.fra.cksoft.de [192.168.74.71]) by mail.cksoft.de (Postfix) with ESMTP id 0C38641C69F; Tue, 21 Apr 2009 12:45:06 +0200 (CEST) X-Virus-Scanned: amavisd-new at cksoft.de Received: from mail.cksoft.de ([195.88.108.3]) by localhost (amavis.fra.cksoft.de [192.168.74.71]) (amavisd-new, port 10024) with ESMTP id MWJ6zycVh+EF; Tue, 21 Apr 2009 12:45:05 +0200 (CEST) Received: by mail.cksoft.de (Postfix, from userid 66) id 5E76D41C6A7; Tue, 21 Apr 2009 12:45:05 +0200 (CEST) Received: from maildrop.int.zabbadoz.net (maildrop.int.zabbadoz.net [10.111.66.10]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.int.zabbadoz.net (Postfix) with ESMTP id B55A74448E6; Tue, 21 Apr 2009 10:40:25 +0000 (UTC) Date: Tue, 21 Apr 2009 10:40:25 +0000 (UTC) From: "Bjoern A. Zeeb" X-X-Sender: bz@maildrop.int.zabbadoz.net To: Qing Li In-Reply-To: <20090421044721.7573C8FC13@mx1.freebsd.org> Message-ID: <20090421080014.N15361@maildrop.int.zabbadoz.net> References: <20090421044721.7573C8FC13@mx1.freebsd.org> X-OpenPGP-Key: 0x14003F198FEFA3E77207EE8D2B58B8F83CCF1842 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org Subject: RE: svn commit: r191305 - head/usr.sbin/ppp X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Apr 2009 10:45:08 -0000 On Mon, 20 Apr 2009, Qing Li wrote: Hi, > Are you really sure backing this change out is the right thing to do ?? yes, I am. 1) if you do not set RTA_GATEWAY but still copy the adress, your kernel FIB possibly ends up with an incorrect address for the next field, like RTA_NETMASK, RTA_IFP, ..) for example. See sys/net/rtsock.c:rt_xaddrs() 2) I had this sample around for a few days now as I have been lerning more rtsock.c stuff with the help of Kip and finding bugs (ignore those for a moment): dopt# ifconfig tun2 create inet 192.0.2.100 192.0.2.101 up dopt# ifconfig tun2 tun2: flags=8051 metric 0 mtu 1500 inet6 fe80::2e0:81ff:fe31:db8c%tun2 prefixlen 64 scopeid 0x8 inet 192.0.2.100 --> 192.0.2.101 netmask 0xffffff00 dopt# netstat -rn | grep tun2 192.0.2.101 link#8 UH 0 0 tun2 fe80::%tun2/64 link#8 U tun2 ff01:8::/32 fe80::2e0:81ff:fe31:db8c%tun2 U tun2 ff02::%tun2/32 fe80::2e0:81ff:fe31:db8c%tun2 U tun2 So querying the FIB for the IPv4 (ignoring the v6 for the moment for simplicity) route (with something handcrafted) I get back: rtm_msglen 252 rtm_version 5 rtm_type RTM_GET rtm_index 8 rtm_flags =5 rtm_addrs =b3 rtm_pid 0 rtm_seq 0 rtm_errno 0 rtm_fmask =0<> rtm_inits =0<> rt_metrics rtm_rmx: rmx_locks 0 rmx_mtu 1500 rmx_hopcount 0 rmx_expire 0 rmx_recvpipe 0 rmx_sendpipe 0 rmx_ssthresh 0 rmx_rtt 0 rmx_rttvar 0 rmx_pksent 0 rmx_filler { 0, 0, 0, 0 } addresses: 16 (inet) 192.0.2.101 =1 vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv 54 (link) =2 0000 36 12 08 00 17 00 00 00 00 00 00 00 00 00 00 00 |6...............| 0010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0030 00 00 00 00 00 00 |...... | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 56 (link) tun2 =10 0000 38 12 08 00 17 04 00 00 74 75 6e 32 00 00 00 00 |8.......tun2....| 0010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0030 00 00 00 00 00 00 00 00 |........ | 16 (inet) 192.0.2.100 =20 (RTA_BRD) =80 As you can see, we do get an RTA_GATEWAY from the kernel with AF_LINK. Decoding the sockaddr_dl: .sdl_len = 0x36 (54) .sdl_family = 0x12 (18, AF_LINK) .sdl_index = 8 .sdl_type = 0x17 (23, IFT_PPP) .sdl_nlen = 0 .sdl_alen = 0 .sdl_slen = 0 .sdl_data = { 0, .. } So we are clearly getting back an AF_LINK from the kernel and ought to be able to just stick it back in there as is. 3) To show you that it still works as expected: : netstat -Wanrf inet | egrep '(^Dest|tun2)' Destination Gateway Flags Refs Use Mtu Netif Expire 192.0.2.101 link#8 UH 0 0 1500 tun2 : route change 192.0.2.101 -mtu 1300 change host 192.0.2.101 : netstat -Wanrf inet | egrep '(^Dest|tun2)' Destination Gateway Flags Refs Use Mtu Netif Expire 192.0.2.101 link#8 UH 0 0 1300 tun2 and: : route change 192.0.2.101 -link -mtu 1400 change host 192.0.2.101 : netstat -Wanrf inet | egrep '(^Dest|tun2)' Destination Gateway Flags Refs Use Mtu Netif Expire 192.0.2.101 link#8 UH 0 0 1400 tun2 both work (note, even with -link I _think_ route does not pass down the sockaddr_dl) so that's a bit confusing or rather a bug in route(8) as we'll see later. But : route change 192.0.2.101 192.0.2.100 -mtu 1500 change host 192.0.2.101: gateway 192.0.2.100 : netstat -Wanrf inet | egrep '(^Dest|tun2)' Destination Gateway Flags Refs Use Mtu Netif Expire 192.0.2.101 192.0.2.100 UGH 0 0 1500 tun2 that changed the gateway address, which is correct, but also set RTF_GATEWAY, which is not as 192.0.2.101 is directly connected on the PtP interface. The same would happen if passing down the sockaddr_dl, just that the gateway address would stay link#8. 4) RTA_GATEWAY != RTF_GATEWAY The RTF_GATEWAY cannot be set from userspace for an RTM_CHANGE. Even if given it would be filtered out because of RTF_FMASK. So what your commit message in r186308 was saying and trying to work around the RTF_GATEWAY problem by not setting RTA_GATEWAY this only broke rt_xaddrs() parsing (see 1)) . The fact that after an RTM_CHANGE with an RTA_GATEWAY set, RTF_GATEWAY appears, comes from the change in r167797 (ignore the ordering change, only the rt->rt_flags |= RTF_GATEWAY; there is relevant). Which is under discussion as my follow-up commit said: -------------------- r186308, backed out in r191305, already tried to do that, and in addition ignore AF_LINK types of gateway addresses to work around a problem that r167797 had introduced on the kernel side always setting RTF_GATEWAY if a gateway address was passed into the kernel. The proper solution for this is still under discussion so I am hesitant to re-add the special AF_LINK treatment for now. -------------------- One possible solution I had been theoretically pondering first would have been: Index: sys/net/rtsock.c =================================================================== --- sys/net/rtsock.c (revision 190341) +++ sys/net/rtsock.c (working copy) @@ -673,25 +673,27 @@ route_output(struct mbuf *m, struct sock if (info.rti_info[RTAX_GATEWAY] != NULL) { RT_UNLOCK(rt); RADIX_NODE_HEAD_LOCK(rnh); RT_LOCK(rt); error = rt_setgate(rt, rt_key(rt), info.rti_info[RTAX_GATEWAY]); RADIX_NODE_HEAD_UNLOCK(rnh); if (error != 0) { RT_UNLOCK(rt); senderr(error); } - rt->rt_flags |= RTF_GATEWAY; + if (info.rti_info[RTAX_GATEWAY]->sa_family != + AF_LINK) + rt->rt_flags |= RTF_GATEWAY; } if (info.rti_ifa != NULL && info.rti_ifa != rt->rt_ifa) { IFAREF(info.rti_ifa); rt->rt_ifa = info.rti_ifa; rt->rt_ifp = info.rti_ifp; } /* Allow some flags to be toggled on change. */ if (rtm->rtm_fmask & RTF_FMASK) rt->rt_flags = (rt->rt_flags & ~rtm->rtm_fmask) | (rtm->rtm_flags & rtm->rtm_fmask); But I am almost pretty sure that would not be right in all cases either. Our routing sockets are currently a bit .. uhm .. fragile so a small change to fix one thing can break other stuff easily. If we do not want any AF_LINK addresses to be set in RTM_CHANGE as RTA_GATEWAY anymore, we should first make sure that we cannot add them in first place and thus the kernel would not report those back. But I am pretty sure this will simply break all interfaces. Also it would not easily be possible to fix the example from 3) anymore. BTW neither: : route change 192.0.2.101 -mtu 1500 change host 192.0.2.101 : netstat -Wanrf inet | egrep '(^Dest|tun2)' Destination Gateway Flags Refs Use Mtu Netif Expire 192.0.2.101 192.0.2.100 UGH 0 0 1500 tun2 nor: : route change 192.0.2.101 -link -mtu 1500 change host 192.0.2.101 : netstat -Wanrf inet | egrep '(^Dest|tun2)' Destination Gateway Flags Refs Use Mtu Netif Expire 192.0.2.101 192.0.2.100 UGH 0 0 1500 tun2 does make it a link#%d again. Say what you want, this is a route(8) bug as a handcrafted update does: : netstat -Wanrf inet | egrep '(^Dest|tun2)' Destination Gateway Flags Refs Use Mtu Netif Expire 192.0.2.101 192.0.2.100 UGH 0 0 1500 tun2 : sudo ./a.out : netstat -Wanrf inet | egrep '(^Dest|tun2)' Destination Gateway Flags Refs Use Mtu Netif Expire 192.0.2.101 link#8 UGH 0 0 1500 tun2 what it is supposed to do. The only thing left is the RTM_GATEWAY that the kernel set, unfortunately, which is NOT the fault of ppp. So the question is, can we somehow clear it? The answer to that is "no". >> -----Original Message----- >> From: owner-src-committers@FreeBSD.org >> [mailto:owner-src-committers@FreeBSD.org] On Behalf Of Bjoern A. Zeeb >> Sent: Monday, April 20, 2009 4:23 AM >> To: src-committers@freebsd.org; svn-src-all@freebsd.org; >> svn-src-head@freebsd.org >> Subject: svn commit: r191305 - head/usr.sbin/ppp >> >> Author: bz >> Date: Mon Apr 20 11:22:51 2009 >> New Revision: 191305 >> URL: http://svn.freebsd.org/changeset/base/191305 >> >> Log: >> Back out r186308: >> >> in case of AF_LINK, which the kernel still returns for an >> RTAX_GATEWAY >> as an empty sockaddr_dl in the classic tun case. >> Copying the address into the message payload, but not the >> RTA_GATEWAY >> flag results in rt_xaddrs() in the kernel tripping over >> that and parsing >> the next attribute set with a flag, i.e. RTA_NETMASK, with >> the gateway >> address, resulting in bogus route entry. >> >> MFC after: 3 days >> >> Modified: >> head/usr.sbin/ppp/route.c >> >> Modified: head/usr.sbin/ppp/route.c >> ============================================================== >> ================ >> --- head/usr.sbin/ppp/route.c Mon Apr 20 10:40:42 2009 >> (r191304) >> +++ head/usr.sbin/ppp/route.c Mon Apr 20 11:22:51 2009 >> (r191305) >> @@ -910,10 +910,8 @@ rt_Update(struct bundle *bundle, const s >> p += memcpy_roundup(p, dst, dst->sa_len); >> } >> >> - if (gw != NULL && (gw->sa_family != AF_LINK)) >> - rtmes.m_rtm.rtm_addrs |= RTA_GATEWAY; >> + rtmes.m_rtm.rtm_addrs |= RTA_GATEWAY; >> p += memcpy_roundup(p, gw, gw->sa_len); >> - >> if (mask) { >> rtmes.m_rtm.rtm_addrs |= RTA_NETMASK; >> p += memcpy_roundup(p, mask, mask->sa_len); >> >> > > -- Bjoern A. Zeeb The greatest risk is not taking one.