From owner-freebsd-net@FreeBSD.ORG Thu Mar 7 06:34:37 2013
Return-Path:
Delivered-To: net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
	by hub.freebsd.org (Postfix) with ESMTP id 60890175;
	Thu, 7 Mar 2013 06:34:37 +0000 (UTC)
	(envelope-from melifaro@FreeBSD.org)
Received: from mail.ipfw.ru (unknown [IPv6:2a01:4f8:120:6141::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 0522C1E9;
	Thu, 7 Mar 2013 06:34:37 +0000 (UTC)
Received: from v6.mpls.in ([2a02:978:2::5] helo=ws.su29.net)
	by mail.ipfw.ru with esmtpsa (TLSv1:CAMELLIA256-SHA:256)
	(Exim 4.76 (FreeBSD))
	(envelope-from )
	id 1UDUSq-0008ZT-Pk; Thu, 07 Mar 2013 10:38:04 +0400
Message-ID: <513834E4.7050203@FreeBSD.org>
Date: Thu, 07 Mar 2013 10:34:12 +0400
From: "Alexander V. Chernikov"
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:9.0) Gecko/20120121 Thunderbird/9.0
MIME-Version: 1.0
To: net@freebsd.org
Subject: [patch] interface routes
Content-Type: multipart/mixed; boundary="------------070403050505050004040202"
Cc: Andre Oppermann
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 07 Mar 2013 06:34:37 -0000

This is a multi-part message in MIME format.
--------------070403050505050004040202
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hello list!

There is a known, long-standing issue with adding and deleting interface routes:

ifconfig iface inet 1.2.3.4/24 can fail if the given prefix is already in the kernel routing table (for example, advertised by an IGP such as OSPF).

An interface route can be deleted via route(8) or by any routing socket user (this sometimes happens with popular open-source daemons such as bird/quagga).

The problem is reported in at least kern/106722 and kern/155772.

This can be fixed in the following way:

The immutable route flag (RTF_PINNED, added in 19995 with a 'for future use' comment) is used to mark a route as immutable.
rtrequest1_fib() refuses to delete routes with this flag unless RTF_PINNED is also set in rti_flags.
Every interface address manipulation is done via rtinit[1], so rtinit1() sets this flag (and behavior does not change here).
Adding an interface address is handled by atomically deleting the old prefix and adding the interface one.

--------------070403050505050004040202
Content-Type: text/plain; name="iface_routes.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="iface_routes.diff"

Index: sys/net/if.c
===================================================================
--- sys/net/if.c	(revision 247623)
+++ sys/net/if.c	(working copy)
@@ -1357,7 +1357,8 @@ if_rtdel(struct radix_node *rn, void *arg)
 		return (0);
 
 	err = rtrequest_fib(RTM_DELETE, rt_key(rt), rt->rt_gateway,
-			rt_mask(rt), rt->rt_flags|RTF_RNH_LOCKED,
+			rt_mask(rt),
+			rt->rt_flags|RTF_RNH_LOCKED|RTF_PINNED,
 			(struct rtentry **) NULL, rt->rt_fibnum);
 	if (err) {
 		log(LOG_WARNING, "if_rtdel: error %d\n", err);
Index: sys/net/route.c
===================================================================
--- sys/net/route.c	(revision 247842)
+++ sys/net/route.c	(working copy)
@@ -1112,6 +1112,16 @@ rtrequest1_fib(int req, struct rt_addrinfo *info,
 			error = 0;
 		}
 #endif
+		if ((flags & RTF_PINNED) == 0) {
+			/*
+			 * Check if can delete target route.
+			 */
+			rt = (struct rtentry *)rnh->rnh_lookup(dst,
+			    netmask, rnh);
+			if ((rt != NULL) && (rt->rt_flags & RTF_PINNED))
+				senderr(EPERM);
+		}
+
 		/*
 		 * Remove the item from the tree and return it.
 		 * Complain if it is not there and do no more processing.
@@ -1430,6 +1440,7 @@ rtinit1(struct ifaddr *ifa, int cmd, int flags, in
 	int didwork = 0;
 	int a_failure = 0;
 	static struct sockaddr_dl null_sdl = {sizeof(null_sdl), AF_LINK};
+	struct radix_node_head *rnh;
 
 	if (flags & RTF_HOST) {
 		dst = ifa->ifa_dstaddr;
@@ -1488,7 +1499,6 @@ rtinit1(struct ifaddr *ifa, int cmd, int flags, in
 	 */
 	for ( fibnum = startfib; fibnum <= endfib; fibnum++) {
 		if (cmd == RTM_DELETE) {
-			struct radix_node_head *rnh;
 			struct radix_node *rn;
 			/*
 			 * Look up an rtentry that is in the routing tree and
@@ -1538,7 +1548,8 @@ rtinit1(struct ifaddr *ifa, int cmd, int flags, in
 		 */
 		bzero((caddr_t)&info, sizeof(info));
 		info.rti_ifa = ifa;
-		info.rti_flags = flags | (ifa->ifa_flags & ~IFA_RTSELF);
+		info.rti_flags = flags |
+		    (ifa->ifa_flags & ~IFA_RTSELF) | RTF_PINNED;
 		info.rti_info[RTAX_DST] = dst;
 		/*
 		 * doing this for compatibility reasons
@@ -1550,6 +1561,32 @@ rtinit1(struct ifaddr *ifa, int cmd, int flags, in
 			info.rti_info[RTAX_GATEWAY] = ifa->ifa_addr;
 		info.rti_info[RTAX_NETMASK] = netmask;
 		error = rtrequest1_fib(cmd, &info, &rt, fibnum);
+
+		if ((error == EEXIST) && (cmd == RTM_ADD)) {
+			/*
+			 * Interface route addition failed.
+			 * Note we probably already checked
+			 * other interface addresses if given prefix exists.
+			 * Atomically delete current prefix generating
+			 * RTM_DELETE message, and retry adding
+			 * interface address.
+			 */
+			rnh = rt_tables_get_rnh(fibnum, dst->sa_family);
+			RADIX_NODE_HEAD_LOCK(rnh);
+			/* Delete old prefix */
+			info.rti_ifa = NULL;
+			info.rti_flags = RTF_RNH_LOCKED;
+			error = rtrequest1_fib(RTM_DELETE, &info, &rt, fibnum);
+			if (error == 0) {
+				info.rti_ifa = ifa;
+				info.rti_flags = flags | RTF_RNH_LOCKED |
+				    (ifa->ifa_flags & ~IFA_RTSELF) | RTF_PINNED;
+				error = rtrequest1_fib(cmd, &info, &rt, fibnum);
+			}
+			RADIX_NODE_HEAD_UNLOCK(rnh);
+		}
+
+
+
 		if (error == 0 && rt != NULL) {
 			/*
 			 * notify any listening routing agents of the change
Index: sys/net/route.h
===================================================================
--- sys/net/route.h	(revision 247623)
+++ sys/net/route.h	(working copy)
@@ -176,7 +176,7 @@ struct ortentry {
 /* 0x20000 unused, was RTF_WASCLONED */
 #define RTF_PROTO3	0x40000		/* protocol specific routing flag */
 /* 0x80000 unused */
-#define RTF_PINNED	0x100000	/* future use */
+#define RTF_PINNED	0x100000	/* route is immutable */
 #define RTF_LOCAL	0x200000	/* route represents a local address */
 #define RTF_BROADCAST	0x400000	/* route represents a bcast address */
 #define RTF_MULTICAST	0x800000	/* route represents a mcast address */

--------------070403050505050004040202--
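
The two behaviors the message describes (refusing to delete a pinned route unless the request itself carries the pinned flag, and replacing an existing prefix when an interface address is added) can be modeled in user space. What follows is a minimal sketch under stated assumptions, not the kernel code: the table is a fixed array rather than a radix tree, there is no locking and no routing socket message, and every toy_* and TOY_* name is hypothetical. Only the bit value of TOY_PINNED is taken from the RTF_PINNED definition in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PINNED	0x100000	/* same bit value as RTF_PINNED in the patch */
#define TOY_MAXROUTES	16		/* hypothetical fixed table size */

struct toy_route {
	uint32_t prefix;	/* network address, host byte order */
	uint8_t  plen;		/* prefix length in bits */
	int      flags;		/* route flags, e.g. TOY_PINNED */
	int      in_use;	/* slot occupied? */
};

struct toy_table {
	struct toy_route r[TOY_MAXROUTES];
};

/* Exact-match lookup; stands in for the radix tree lookup in the kernel. */
struct toy_route *
toy_lookup(struct toy_table *t, uint32_t prefix, uint8_t plen)
{
	assert(t != NULL);
	for (size_t i = 0; i < TOY_MAXROUTES; i++) {
		struct toy_route *rt = &t->r[i];
		if (rt->in_use && rt->prefix == prefix && rt->plen == plen)
			return (rt);
	}
	return (NULL);
}

/* Add a route: EEXIST if the prefix is present, ENOBUFS if the table is full. */
int
toy_add(struct toy_table *t, uint32_t prefix, uint8_t plen, int flags)
{
	if (toy_lookup(t, prefix, plen) != NULL)
		return (EEXIST);
	for (size_t i = 0; i < TOY_MAXROUTES; i++) {
		if (!t->r[i].in_use) {
			t->r[i] = (struct toy_route){ prefix, plen, flags, 1 };
			return (0);
		}
	}
	return (ENOBUFS);
}

/*
 * Delete a route. A pinned route is only removed when the request itself
 * carries TOY_PINNED; any other caller gets EPERM.
 */
int
toy_delete(struct toy_table *t, uint32_t prefix, uint8_t plen, int req_flags)
{
	struct toy_route *rt = toy_lookup(t, prefix, plen);

	if (rt == NULL)
		return (ESRCH);
	if ((req_flags & TOY_PINNED) == 0 && (rt->flags & TOY_PINNED) != 0)
		return (EPERM);
	rt->in_use = 0;
	return (0);
}

/*
 * Interface-address path: always add with TOY_PINNED. If the prefix already
 * exists, delete the old entry without the pinned flag and retry, so an
 * existing pinned route is never replaced.
 */
int
toy_iface_add(struct toy_table *t, uint32_t prefix, uint8_t plen, int flags)
{
	int error = toy_add(t, prefix, plen, flags | TOY_PINNED);

	if (error == EEXIST) {
		error = toy_delete(t, prefix, plen, 0);
		if (error == 0)
			error = toy_add(t, prefix, plen, flags | TOY_PINNED);
	}
	return (error);
}
```

In this model, if the table already holds 1.2.3.0/24 (say, learned from an IGP), toy_iface_add() replaces that entry with a pinned one. A later toy_delete() without TOY_PINNED then fails with EPERM, while a delete that passes TOY_PINNED succeeds. A second toy_iface_add() for the same prefix also returns EPERM in this sketch, because the retry path deletes without the pinned flag.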