From: Bruce M Simpson
Date: Wed, 30 Apr 2008 16:38:23 +0100
To: Julian Elischer
Cc: FreeBSD Net, Kevin Oberman
Subject: Re: multiple routing tables review patch ready for simple testing.
Message-ID: <4818926F.8010309@incunabulum.net>
In-Reply-To: <4817881B.7010409@elischer.org>

Julian Elischer wrote:
> An interface may however be present in entries from multiple FIBs
> in which case the INCOMING packets on that interface need to
> be disambiguated with respect to which FIB they belong to.

Yes, there is no way the forwarding code alone can do this. It should
not be expected to, and it's important to maintain a clean functional
separation there; otherwise one ends up in the same quagmire which has
been plaguing a lot of QoS research projects over the years ("Where do
I put this bit of the system?").

> This is a job for an outside entity (from the fibs).
> In this case a packet classifier such as pf or ipfw is ideal
> for the job, providing an outside mechanism for implementing
> whatever policy the admin wants to set up.

Absolutely. This has been the intent from the beginning. There is no
"one size fits all" approach here. We could put a packet classifier
into the kernel which works just fine for DOCSIS consumer distribution
networks, but has absolutely no relevance to an ATM backbone (these are
the two main flavours of access for folk in the UK).

> I find it is convenient to envision each routing FIB as a routing
> plane, in a stack of such planes. Each plane may know about the same
> interfaces or different interfaces. When a packet enters a routing
> plane it is routed according to the internal rules of that plane,
> irrespective of how other planes may act. Each plane can only route
> a packet to interfaces that are known about on that plane.
> Incoming packets on an interface don't know what plane to go to
> and must be told which to use by the external mechanism. It
> IS possible that an interface in the future might have a default
> plane, but I haven't implemented this.

This limitation seems fine for now. Users can't be expected to
configure the defaults "by default" if they aren't supported, so if the
VRF-like feature defaults to off overall, and there are big flashing
bold letters saying "You must fully configure the forwarding plane
mappings if you wish to use multiple FIBs", then that's fine by me.

> if you have several alias addresses on an interface it is possible
> that some FIBs know about some of them and others know about other
> addresses. New addresses when added are added to each FIB and
> whatever is adding them should remove them from the ones that don't
> need it. This may change but it fits in with how the current code
> works and keeps the diff to a manageable size.

In any event, for plain old IP forwarding, a node's endpoint addresses
are used only as convenient ways of referring to physical links.

To back up and give this some detailed background: for example,
192.0.2.1/24 might be configured on fxp0, and we receive a packet on
another interface for 192.0.2.2. When resolving a route, the forwarding
code needs to do a lookup to see from where 192.0.2.2 is reachable
before the next-hop is resolved in the table. That happens on a per-FIB
basis when the patches are applied -- however, tagging input with the
FIB it belongs to is the job of the classifier.

The problems with the above approach begin when an input interface
resides in multiple virtual FIBs (no 1:1 mapping), or when you can't
refer to it by an address (it has no address -- unnumbered
point-to-point link, or addresses do not apply), or when you attempt to
implement encapsulation (e.g. GRE, IPIP) in the forwarding layer. Then,
you're reliant on each individual FIB having resolved next-hops
correctly.

The existing forwarding code already does some of this by forcing the
ifp to be set for any route added to the table. This is done implicitly
for routes which transit point-to-point interfaces. BSD has had some
weaknesses in this area.
It makes implementing things like VRRP particularly difficult, which is
why the ifnet approach to CARP was used (the forwarding table gets to
see a single ifp); it eliminates a level of possible recursion from
that layer of the routing stack.

With multicast, for example, next-hops can't be identified by IPv4
addresses alone. Every forwarding decision has potentially more than
one result, and links are referred to by physical link (this could be
an ifp, an interface index, a name, whatever), and where messages are
forwarded is determined using a link-scope protocol such as IGMP.

There, it's reasonable to expect that the user partitioned off the
multicast forwarding planes into separate virtual FIBs, and that the
appropriate rules in the classifier are configured.

For SSM, the key (S,G) match has to happen in the input classifier, if
one is going to route flows OK using the multiple FIB feature -- the
multicast routing daemons have to be aware of it, 'cuz you can't run a
separate instance of PIM for every set of flows -- PIM is greedy
per-link, a non-1:1 mapping problem exists, and PIM has no way of
telling separate instances apart (no hierarchy in the form of e.g. OSPF
areas, and even OSPF won't let you put a link in more than one area --
virtual links don't count!).

This is so much whizzing in the wind without a new MROUTING
implementation though, and hierarchical multicast routing is a project
in and of itself.

To summarize:

For now, the limitations of the system should be documented so that
users don't inadvertently configure local forwarding loops, even for
unicast traffic; with multicast, the amplification effect of
misconfiguration is inherently more damaging to a network.
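
A rough sketch of the split described above, where the classifier only
picks a FIB and each FIB only routes, might look like the following
Python model. The table contents, interface names, rule format and the
names classify/lookup/forward are all made up for illustration; this is
not how ipfw, pf or the patch are implemented, and dropping
unclassified packets is simply a choice made for this model.

```python
import ipaddress

# Hypothetical model: each FIB maps prefixes to an outgoing interface.
FIBS = {
    0: {"192.0.2.0/24": "fxp0", "0.0.0.0/0": "em0"},
    1: {"192.0.2.0/24": "fxp1", "0.0.0.0/0": "em1"},
}

# Hypothetical classifier rules: (input interface, source prefix, FIB).
# em2 feeds both planes, so the source prefix disambiguates.
RULES = [
    ("em2", "198.51.100.0/25", 0),
    ("em2", "198.51.100.128/25", 1),
]


def classify(in_if, src):
    """Return the FIB number of the first matching rule, or None."""
    addr = ipaddress.ip_address(src)
    for rule_if, prefix, fib in RULES:
        if rule_if == in_if and addr in ipaddress.ip_network(prefix):
            return fib
    return None


def lookup(fib, dst):
    """Longest-prefix match inside one FIB only; other FIBs are ignored."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, out_if in FIBS[fib].items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, out_if)
    return best[1] if best else None


def forward(in_if, src, dst):
    """Classifier picks the plane; the plane alone picks the interface."""
    fib = classify(in_if, src)
    if fib is None:
        return None  # no rule matched: this model drops rather than guesses
    return lookup(fib, dst)
```

In this model a packet arriving on em2 from 198.51.100.10 for 192.0.2.2
leaves via fxp0, while one from 198.51.100.200 for the same destination
leaves via fxp1; a packet matching no rule goes nowhere.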

The IPv4 address of an interface can't be used as an identifier for
source routing -- there is no way of knowing that was the next-hop used
by the last-hop, the information just ain't there -- so if you have the
same input interfaces in multiple virtual FIBs, you need to double-check
that the appropriate match rules are in place for the flows to go where
you want them to go.

> (and it suits what I need for work where a route manager daemon
> knows to do this.)

This is another reason why I maintain that RIB and FIB should have
functional separation. It's unreasonable to expect the kernel to
perform next-hop resolution on every route presented to it, beyond that
which is required by the link layer (i.e. ARP, and that should be
functionally separated too). Recursive resolution also demands stack
space, and this is a scarce kernel resource.

Of course, well-behaved routers are engineered such that the recursion
takes place at RIB level, where limits and policy can be more easily
applied, and before the route is plumbed into the hardware TCAM (or
software FIB). Don't try to make the kernel do your dirty laundry.

cheers
BMS

P.S. I see you tweaked verify_path() to do the lookup in the numbered
FIB. Cool.

I should point out that for ad-hoc networks, the ability to turn off
RPF/uRPF for multicast is needed, as the routing domain is often NOT
fully converged -- so the RPF checks normally present may discard
legitimate traffic which hasn't been forwarded yet. An encapsulation is
typically used to maintain forwarding state which is relevant to the
particular topology in use.
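
For illustration, a strict reverse-path check performed against one
numbered FIB, with a switch to disable it, could be modelled roughly as
below. This is a Python sketch under assumed table contents and
hypothetical names (FIBS, lookup, rpf_ok); it is not the verify_path()
code from the patch.

```python
import ipaddress

# Hypothetical per-FIB tables: prefix -> outgoing interface name.
FIBS = {
    0: {"192.0.2.0/24": "fxp0", "0.0.0.0/0": "em0"},
    1: {"192.0.2.0/24": "fxp1"},
}


def lookup(fib, address):
    """Longest-prefix match inside a single FIB; None if no route."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, out_if in FIBS[fib].items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, out_if)
    return best[1] if best else None


def rpf_ok(fib, src, in_if, enforce=True):
    """Strict reverse-path check against one FIB.

    Passes only if the route back to the source, as seen by that FIB,
    points out of the interface the packet arrived on. With
    enforce=False the check is skipped, e.g. for a domain that has not
    converged yet.
    """
    if not enforce:
        return True
    return lookup(fib, src) == in_if
```

The same source address and input interface can pass in one FIB and
fail in another, which is why the check has to be told which FIB the
packet was classified into.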