From owner-freebsd-stable@FreeBSD.ORG  Fri Feb 25 23:13:43 2011
Date: Fri, 25 Feb 2011 14:59:44 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <201102252259.p1PMxiB5019790@apollo.backplane.com>
Cc: FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>
References: <AANLkTi=P6pbiPHWpeoj9Os+fi76Hk7DFOyYaSN3BY=_J@mail.gmail.com>
	<09E86832-F5D9-4415-83A0-FEF59693FE02@gsoft.com.au>
Subject: Re: How to bind a static ether address to bridge?

    If you can swing a routed network that will definitely have the fewest
    complications.

    For a switched network if_bridge and ARP have to be integrated, something
    I just finished doing in DragonFly, so that all member interfaces of the
    bridge use *only* the bridge's MAC for all transactions, including ARP
    transactions, whether they require forwarding through the bridge or not.

    The bridge has its own internal forwarding table and a great deal of
    confusion occurs if the normal ARP code is trying to tie into individual
    interfaces instead of just the bridge interface, for *ANY* member of
    the bridge, not just the first member of the bridge.

    Some of the problems you are likely to hit using if_bridge:

    * ARP response flows in on member interface A with an ether destination
      of member interface B.  OS decides to record the ARP route as coming
      from interface B (when it's actually coming from interface A),
      while the bridge internally records the proper forwarding (A).
      Fireworks ensue.

    * ARP responses targeting member interfaces which participate in the
      spanning tree protocol (when you have redundant links) and which
      the spanning tree protocol then places in the blocking state.

      The if_bridge code in FreeBSD sets the bridge's MAC to be the
      same as the first added interface, which is usually your LAN
      ethernet port.  This will help a bit; just make sure that it *IS*
      your LAN ethernet port and that the spanning tree protocol is *NOT*
      turned on for that port.

      However, other member interfaces (usually TAPs if you are using
      something like OpenVPN) will have different MAC addresses and that
      will cause confusion.

  It might be possible to work around both issues by setting the MAC for
  *ALL* member interfaces to be the same as the bridge MAC, but I don't
  know.  I gave up trying to do that in DFly and instead modified the ARP
  code to always use the bridge MAC for any interface which is a member of
  a bridge.  That appears to have worked quite well.
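
  To make the first problem above concrete, here is a toy model of the
  mismatch.  It is only a sketch and is not the FreeBSD or DragonFly
  code: the ToyBridge class, the interface names and the MAC addresses
  are all invented for illustration.  It compares an ARP layer that
  trusts the destination MAC's owner with one that always attributes
  the entry to the bridge.

```python
# Toy model only: this is NOT the FreeBSD or DragonFly implementation.
# Interface names and MAC addresses are made up for illustration.

class ToyBridge:
    def __init__(self, members, arp_uses_bridge):
        self.members = dict(members)      # ifname -> MAC of that member
        self.arp_uses_bridge = arp_uses_bridge
        self.fwd = {}                     # source MAC -> port the bridge learned
        self.arp = {}                     # IP -> (MAC, interface ARP recorded)

    def arp_reply_in(self, rx_if, src_ip, src_mac, dst_mac):
        # The bridge learns the sender on the port the frame really arrived on.
        self.fwd[src_mac] = rx_if
        if self.arp_uses_bridge:
            # Bridge-aware ARP: every member is treated as "the bridge".
            arp_if = "bridge0"
        else:
            # Naive ARP: trust whichever member owns the destination MAC.
            owners = [n for n, m in self.members.items() if m == dst_mac]
            arp_if = owners[0] if owners else rx_if
        self.arp[src_ip] = (src_mac, arp_if)

    def consistent(self, ip):
        # True when the ARP entry and the forwarding table agree.
        mac, arp_if = self.arp[ip]
        return arp_if == "bridge0" or arp_if == self.fwd[mac]


MEMBERS = {"em0": "02:00:00:00:00:0a", "tap0": "02:00:00:00:00:0b"}
# An ARP reply that arrives on em0 but is addressed to tap0's MAC.
REPLY = dict(rx_if="em0", src_ip="10.0.0.7",
             src_mac="02:00:00:00:00:77", dst_mac=MEMBERS["tap0"])

naive = ToyBridge(MEMBERS, arp_uses_bridge=False)
naive.arp_reply_in(**REPLY)
aware = ToyBridge(MEMBERS, arp_uses_bridge=True)
aware.arp_reply_in(**REPLY)

for name, model in (("naive", naive), ("aware", aware)):
    mac, arp_if = model.arp["10.0.0.7"]
    print(name, "arp:", arp_if, "bridge:", model.fwd[mac],
          "consistent:", model.consistent("10.0.0.7"))
```

  In the naive case the ARP entry points at tap0 while the bridge has
  learned em0, which is the disagreement described above.  In the
  bridge-aware case there is only one answer to disagree with.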

  My home network (using DragonFly) is using if_bridge to a colocated box,
  ether bridging a class C over three WANs via OpenVPN, with the related
  TAP interfaces and the LAN interface as members of the bridge.  The
  bridge is set up with the spanning tree protocol turned on for the three
  TAP interfaces and with bonding turned on for two of the TAP interfaces.
  But that's with DFly (and I just finished the work two days ago).
  If something similar cannot be done w/FreeBSD then I recommend porting
  the changes from DFly over to FreeBSD's bridging and ARP modules.

  It was a big headache but once I cleared up the ARP confusion things just
  started magically working.

  Other caveats:

    * TAP and BRIDGE interfaces are assigned a nearly random MAC address
      when they are created (in FreeBSD the bridge sets its MAC to the
      first member interface so that is at least ok if you always add your
      LAN as the first member interface, however the other member interfaces
      aren't so lucky).  Rebooting the machine containing the bridge or
      destroying and rebuilding the bridge can create total and absolute
      havoc on your network because the rest of your switching
      infrastructure and machines will have the old MACs cached.

      The partial solution is taking on the MAC address of the LAN interface,
      which FreeBSD's bridging code does, and it might be possible to also
      set the other member interfaces to that same MAC (but I don't know if
      that will work).  If not then this is almost a non-solvable problem
      short of making the ARP module more aware of the bridge.

    * If using redundant links without bonding support in the bridge code
      the bridge itself will get confused when the topology changes, though
      if it is a simple topology the bridge should be able to start forwarding
      to the backup link even though its internal forwarding table is messed
      up.

      The concept of a 'backup' link is a bit of a hack in the STP code
      (just as the concept of 'bonding' is a bit of a hack), so how well it
      works will depend on a lot of different factors.  The idea of a
      'backup' link is to be able to continue to switch packets when only
      one path is available even if that path has not been completely
      resolved through the STP protocol.

    * ARP only works because *EVERYONE* uses the same timeout.  Futzing
      around with member associations on the bridge will cause the bridge
      to forget its forwarding entries.  The bridge should theoretically
      broadcast unicast packets for which it doesn't have a forwarding
      entry but... well, it is still possible for machines to get confused.

      When working on your setup you may have to 'arp -d -a' on one or
      more machines multiple times to force them to re-arp and cause all
      your intermediate ethernet switches to relearn the new MACs.  Remember
      that your ethernet switches can get just as confused as your actual
      machines!  'why can't I see that packet going over my LAN, both my
      machines have the correct ARP entries!!!!'... but the little hardware
      ether switch between them might not.

    * A multi-homed network can sometimes have routing loops, particularly
      when you try to use an ethernet bridge.

      For example, let's say you have a machine on your home network using
      address IPA which sends a packet to a machine out in the world over
      the wrong default route.  The RESPONSE to that packet, sent *to*
      your machine, if it isn't blocked by edge routers (due to the source
      address being wrong for that edge) will come back through a DIFFERENT
      bridge member.  In a switched network if the packet was destined to
      a machine directly on the other side of the bridge which is part of
      the switched network, the machines on the other side of the bridge
      may end up believing that IPA is accessed via the other direction
      instead of through the VPN/bridge.  Needless to say, trying to route
      a response back to IPA through the remote side's default route instead
      of through the VPN directly to IPA may get blackholed or, worse,
      may end up creating a loop.

      The machines on the bridged network will get confused as to which
      direction to go to get to the machine with IPA.
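
  Regarding the first caveat above (TAP and BRIDGE interfaces getting a
  nearly random MAC at creation), one way to at least keep the MACs the
  same across reboots is to derive them from something stable.  The
  sketch below is an assumption, not something tested against
  if_bridge: stable_mac() and pin_commands() are hypothetical helpers,
  the hostname is a placeholder, and whether the bridge behaves with
  pinned member MACs is exactly the open question mentioned above.  It
  only prints ifconfig lines; it does not run them.

```python
import hashlib

def stable_mac(seed):
    """Derive a repeatable, locally administered unicast MAC from a seed."""
    octets = bytearray(hashlib.sha256(seed.encode("utf-8")).digest()[:6])
    # Set the locally-administered bit and clear the multicast bit.
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{b:02x}" for b in octets)

def pin_commands(host, interfaces):
    """Return ifconfig command lines that pin each interface to its MAC."""
    return [f"ifconfig {ifname} ether {stable_mac(host + '/' + ifname)}"
            for ifname in interfaces]

# Placeholder hostname and interface names; substitute your own.
for line in pin_commands("gw.example.org", ["bridge0", "tap0", "tap1"]):
    print(line)
```

  Because the address depends only on the hostname and interface name,
  destroying and rebuilding the bridge yields the same MACs, so the
  rest of the switching infrastructure does not have to relearn them.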

    So, lots of horror is possible here.  If you can use a routed network
    instead of a bridged network that's really what you want to do.  On
    the other hand, routed networks cannot handle channel bonding and
    redundancy (even if using BGP or OSPF for your internal network)
    nearly as well as switched networks can.  If the bridge interface and ARP
    code are brought up to snuff, a bridged network will actually do the job
    quite nicely.

    One last note on using a switched network and something like VPN.  You
    will end up with multiple default routes and need to use IPFW2 to make
    sure that packets with source IPs for various WAN interfaces are forwarded
    through the correct default route.  The 'master' default route for the
    machine would normally be set to the default route for the bridged
    network.
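
    As a rough sketch of that source-based selection (not my actual
    ruleset): the addresses below come from the documentation ranges
    and are placeholders, and pick_gateway() and ipfw_fwd_rules() are
    hypothetical helpers.  The rendered lines use ipfw's 'fwd' action,
    which on many FreeBSD releases needs forwarding support compiled
    into the kernel, so treat them as a starting point only.

```python
import ipaddress

# Placeholder addresses from documentation ranges; substitute your own.
SOURCE_ROUTES = [
    ("198.51.100.10", "198.51.100.1"),   # WAN 1 address -> WAN 1 gateway
    ("203.0.113.10", "203.0.113.1"),     # WAN 2 address -> WAN 2 gateway
]
MASTER_GATEWAY = "192.0.2.1"             # default route of the bridged network

def pick_gateway(src_ip):
    """Choose the next hop from the packet's source address."""
    src = ipaddress.ip_address(src_ip)
    for wan_ip, gateway in SOURCE_ROUTES:
        if src == ipaddress.ip_address(wan_ip):
            return gateway
    # Anything else follows the machine's 'master' default route.
    return MASTER_GATEWAY

def ipfw_fwd_rules(first_rule=1000, step=10):
    """Render one ipfw 'fwd' rule per WAN source address."""
    return [f"ipfw add {first_rule + i * step} fwd {gw} "
            f"ip from {wan_ip} to any out"
            for i, (wan_ip, gw) in enumerate(SOURCE_ROUTES)]

for rule in ipfw_fwd_rules():
    print(rule)
```

    A real ruleset would likely also need to exempt traffic destined
    for the directly attached networks so that it is not pushed at a
    gateway.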

    Throwing NAT on top of everything else adds more fun to the pot.  Sometimes
    there isn't a clear distinction as to when a packet goes from being
    'switched' to being 'routed', particularly when something like NAT
    bounces a packet back out the same interface it came in on.  Essentially
    any address translation which occurs (NAT) takes the packet out of the
    switching path and places it into the routing path.  The IPFW and PF
    tie-ins have to do their job so the rest of the system knows whether
    the packet filter 'ate' the packet (turning it into a routed packet),
    or simply filtered and returned it (leaving it as a switched packet).

						-Matt