From: Don Bowman <don@sandvine.com>
To: 'Chuck Swiger' <cswiger@mac.com>, Don Bowman <don@sandvine.com>
Cc: "'freebsd-net@freebsd.org'" <freebsd-net@FreeBSD.ORG>
Subject: RE: SO_DONTROUTE, arp's, ipfw fwd, etc
Date: Wed, 4 Dec 2002 17:33:50 -0500 

> From: Chuck Swiger [mailto:cswiger@mac.com]
> On Wednesday, December 4, 2002, at 03:37  PM, Don Bowman wrote:
> [ ... ]
> > These are isp-sized routers (complicated networks with different
> > peering points to other networks). Static routes don't work since
> > they are much too dynamic. Additionally, the widget which is
> > picking the traffic to send (like Cisco WCCP) is load-balancing,
> > so there's another striping of data going on.
> 
> Yes, but the complicated internal routes maintained within those
> networks isn't your problem if your machine or network isn't BGP
> peering with them.

It is in the sense that I have to figure out which one to send
data back to. More than one of them may 'own' a source address
at a given time (for a TCP session).

> 
> > In the example diagram above, I might have a case where host 'A'
> > sends host 'B' two concurrent TCP sessions. These will both
> > transparently arrive @ the BSD box, one via router1, one via
> > router2. Triangulation breaks the application, so A->B(session1)
> > needs to always flow via the same router it started on.
> 
> Why?  This sounds like a pretty classic example of A being on a
> multihomed network, and you should let IP-level routing deal with
> the problem.  But there are alternatives, I guess-- maybe try
> putting a buncha interfaces on the BSD box, one for each router
> being connected to it, and put each pair on their own /30.  That
> way, the BSD box can quite easily return the traffic back to the
> originating router....

Only if it's routing, not for L2 redirection.

> 
> > I'm thinking this is achieved by just caching the interface &
> > destination MAC etc in the PCB for the TCP session. It does this
> > anyway once its finished sending the SYN/ACK, its just that it
> > follows routing rules and ARP's for the SYN/ACK.
> 
> Yes.  Pretending machines which are on remote networks are local
> can be done by re-writing MAC addresses, but that can be achieved
> by NAT or VPN solutions as well.  Why are you trying to override
> normal routing behavior when you probably can use it to help solve
> the problem?

This is a transparent proxy. The proxy needs to know what the
real destination was (in case it needs to open a connection there).
HTTP solved this by carrying the intended host in the request
(the Host: header), but most other protocols didn't.
I don't have control of the content-switching routers which feed
this. They work the way they do.

Say for the sake of example you wished to load balance 2 farms
of telnet servers. You had a device which picked off port 23
and sent it to you without alterations. You would then look @ the
intended destination address, pick the right group of telnet
servers, and send the data there. Now say that those devices themselves
were load-balanced. So if a user telnetted twice to the same destination,
one path might go through the first redirector, and one through the
2nd. The path back has to match the path the session came in on.
              [client]
                |
          --------------------------
          | Load Balancer          |
          --------------------------
           |                       |
           |                       |
      [Redirector1]         [Redirector2]
            \                     /
             \                   /
             ---------------------
                 |        |
               [BSD1]   [BSD2]
                 |        |
                 -----------------------------
                  | | | | |         | | | | |
                Telnet servers(A)   Telnet (B)

So in this case, [client] sends a SYN to port 23 on the virtual address
of telnet(A). The load balancer sends this (and all other traffic)
arbitrarily to Redirector1 or 2. These devices say, Aha!, port 23, let
me use this clever policy based route, and just rewrite the destination
MAC to be either BSD1 or BSD2 (based on some feedback on their load,
availability, etc). BSD1 and 2 have a rule like:
 ipfw add fwd localhost,9000 tcp from any to any 23 recv bge0
and then on localhost:9000 a clever little app is listening that does:
 accept(), look @ the intended destination IP, pick a telnet server in
the farm that address belongs to, connect, and then proxy the accept()ed
connection to the actively initiated one.
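
A rough sketch of that little app, in Python for brevity. Everything named
here (FARMS, pick_server, pump, handle, serve, and all the addresses) is made
up for illustration. It leans on the behaviour documented in ipfw(8) that, for
a locally forwarded packet, the socket's local address is the packet's
original destination, so getsockname() on the accepted socket should hand back
the address the client was really aiming at. Treat it as a sketch under those
assumptions, not as the real proxy.

```python
import ipaddress
import socket
import threading

# Hypothetical layout: virtual prefix -> real telnet servers in that farm.
FARMS = {
    ipaddress.ip_network("192.0.2.0/25"): [("10.0.1.1", 23), ("10.0.1.2", 23)],
    ipaddress.ip_network("192.0.2.128/25"): [("10.0.2.1", 23), ("10.0.2.2", 23)],
}


def pick_server(dst_ip, farms=FARMS):
    """Return one (host, port) from the farm whose prefix contains dst_ip."""
    addr = ipaddress.ip_address(dst_ip)
    for net, servers in farms.items():
        if addr in net:
            # Placeholder policy: spread by address. A real one would
            # weigh load and availability.
            return servers[int(addr) % len(servers)]
    return None


def pump(src, dst):
    """Copy bytes one way until EOF, then half-close the other side."""
    while True:
        data = src.recv(4096)
        if not data:
            break
        dst.sendall(data)
    try:
        dst.shutdown(socket.SHUT_WR)
    except OSError:
        pass


def handle(client):
    """Proxy one accepted connection to a server in the intended farm."""
    # Assumption: under "ipfw fwd" this is the original destination IP.
    dst_ip = client.getsockname()[0]
    target = pick_server(dst_ip)
    if target is None:
        client.close()
        return
    server = socket.create_connection(target)
    back = threading.Thread(target=pump, args=(server, client), daemon=True)
    back.start()
    pump(client, server)
    back.join()
    server.close()
    client.close()


def serve(port=9000):
    """Accept forever on the port the ipfw rule forwards to."""
    lsock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    lsock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    lsock.bind(("", port))
    lsock.listen(128)
    while True:
        client, _ = lsock.accept()
        threading.Thread(target=handle, args=(client,), daemon=True).start()
```

Calling serve() on BSD1/BSD2 would start it. Note that nothing in here has any
say over which redirector the replies go back through.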

Now, BSD1/2 can't use either Redirector1 or Redirector2 as a default
route, since the two redirectors are equals. One of them sent the SYN
packet, and I'd love the SYN/ACK to go back to the same one. I know the
MAC it came from; that's where the response should go.
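
To make the idea concrete, here is a toy user-space model of the per-session
cache I have in mind. This is not kernel code and not how the PCB is laid out;
ReturnPathCache and its methods are hypothetical names, and the MACs and
addresses are placeholders. The only point is that the first packet of a flow
pins the (interface, MAC) pair and later packets don't change it.

```python
class ReturnPathCache:
    """Remember, per TCP flow, where the first packet arrived from."""

    def __init__(self):
        # flow 4-tuple -> (interface name, source MAC of the first packet)
        self._paths = {}

    def learn(self, flow, ifname, src_mac):
        """Pin the return path on first sight; ignore later changes."""
        self._paths.setdefault(flow, (ifname, src_mac))

    def lookup(self, flow):
        """Return (ifname, mac) for replies, or None if the flow is unknown."""
        return self._paths.get(flow)

    def forget(self, flow):
        """Drop the entry when the session closes."""
        self._paths.pop(flow, None)


cache = ReturnPathCache()
# Two concurrent sessions from the same client to the same destination,
# arriving via different redirectors (flow = src ip, src port, dst ip, dst port).
s1 = ("10.1.1.1", 40001, "192.0.2.5", 23)
s2 = ("10.1.1.1", 40002, "192.0.2.5", 23)
cache.learn(s1, "bge0", "00:00:5e:00:53:01")  # came in via Redirector1
cache.learn(s2, "bge0", "00:00:5e:00:53:02")  # came in via Redirector2
```

With that in place the SYN/ACK for s1 would go to the first MAC and the one
for s2 to the second, with no route lookup or ARP involved.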

Making it all layer 3 doesn't help me, since then I don't have the intended
destination address. Additionally I have the problem that if I have
two routers on my net, and one sends me traffic, I can only respond
through it if it's my default route, or if I have a static route for an
IP behind it. And what if those routers both lead to the same locations?

I can't really use a VPN (GRE etc) tunnel since then I'll have to fragment,
and I'd prefer to avoid that. My first thought was to create a GRE
tunnel from Redirector(1,2) to BSD(1,2), and send the data down the tunnel
with the original info intact. Sadly, the extra 24 bytes of overhead
cost performance in adding/deleting them, I lose the hardware assist I have
for IP/TCP checksum offload, and the intervening devices don't support
jumbo frames, so I have to double the number of packets, sending a full
1500-byte frame and then a tiny one carrying the leftover 24 bytes.
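
The arithmetic, as a quick sketch. It assumes plain IPv4 with a 20-byte outer
header, a basic 4-byte GRE header (no key or checksum options), and a
1500-byte MTU; outer_fragments is just a name I picked.

```python
def outer_fragments(inner_len, mtu=1500, outer_ip_hdr=20, gre_hdr=4):
    """Sizes in bytes (IP header included) of the outer IPv4 packets
    needed to carry one inner packet of inner_len bytes over GRE."""
    payload = gre_hdr + inner_len  # what the outer IP header has to carry
    if outer_ip_hdr + payload <= mtu:
        return [outer_ip_hdr + payload]
    # Every fragment except the last carries a multiple of 8 payload bytes.
    per_frag = (mtu - outer_ip_hdr) // 8 * 8
    sizes = []
    while payload > 0:
        chunk = min(per_frag, payload)
        sizes.append(outer_ip_hdr + chunk)
        payload -= chunk
    return sizes


print(outer_fragments(1500))  # [1500, 44]
print(outer_fragments(1476))  # [1500]
```

So a full-size inner packet turns into a 1500-byte fragment plus a 44-byte one
(the leftover 24 bytes and another IP header), and anything up to 1476 bytes
still fits in a single packet.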

Thanks for the input, keep it coming! Talk me down from this ledge :)

--don (don@sandvine.com www.sandvine.com)
