From: Don Bowman
To: 'Chuck Swiger', Don Bowman
Cc: 'freebsd-net@freebsd.org'
Subject: RE: SO_DONTROUTE, arp's, ipfw fwd, etc
Date: Wed, 4 Dec 2002 17:33:50 -0500

> From: Chuck Swiger [mailto:cswiger@mac.com]
> On Wednesday, December 4, 2002, at 03:37 PM, Don Bowman wrote:
> [ ... ]
> > These are isp-sized routers (complicated networks with different
> > peering points to other networks). Static routes don't work since
> > they are much too dynamic. Additionally, the widget which is
> > picking the traffic to send (like Cisco WCCP) is load-balancing,
> > so there's another striping of data going on.
>
> Yes, but the complicated internal routes maintained within those networks
> isn't your problem if your machine or network isn't BGP peering with them.

It is in the sense that I have to figure out which one to send data back to. More than one of them may 'own' a source address at a given time (for a TCP session).

> > In the example diagram above, I might have a case where host 'A'
> > sends host 'B' two concurrent TCP sessions. These will both transparently
> > arrive @ the BSD box, one via router1, one via router2.
> > Triangulation breaks the application, so A->B (session 1) needs to always
> > flow via the same router it started on.
>
> Why? This sounds like a pretty classic example of A being on a multihomed
> network, and you should let IP-level routing deal with the problem. But
> there are alternatives, I guess-- maybe try putting a buncha interfaces on
> the BSD box, one for each router being connected to it, and put each pair
> on their own /30. That way, the BSD box can quite easily return the
> traffic back to the originating router....

That only works if the box is routing, not for L2 redirection.

> > I'm thinking this is achieved by just caching the interface & destination
> > MAC etc in the PCB for the TCP session. It does this anyway once it's
> > finished sending the SYN/ACK; it's just that it follows routing rules and
> > ARPs for the SYN/ACK.
>
> Yes. Pretending machines which are on remote networks are local can be
> done by re-writing MAC addresses, but that can be achieved by NAT or VPN
> solutions as well. Why are you trying to override normal routing behavior
> when you probably can use it to help solve the problem?

This is a transparent proxy. The proxy needs to know where the real destination was (in case it needs to open a connection there). HTTP solved this by carrying the real destination in a request header, but most other protocols didn't. I don't have control of the content-switching routers which feed this. They work the way they do.

Say, for the sake of example, you wished to load-balance two farms of telnet servers. You have a device which picks off port 23 and sends it to you without alteration. You would then look at the intended destination address, pick the right group of telnet servers, and send the data there. Now say that those devices themselves were load-balanced. If a user telneted twice to the same destination, one path might go through the first redirector and one through the second.
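The idea of caching the ingress interface and previous-hop MAC per TCP session can be illustrated with a small user-space table. This is a hypothetical sketch, not FreeBSD's struct inpcb or any kernel API: the struct, the hash, and the names l2_hint_record and l2_hint_for_reply are invented for illustration. It records where each SYN came from, keyed by the TCP 4-tuple, so that a reply such as the SYN/ACK could be addressed to that MAC instead of whatever next hop routing and ARP would choose. Entry expiry and removal are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-session L2 hint (NOT FreeBSD's struct inpcb).
   Fields are stored exactly as seen in the incoming SYN. */
struct l2_hint {
    bool     in_use;
    uint32_t src_ip, dst_ip;      /* IPv4 addresses from the SYN */
    uint16_t src_port, dst_port;  /* TCP ports from the SYN */
    unsigned ifindex;             /* interface the SYN arrived on */
    uint8_t  prev_hop_mac[6];     /* Ethernet source MAC of the SYN */
};

#define FLOW_SLOTS 1024u          /* hypothetical size; must be a power of two */

static struct l2_hint flow_table[FLOW_SLOTS];

static unsigned flow_slot(uint32_t sip, uint32_t dip, uint16_t sp, uint16_t dp)
{
    uint32_t h = sip ^ (dip * 2654435761u) ^ (((uint32_t)sp << 16) | dp);
    h ^= h >> 15;
    return h & (FLOW_SLOTS - 1);
}

static bool same_flow(const struct l2_hint *e, uint32_t sip, uint32_t dip,
                      uint16_t sp, uint16_t dp)
{
    return e->in_use && e->src_ip == sip && e->dst_ip == dip &&
           e->src_port == sp && e->dst_port == dp;
}

/* Remember where a SYN came from (linear probing); false if the table is full. */
bool l2_hint_record(uint32_t sip, uint32_t dip, uint16_t sp, uint16_t dp,
                    unsigned ifindex, const uint8_t mac[6])
{
    assert(mac != NULL);
    unsigned i = flow_slot(sip, dip, sp, dp);
    for (unsigned n = 0; n < FLOW_SLOTS; n++, i = (i + 1) & (FLOW_SLOTS - 1)) {
        struct l2_hint *e = &flow_table[i];
        if (!e->in_use || same_flow(e, sip, dip, sp, dp)) {
            e->in_use = true;
            e->src_ip = sip;
            e->dst_ip = dip;
            e->src_port = sp;
            e->dst_port = dp;
            e->ifindex = ifindex;
            memcpy(e->prev_hop_mac, mac, sizeof e->prev_hop_mac);
            return true;
        }
    }
    return false;
}

/* Find the hint for an outgoing reply (e.g. the SYN/ACK). The reply's source
   is the SYN's destination, so the tuple is swapped before the lookup. */
const struct l2_hint *l2_hint_for_reply(uint32_t reply_sip, uint32_t reply_dip,
                                        uint16_t reply_sp, uint16_t reply_dp)
{
    uint32_t sip = reply_dip, dip = reply_sip;
    uint16_t sp = reply_dp, dp = reply_sp;
    unsigned i = flow_slot(sip, dip, sp, dp);
    for (unsigned n = 0; n < FLOW_SLOTS; n++, i = (i + 1) & (FLOW_SLOTS - 1)) {
        const struct l2_hint *e = &flow_table[i];
        if (!e->in_use)
            return NULL;          /* empty slot: flow never recorded */
        if (same_flow(e, sip, dip, sp, dp))
            return e;
    }
    return NULL;
}
```

In the two-redirector case, two sessions from the same client that arrive via different redirectors get different prev_hop_mac values, so each SYN/ACK would be addressed back to the redirector that delivered its SYN. Because no entry is ever removed, a probe that reaches an empty slot can safely conclude the flow is absent.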
The path back is based on the path it came in on:

                  [client]
                      |
        --------------------------
        |      Load Balancer     |
        --------------------------
             |              |
             |              |
      [Redirector1]    [Redirector2]
              \            /
               \          /
           ---------------------
             |              |
           [BSD1]        [BSD2]
             |              |
        -----------------------------
         |    |    |    |    |
         |    |    |    |    |
       Telnet servers (A)   Telnet (B)

So in this case, [client] sends a SYN to port 23 on the virtual address of telnet (A). The load balancer sends this (and all other traffic) arbitrarily to Redirector1 or Redirector2. These devices say "Aha! Port 23, let me use this clever policy-based route" and just rewrite the destination MAC to be either BSD1 or BSD2 (based on some feedback on their load, availability, etc.). BSD1 and BSD2 have a rule like:

    ipfw add fwd 127.0.0.1,9000 tcp from any to any 23 recv bge0

and a clever little app listening on localhost:9000 that does accept(), looks at the intended destination IP, picks a telnet server in the farm that address belongs to, connects, and then proxies the accepted connection to the actively initiated one.

Now, BSD1/2 can't use Redirector1/2 as a default route, since they have to treat them as equals. Whichever one sent the SYN, I'd love the SYN/ACK to go back to the same one. I know the MAC it came from; that's where the response should go. Making it all layer 3 doesn't help me, because then I don't have the intended destination address.

Additionally, if I have two routers on my net and one sends me traffic, I can only respond to it through that router if it's my default route or if I have a static route for the IP behind it. Maybe those routers both lead to the same locations?

I can't really use a VPN (GRE, etc.) tunnel, since then I'll have to fragment, and I'd prefer to avoid that. My first thought was to create a GRE tunnel from Redirector(1,2) to BSD(1,2) and send the data down the tunnel with the original info intact.
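The localhost:9000 application described above (accept, look at the intended destination, pick a server, connect, proxy) might look roughly like the sketch below. It rests on assumptions: the farm table and its documentation-range addresses are hypothetical, as are the names pick_server, open_listener and serve_one, and it relies on the commonly used FreeBSD behaviour that when ipfw fwd delivers a connection to a local listening socket, getsockname() on the accepted socket reports the client's original destination address and port. It handles one connection at a time for brevity.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical farm table: intended (virtual) address -> real server.
   Example addresses only; a real farm would hold several servers each. */
struct farm { const char *vip; const char *server; };
static const struct farm farms[] = {
    { "192.0.2.10", "198.51.100.11" },   /* telnet (A) */
    { "192.0.2.20", "198.51.100.21" },   /* telnet (B) */
};

/* Map the client's intended destination to a backend; NULL if unknown. */
const char *pick_server(const struct in_addr *dst)
{
    assert(dst != NULL);
    for (size_t i = 0; i < sizeof farms / sizeof farms[0]; i++) {
        struct in_addr vip;
        if (inet_pton(AF_INET, farms[i].vip, &vip) == 1 &&
            vip.s_addr == dst->s_addr)
            return farms[i].server;
    }
    return NULL;
}

/* Copy bytes in both directions until either side closes or errors. */
static void relay(int a, int b)
{
    struct pollfd p[2] = { { .fd = a, .events = POLLIN },
                           { .fd = b, .events = POLLIN } };
    char buf[4096];
    for (;;) {
        if (poll(p, 2, -1) < 0) {
            if (errno == EINTR) continue;
            return;
        }
        for (int i = 0; i < 2; i++) {
            if (!(p[i].revents & (POLLIN | POLLHUP | POLLERR | POLLNVAL)))
                continue;
            ssize_t n = read(p[i].fd, buf, sizeof buf);
            if (n <= 0) return;
            for (ssize_t off = 0; off < n; ) {
                ssize_t w = write(p[1 - i].fd, buf + off, (size_t)(n - off));
                if (w < 0) return;
                off += w;
            }
        }
    }
}

/* Listening socket for the ipfw fwd target (127.0.0.1,port). */
int open_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    int on = 1;
    (void)setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);
    struct sockaddr_in a = { .sin_family = AF_INET, .sin_port = htons(port),
                             .sin_addr.s_addr = htonl(INADDR_LOOPBACK) };
    if (bind(fd, (struct sockaddr *)&a, sizeof a) < 0 || listen(fd, 128) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Accept one diverted connection and proxy it to the chosen server. */
int serve_one(int lfd)
{
    int c = accept(lfd, NULL, NULL);
    if (c < 0) return -1;
    /* Assumption: under ipfw fwd, the accepted socket's local address is
       the client's original destination (address and port). */
    struct sockaddr_in orig;
    socklen_t len = sizeof orig;
    int s = -1;
    if (getsockname(c, (struct sockaddr *)&orig, &len) == 0) {
        const char *server = pick_server(&orig.sin_addr);
        struct sockaddr_in to = { .sin_family = AF_INET,
                                  .sin_port = orig.sin_port };
        if (server != NULL && inet_pton(AF_INET, server, &to.sin_addr) == 1 &&
            (s = socket(AF_INET, SOCK_STREAM, 0)) >= 0 &&
            connect(s, (struct sockaddr *)&to, sizeof to) == 0)
            relay(c, s);
    }
    if (s >= 0) close(s);
    close(c);
    return 0;
}
```

A real program would call open_listener(9000) once, ignore SIGPIPE, and run serve_one() per connection in a loop, forking or threading per client. Note that this only recovers the intended destination; the SYN/ACK back to the client is still sent according to the kernel's routing table and ARP, which is exactly the part the redirector setup makes awkward.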
Sadly, the extra 24 bytes of overhead cost performance to add and strip, I lose the hardware assist I have for IP/TCP checksum offload, and the intervening devices don't support jumbo frames, so I have to double the number of packets: each full-size 1500-byte packet goes out as a 1500-byte fragment followed by one carrying the remaining 24 bytes.

Thanks for the input, keep it coming! Talk me down from this ledge :)

--don (don@sandvine.com www.sandvine.com)
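The 24-byte tunnel overhead is a 20-byte outer IPv4 header plus a 4-byte base GRE header (no optional key, checksum, or sequence fields). The small helper below, a hypothetical illustration rather than anything from FreeBSD, works through the resulting IPv4 fragmentation. A full 1500-byte inner packet becomes 1524 bytes; on a 1500-byte MTU, non-final fragments must carry a multiple of 8 payload bytes, so it splits into a 1500-byte fragment (1480 bytes of payload) and a 44-byte fragment (20-byte header plus the remaining 24 bytes).

```c
#include <assert.h>
#include <stddef.h>

#define IPV4_HDR 20   /* outer IPv4 header, no options */
#define GRE_HDR   4   /* base GRE header: no key, checksum or sequence */

/* Fill sizes[] with the total length of each outer IPv4 fragment produced
   when an inner packet of inner_len bytes is GRE-encapsulated and sent on a
   link with the given MTU. Returns the fragment count, or 0 if sizes[] is
   too small to hold them all. */
size_t gre_fragments(size_t inner_len, size_t mtu, size_t sizes[], size_t max)
{
    assert(mtu > IPV4_HDR + 8);
    size_t payload = GRE_HDR + inner_len;          /* outer IP payload */
    size_t per_frag = (mtu - IPV4_HDR) / 8 * 8;    /* non-final: multiple of 8 */
    size_t n = 0;
    while (payload > 0) {
        /* The last (or only) fragment takes whatever is left if it fits. */
        size_t chunk = (payload + IPV4_HDR <= mtu) ? payload : per_frag;
        if (n == max) return 0;
        sizes[n++] = IPV4_HDR + chunk;
        payload -= chunk;
    }
    return n;
}
```

With these numbers, inner packets of at most 1476 bytes still fit in a single 1500-byte packet, while a jumbo-capable path (MTU 9000, say) would carry the full 1524 bytes unfragmented.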