From: Andre Oppermann
Date: Sat, 08 Sep 2007 00:57:17 +0200
To: Bakul Shah
Cc: freebsd-net@freebsd.org, Kirc Gover
Subject: Re: OS choice for an edge router
Message-ID: <46E1D74D.3070409@freebsd.org>
In-Reply-To: <20070907200809.CB6B05B58@mail.bitblocks.com>
References: <20070907200809.CB6B05B58@mail.bitblocks.com>

Bakul Shah wrote:
>> One of my concern is on the native forwarding capability of FreeBSD OS and the
>> execution of critical userland processes. I have experience before that a
>> FreeBSD box configured as router appears to slow down the userland processes
>> when the traffic load is high. I have verified this lately on 6.1, running on
>> Athlon64 with 1G NIC cards with PF and ALTQ (queuing) enabled. I'm not so sure
>> if this is caused by PF or ALTQ.
>> Looking at the processes using top, I could
>> see that "swi(x) net" process is almost as near 100% cpu utilization. At this
>> state, the box can still forward traffic (not sure yet if there was a change in
>> throughput) but I could notice the userland processes to be very slow, like
>> invoking any command from the shell (e.g. ls) will take so long to be executed
>> and completed. Is this a known limitation or bug?

All software switched platforms have this problem. You've got only so
much CPU power and it is shared between forwarding packets and running
userland processes. Even the popular Cisco 7200, 7300, 3800, 2800, 1800
and 800 Series routers suffer from it. All of them break down quite
hard under load, especially with advanced features like ACLs, QoS and
all the other stuff enabled.

The only way around it is a split forwarding and control plane where
the forwarding is done on dedicated ASICs or FPGAs. Examples are the
Cisco 7600, 12000 and CRS-1 Series. At Juniper it's all the M Series
routers. At Foundry Networks it's the Net- and BigIron Series. However
this dedicated hardware has its cost and isn't available on commodity
platforms.

> On my Athlon64 X2 3800+ box I get about 42,500 pings a second
> (with ping -q -f localhost, after removing icmp limits). For
> forwarding, you won't get a packets per second (pps) number
> significantly higher than that. Less if you do PF, ALTQ or
> VPNs.

This is not the case. Flood ping doesn't reach the limit in any way.
Have a look at the ping man page and the description of flood ping.

> The worst case for 1Gbps is about 1.488 Million pps. Even if
> you assume there is one ack packet for one data packet of max
> size, you have over 150K pps when the full 1Gbps is used up.
> I just don't think stock *BSD will do this for you on a
> typical el cheapo PC box. Neither will Linux or vmware.

Stock FreeBSD 6.2 or 7.0 can easily do 500kpps with good network cards
and fastforwarding enabled.
On a dual-Opteron 2.6GHz with PCI-X Intel and Broadcom network cards
I've done 800kpps in-out. Just sinking 2.88Mpps into the box (into a
blackhole route) takes about 60% CPU at zero unintended packet loss.

Throughput in terms of bandwidth is a non-issue. Bus bandwidth on
PCI-X and PCI-E is plenty and even at 500kpps you can push more than
5Gbit/s at 1500 byte packets. Of course there is a mix of small and
large packets and at 500kpps the actual peak throughput is closer to
2Gbit/s. All measured with a calibrated Agilent N2X packet generator.
The forwarding bandwidth is only limited by the pps rate (times packet
size or size mix).

Firewalling may have some non-trivial impact on forwarding performance
depending on the number and structure of the rules. In general ipfw
has quite a bit less impact than pf. A rule of thumb is that about 100
ipfw rules cost 50% of the peak pps performance.

Doing ALTQ on a box with GigE ports is not useful and only wastes pps
performance. As long as there is no queuing going on, ALTQ doesn't
provide any benefits and only burns CPU. ALTQ isn't really optimized
on FreeBSD and the performance impact is probably significant, although
I don't have any hard numbers on it.

> Listen to what Louis Mamakos said! Use FreeBSD primarily for
> the control plane. May be there are NICs where you can
> offload some packet forwarding.... But that is a substantial
> change to FreeBSD. Or live with what FreeBSD can do on a
> given box.

There are no NICs known that can do packet forwarding offload. Nor is
there support in FreeBSD for that. You're probably confusing this
with checksum offloading or TSO (TCP segmentation offloading), which
isn't an issue with packet forwarding at all.

> For a project like yours, any Open Source code you get is a
> *starting point* and no more. Your customers don't care if
> it is FreeBSD or CP/M; they just expect your router to work
> and when it doesn't they expect you to fix it pronto.
> This means you have to do QA, find and fix bugs, add missing
> functionality and so on. You can't wait for FreeBSD
> volunteers to fix any problems; you will have to fix any OS
> panics, deadlocks and inefficiencies!

I've been running all my routing on FreeBSD since about 1998. No
problems, and much more reliable than the countless Cisco IOS versions
that have been deprecated since then. On any more recent platform or
new line card you have to run IOS T versions, which most of the time
is much worse than running FreeBSD-current on a production machine.

It's probably cheaper to pay FreeBSD developers to fix any issues you
find or run into than to pay Cisco for the pretty much mandatory
service contract, where any useful level starts at some 14% of the
purchase price annually. And even then you have to pay for TAC cases
and you are last in the queue relative to all others who pay more.

Even when SQL slammer hit my network I didn't notice. The FreeBSD
routers kept up while the Ciscos left and right (at other ISPs) fell
over.

OTOH I'm only doing plain vanilla BGP routing with a number of full
feeds (about 230k routes at the moment, I'm part of the DFZ). My AS
is a genuine AS8271. Router uptime of more than 600 days. Stuff just
works, never have to worry about it. No sleep lost.

> May be you can start with requirements. How many VPNs, how
> much bandwidth, required pps, other performance/latency
> requirements, protocols (and specific features in them) you
> *must* have, protocols you may like to have, required CPU
> bandwidth for running your proprietary services, some idea of
> how these numbers will grow over the next three years, what
> applications you may wish to run, what cost of goods you can
> afford, etc. That ought to help you decide what h/w platform
> is suitable or which requirement must go :-)

Can't comment on VPN or IPSEC stuff. Never used that to any
significant extent.
However keep in mind that for the price of a single high powered
Cisco or Juniper you can buy a very large number of quite well
powered FreeBSD based routers.

My recommendation for an optimal FreeBSD based router is as follows:

CPU Core2 Duo or Athlon 64 X2; more cores don't help in any way. One
core can take the interrupts and one can continue to serve userland.
A quality mainboard from Tyan, Supermicro or Intel with PCI-Express.
A number of (dual-port) Intel Gigabit PCI-E network cards. Some two
GB of RAM and a flash based ATA or SATA harddisk. Good case, redundant
power supplies, good fans and otherwise no moving parts.

Don't try RAID1 or stuff like that, it causes more problems than it
solves. Go for a single flash disk that is replaceable without having
to disassemble the entire case. There are some 3.5" based flash disks
on the market, or buy a CF to ATA adapter for mounting into a 3.5"
disk slot and use normal but fast CF cards.

That'll do it.

--
Andre
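
For anyone who wants to check the packet-rate figures quoted in this
thread, here is a minimal sketch of the arithmetic in Python. It is
not from the original mail. The function names are made up for
illustration, and the per-frame overhead values are the standard
Ethernet ones (8 bytes preamble/SFD, 12 bytes inter-frame gap, 18
bytes header plus FCS, 64 byte minimum frame), assumed here rather
than taken from the measurements above.

```python
# Standard Ethernet per-frame overhead in bytes (assumed, not measured).
PREAMBLE_SFD = 8       # 7-byte preamble + 1-byte start-of-frame delimiter
INTERFRAME_GAP = 12    # mandatory idle time between frames
ETH_HEADER_FCS = 18    # 14-byte header + 4-byte frame check sequence
MIN_FRAME = 64         # minimum frame size including header and FCS

GIGE_BPS = 1_000_000_000


def wire_bytes(ip_packet_len):
    """Bytes one IP packet occupies on the wire, padding included."""
    frame = max(ip_packet_len + ETH_HEADER_FCS, MIN_FRAME)
    return frame + PREAMBLE_SFD + INTERFRAME_GAP


def line_rate_pps(link_bps, ip_packet_len):
    """Maximum packets per second a link can carry at this packet size."""
    return link_bps / (wire_bytes(ip_packet_len) * 8)


def throughput_bps(pps, ip_packet_len):
    """IP-level throughput for a given packet rate and packet size."""
    return pps * ip_packet_len * 8


if __name__ == "__main__":
    # Worst case: minimum-size frames on Gigabit Ethernet.
    print("min-size pps:", round(line_rate_pps(GIGE_BPS, 46)))
    # Full-size data packets, plus one ack per data packet.
    data_pps = line_rate_pps(GIGE_BPS, 1500)
    print("1500-byte pps:", round(data_pps), "with acks:", round(2 * data_pps))
    # Bandwidth moved at 500kpps with 1500 byte packets.
    print("500kpps x 1500 bytes:", throughput_bps(500_000, 1500) / 1e9, "Gbit/s")
    # Average packet size implied by 2 Gbit/s at 500kpps.
    print("avg size at 2 Gbit/s:", 2e9 / (500_000 * 8), "bytes")
```

Under these assumptions the worst case works out to roughly 1.488Mpps,
one ack per full-size data packet gives a bit over 160kpps, and
500kpps of 1500 byte packets is about 6Gbit/s, which is consistent
with the numbers discussed above.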