From: Marko Zec
To: Kip Macy
Cc: svn-src-head@freebsd.org, Robert Watson, svn-src-all@freebsd.org, src-committers@freebsd.org, Andre Oppermann
Date: Sun, 19 Apr 2009 22:21:55 +0200
Subject: Re: svn commit: r191259 - head/sys/netinet
Message-Id: <200904192221.55744.zec@freebsd.org>

On Sunday 19 April 2009 19:13:37 Kip Macy wrote:
> On Sun, Apr 19, 2009 at 3:18 AM, Andre
Oppermann wrote:
...
> > I have another question on the flowtable: What is the purpose of it?
> > All router vendors learned a long time ago that route caching
> > (aka flow caching) doesn't work out on a router that carries the DFZ
> > (default-free zone, currently ~280k prefixes). The overhead of managing
> > the flow table and the high churn rate make it much more expensive than
> > a direct and already very efficient radix trie lookup. Additionally, a
> > well-connected DFZ router has some 1k prefix updates per second. More
> > information can be found, for example, at Cisco here:
> >  http://www.cisco.com/en/US/tech/tk827/tk831/technologies_white_paper09186a00800a62d9.shtml
> > The same findings are also available from all other
> > major router vendors like Juniper, Foundry, etc.
> >
> > Let's examine the situations:
> >  a) internal router with only a few routes; the routing and ARP tables
> >     are small, lookups are very fast and everything is hot in the CPU
> >     caches anyway.
> >  b) DFZ router with 280k routes; a small flow table has constant
> >     thrashing, becoming negative overhead only. A large flow table has
> >     a high maintenance overhead, higher lookup times and still a
> >     significant amount of thrashing. The overhead of the flow table is
> >     equal to or higher than a direct routing table lookup.
> > Concluding that a flow table is never a win but a liability in any
> > realistic setting.
>
> You're assuming that a Cisco- / Juniper-class workload is
> representative of where FreeBSD is deployed. I agree that FreeBSD is
> sub-optimal for large routing environments for a whole host of other
> reasons. A better question is what are "typical" FreeBSD deployments,
> and how well would it work there. The flowtable needs to be sized to
> correspond to the number of flows, its utility rapidly diminishes as
> the number of collisions per bucket increases.

...
which makes a flow cache a perfect DoS target in any environment, be it a
DFZ or enterprise router or an end host or whatever.

Marko

> The number of routes
> isn't the key metric, it is the number of flows active within a 30
> second period. On current hardware we probably could not handle more
> than a couple of million concurrent flows (with a 4 million entry hash
> table).
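
The sizing argument in this exchange can be illustrated with a minimal Python sketch. It is not the FreeBSD flowtable code: the function names, the 5-tuple flow key, and the use of Python's built-in `hash` are all assumptions made for illustration. It shows how the average number of entries per occupied bucket grows once the number of active flows exceeds the bucket count, which is the condition a flood of packets with randomized addresses and ports would try to provoke.

```python
import random
from collections import Counter


def load_factor(num_flows, num_buckets):
    """Average number of flows per bucket (occupied or not)."""
    return num_flows / num_buckets


def mean_chain_length(num_flows, num_buckets, seed=0):
    """Hash num_flows random flow keys into num_buckets buckets and
    return the mean number of entries per occupied bucket."""
    rng = random.Random(seed)
    buckets = Counter()
    for _ in range(num_flows):
        # Hypothetical flow key: (src addr, dst addr, src port, dst port, proto).
        key = (rng.getrandbits(32), rng.getrandbits(32),
               rng.getrandbits(16), rng.getrandbits(16), 6)
        buckets[hash(key) % num_buckets] += 1
    if not buckets:
        return 0.0
    return num_flows / len(buckets)


if __name__ == "__main__":
    # Figures from the thread: a couple of million flows, 4 million entries.
    print("load factor:", load_factor(2_000_000, 4_000_000))

    # Scaled-down table: light load versus a flood of random flows.
    print("light load :", mean_chain_length(256, 1024))
    print("flooded    :", mean_chain_length(16 * 1024, 1024))
```

Under light load most occupied buckets hold a single entry, so a lookup is close to one comparison. In the flooded case every lookup has to walk a long chain, and the cache costs more than it saves.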