Date:      Sun, 19 Apr 2009 10:13:37 -0700
From:      Kip Macy <kmacy@freebsd.org>
To:        Andre Oppermann <andre@freebsd.org>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Robert Watson <rwatson@freebsd.org>
Subject:   Re: svn commit: r191259 - head/sys/netinet
Message-ID:  <3c1674c90904191013h119d040u1c59772a94dad2f1@mail.gmail.com>
In-Reply-To: <49EAFA62.3010000@freebsd.org>
References:  <200904190444.n3J4i5wF098362@svn.freebsd.org> <alpine.BSF.2.00.0904191017350.21859@fledge.watson.org> <49EAFA62.3010000@freebsd.org>

On Sun, Apr 19, 2009 at 3:18 AM, Andre Oppermann <andre@freebsd.org> wrote:
> Robert Watson wrote:
>>
>> On Sun, 19 Apr 2009, Kip Macy wrote:
>>
>>> Author: kmacy
>>> Date: Sun Apr 19 04:44:05 2009
>>> New Revision: 191259
>>> URL: http://svn.freebsd.org/changeset/base/191259
> I have another question on the flowtable: What is the purpose of it?
> All router vendors have learned a long time ago that route caching
> (aka flow caching) doesn't work out on a router that carries the DFZ
> (default free zone, currently ~280k prefixes).  The overhead of managing
> the flow table and the high churn rate make it much more expensive than
> a direct and already very efficient radix trie lookup.  Additionally a
> well connected DFZ router has some 1k prefix updates per second.  More
> information can be found for example at Cisco here:
>  http://www.cisco.com/en/US/tech/tk827/tk831/technologies_white_paper09186a00800a62d9.shtml
> The same findings are also available from all other major router vendors
> like Juniper, Foundry, etc.
>
> Let's examine the situations:
>  a) internal router with only a few routes: the routing and ARP tables
>     are small, lookups are very fast, and everything is hot in the CPU
>     caches anyway.
>  b) DFZ router with 280k routes: a small flow table has constant
>     thrashing, becoming negative overhead only.  A large flow table has
>     high maintenance overhead, higher lookup times, and still a
>     significant amount of thrashing.  The overhead of the flow table is
>     equal to or higher than a direct routing table lookup.
> Concluding that a flow table is never a win but a liability in any
> realistic setting.

You're assuming that a Cisco-/Juniper-class workload is
representative of where FreeBSD is deployed. I agree that FreeBSD is
sub-optimal for large routing environments, for a whole host of other
reasons. A better question is what "typical" FreeBSD deployments look
like, and how well the flowtable would work there. The flowtable needs
to be sized to correspond to the number of flows; its utility rapidly
diminishes as the number of collisions per bucket increases. The
number of routes isn't the key metric; it is the number of flows
active within a 30-second period. On current hardware we probably
could not handle more than a couple of million concurrent flows (with
a 4-million-entry hash table).


> Now I don't have benchmark numbers to back up the theory I put forth
> here.  However I want to bring up the rationale for why nobody else is
> doing it.  A statistical analysis easily shows that flow caching has
> only a few small spots where it may offer some advantage over direct
> routing table lookups; none of them are where it matters in real-world
> situations.

I can't argue with you, because you have not adequately characterized
"real" work situations. I know that it is useful for the commercial
environments with which I am familiar.


> As our kernel currently stands, an advantage of the flow table can
> certainly be demonstrated for a small routing table and a small number
> of flows.  This is due to the very sub-optimal routing table
> implementation we have.  The flow table approach short-cuts a
> significant number of locking operations (routing table, routing
> entries, ARP table, and possibly some more).  On the other hand, this
> caching of flows and pointers to routing entries and ARP entries
> complicates updates to those tables and potentially makes them very
> expensive.

Incorrect. The implementations of the routing and ARP tables are
unchanged with the inclusion of the flowtable. Any complexity in their
implementations is completely decoupled from the flowtable. If their
implementations change, the flowtable will follow suit.
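The decoupling being argued here can be sketched as a cache layered in
front of an unchanged lookup path. All names below (`slow_path_lookup`,
`flow_lookup`, the direct-mapped table) are invented stand-ins, not the
real kernel API; the point is only that the cache consults the slow path
on a miss, so the slow path's implementation can change freely.

```c
#include <stdint.h>
#include <string.h>

struct route_result {
	uint32_t next_hop;
};

/* Stand-in for the full radix-trie + ARP lookup path; here it just
 * derives a /24 gateway from the destination for demonstration. */
static struct route_result
slow_path_lookup(uint32_t dst_ip)
{
	return ((struct route_result){ .next_hop = dst_ip & 0xffffff00u });
}

struct flow_entry {
	uint32_t		dst_ip;		/* key */
	struct route_result	cached;		/* value */
	int			valid;
};

#define	FT_SIZE	256			/* tiny direct-mapped cache */
static struct flow_entry ftable[FT_SIZE];

/* Fast path: return the cached result, filling from the slow path on
 * a miss.  The cache knows nothing about how slow_path_lookup works. */
struct route_result
flow_lookup(uint32_t dst_ip)
{
	struct flow_entry *fe = &ftable[dst_ip % FT_SIZE];

	if (!fe->valid || fe->dst_ip != dst_ip) {
		fe->dst_ip = dst_ip;
		fe->cached = slow_path_lookup(dst_ip);
		fe->valid = 1;
	}
	return (fe->cached);
}
```

A direct-mapped table is the simplest possible eviction policy (a new
flow silently displaces a colliding one); the trade-off is exactly the
collision sensitivity discussed earlier in the thread.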

>  Additionally it creates a "tangled mess", again complicating future
> changes and advances in those areas (unless the flow table were simply
> removed again at that point).

The two will remain separate, please do not confuse matters.


> I argue that instead of kludging around (with the flow table) a
> sub-optimal part of the network stack (the current incarnation of the
> routing table), time could equally be spent more wisely on fixing the
> problems in the first place.  I've outlined a few approaches a couple
> of times before on the mailing lists.  If the routing table would no
> longer support direct pointers to entries, the locking could be
> significantly simplified, and the ARP table could use rmlocks
> (read-mostly locks) as it is changed only very infrequently.  It's all
> about the number of locks that have to be acquired per packet/lookup.
> It also has the benefit of an order of magnitude less complexity (and
> hard-to-debug edge cases, which cannot be underestimated).

In principle the ARP table could use rmlocks now. For the routing
table, you think we should copy the rtentry out? I agree that the
locked refcounting of rtentries has ridiculously high overhead and
would love to see that go away. The one major concern I had when
looking at doing that was the need to ensure continued liveness of the
structures pointed to by the rtentry.

-Kip


