Date:      Sun, 19 Apr 2009 10:13:37 -0700
From:      Kip Macy <kmacy@freebsd.org>
To:        Andre Oppermann <andre@freebsd.org>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Robert Watson <rwatson@freebsd.org>
Subject:   Re: svn commit: r191259 - head/sys/netinet
Message-ID:  <3c1674c90904191013h119d040u1c59772a94dad2f1@mail.gmail.com>
In-Reply-To: <49EAFA62.3010000@freebsd.org>
References:  <200904190444.n3J4i5wF098362@svn.freebsd.org> <alpine.BSF.2.00.0904191017350.21859@fledge.watson.org> <49EAFA62.3010000@freebsd.org>

On Sun, Apr 19, 2009 at 3:18 AM, Andre Oppermann <andre@freebsd.org> wrote:
> Robert Watson wrote:
>>
>> On Sun, 19 Apr 2009, Kip Macy wrote:
>>
>>> Author: kmacy
>>> Date: Sun Apr 19 04:44:05 2009
>>> New Revision: 191259
>>> URL: http://svn.freebsd.org/changeset/base/191259
> I have another question on the flowtable: What is the purpose of it?
> All router vendors have learned a long time ago that route caching
> (aka flow caching) doesn't work out on a router that carries the DFZ
> (default free zone, currently ~280k prefixes).  The overhead of managing
> the flow table and the high churn rate make it much more expensive than
> a direct and already very efficient radix trie lookup.  Additionally a
> well connected DFZ router has some 1k prefix updates per second.  More
> information can be found for example at Cisco here:
>  http://www.cisco.com/en/US/tech/tk827/tk831/technologies_white_paper09186a00800a62d9.shtml
> The same findings are also available from all other major router vendors
> like Juniper, Foundry, etc.
>
> Let's examine the situations:
>  a) internal router with only a few routes: the routing and ARP tables
>     are small, lookups are very fast, and everything is hot in the CPU
>     caches anyway.
>  b) DFZ router with 280k routes: a small flow table has constant
>     thrashing, becoming negative overhead only.  A large flow table has
>     high maintenance overhead, higher lookup times, and still a
>     significant amount of thrashing.  The overhead of the flow table is
>     equal to or higher than a direct routing table lookup.
> Concluding that a flow table is never a win but a liability in any
> realistic setting.

You're assuming that a Cisco-/Juniper-class workload is
representative of where FreeBSD is deployed. I agree that FreeBSD is
sub-optimal for large routing environments, for a whole host of other
reasons. A better question is what "typical" FreeBSD deployments look
like, and how well the flowtable would work there. The flowtable needs
to be sized to correspond to the number of flows; its utility rapidly
diminishes as the number of collisions per bucket increases. The
number of routes isn't the key metric; it is the number of flows
active within a 30-second period. On current hardware we probably
could not handle more than a couple of million concurrent flows (with
a 4-million-entry hash table).


> Now I don't have benchmark numbers to back up the theory I put forth
> here.  However I want to bring up the rationale for why nobody else is
> doing it.  A statistical analysis easily shows that flow caching has
> only a few small spots where it may offer some advantage over direct
> routing table lookups; none of them are where it matters in real-world
> situations.

I can't argue with you, because you have not adequately characterized
"real" work situations. I know that it is useful for the commercial
environments with which I am familiar.


> As our kernel currently stands, an advantage of the flow table can
> certainly be demonstrated for a small routing table and a small number
> of flows.  This is due to the very sub-optimal routing table
> implementation we have.  The flow table approach short-cuts a
> significant number of locking operations (routing table, routing
> entries, ARP table, and possibly some more).  On the other hand, this
> caching of flows and pointers to routing entries and ARP entries
> complicates updates to those tables and potentially makes them very
> expensive.

Incorrect. The implementations of the routing and ARP tables are
unchanged with the inclusion of the flowtable. Any complexity in their
implementations is completely decoupled from the flowtable. If their
implementations change, the flowtable will follow suit.
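The decoupling being argued here can be sketched as a cache layered in
front of an unchanged lookup path. All names below (`slow_path_lookup`,
`flow_lookup`, the direct-mapped table) are invented stand-ins, not the
real kernel API; the point is only that the cache consults the slow path
on a miss, so the slow path's implementation can change freely.

```c
#include <stdint.h>
#include <string.h>

struct route_result {
	uint32_t next_hop;
};

/* Stand-in for the full radix-trie + ARP lookup path; here it just
 * derives a /24 gateway from the destination for demonstration. */
static struct route_result
slow_path_lookup(uint32_t dst_ip)
{
	return ((struct route_result){ .next_hop = dst_ip & 0xffffff00u });
}

struct flow_entry {
	uint32_t		dst_ip;		/* key */
	struct route_result	cached;		/* value */
	int			valid;
};

#define	FT_SIZE	256			/* tiny direct-mapped cache */
static struct flow_entry ftable[FT_SIZE];

/* Fast path: return the cached result, filling from the slow path on
 * a miss.  The cache knows nothing about how slow_path_lookup works. */
struct route_result
flow_lookup(uint32_t dst_ip)
{
	struct flow_entry *fe = &ftable[dst_ip % FT_SIZE];

	if (!fe->valid || fe->dst_ip != dst_ip) {
		fe->dst_ip = dst_ip;
		fe->cached = slow_path_lookup(dst_ip);
		fe->valid = 1;
	}
	return (fe->cached);
}
```

A direct-mapped table is the simplest possible eviction policy (a new
flow silently displaces a colliding one); the trade-off is exactly the
collision sensitivity discussed earlier in the thread.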

>  Additionally it creates a "tangled mess", again complicating future
> changes and advances in those areas (unless the flow table were simply
> removed again at that point).

The two will remain separate, please do not confuse matters.


> I argue that instead of kludging around (with the flow table) a
> sub-optimal part of the network stack (the current incarnation of the
> routing table), time could equally be spent more wisely on fixing the
> problems in the first place.  I've outlined a few approaches a couple
> of times before on the mailing lists.  If the routing table would no
> longer support direct pointers to entries, the locking could be
> significantly simplified, and the ARP table could use rmlocks
> (read-mostly locks) as it is changed only very infrequently.  It's all
> about the number of locks that have to be acquired per packet/lookup.
> It also has the benefit of an order of magnitude less complexity (and
> hard-to-debug edge cases, which cannot be underestimated).

In principle the ARP table could use rmlocks now. For the routing
table, you think we should copy the rtentry out? I agree that the
locked refcounting of rtentries has ridiculously high overhead and
would love to see that go away. The one major concern I had when
looking at doing that was the need to ensure continued liveness of the
structures pointed to by the rtentry.

-Kip


