Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 11 Apr 1997 04:45:46 -0400
From:      "David S. Miller" <davem@jenolan.rutgers.edu>
To:        netdev@roxanne.nuclecu.unam.mx
Subject:   while I'm nursing a kernel compile or two...
Message-ID:  <199704110845.EAA00936@jenolan.caipgeneral>

next in thread | raw e-mail | index | archive | help

I still think socket demultiplexing can be done much better,
here are some things in my head:

1) Dynamic hash table growth at least for TCP.

   This is an old trick, would require a tcp_htbl_size to keep
   track of how big we are, and a function to rehash into the
   new table, it only grows and never shrinks.  It only grows by
   powers of two as well, the limitation on it's max size is based
   upon the amount of ram in the machine.  All straight forward
   stuff.

2) Slightly more intricate.

   Main bound hash is two level, second level hashes are only
   allocated and hooked into the top level when they are first
   needed but are never destroyed.  Destruction would require
   timers and a garbage collector, does not seem to be worth it.

3) A bit more hairy.

   View the Socket identity as a 96 bit key (as it really is)
   Use this and a DP trie and digital searches to look up sockets.
   The main reason this approach turns me on is that DP tries
   are specifically designed for longest matching prefix searches,
   in particular we can arrange the bits in the "big key" to have
   local port and local addr at the front, this makes listening sock
   lookup (which looks in this case like a default route ;-) much
   quicker even with huge numbers of connections.

   This scheme also can be proven to have a given response time with
   a given set of sockets.  DP trie's are specifically designed such
   that the layout of the tree is dependant upon "whats" in the tree
   not "how it got there"  That is, no matter the order of insertions
   and deletions, for that given set of sockets the trie will always
   be the same.

   One lose is that to do a lookup you have to put the socket identity
   elements from the header on the local stack to arrange the bits
   correctly next to each other for the search.  Also, although
   insertion and deletion are done in constant time, I am still not
   certain this is not a "considerable" amount of constant time
   (ie. too slow ;-)

4) Really far out (this is an ANK original ;-)

   Stick small hashes (perhaps 2 level) into the destination cache
   entries.  We need to get at the dst cache entry for all outgoing
   and incoming packets we care about anyways, thus it is very
   cheap to sprinkle the mini hash tables into there and look them
   up this way.

   Of the top of my head there is only one bad case, and unfortunately
   this is the case that the benchmarks tend to test.  A few machines
   (thus a small number of dst cache entries) making thousands upon
   thousands of connections to us.  This is because the mini hashes in
   the dst cache entries would be overloaded entirely.  It might be
   offset entirely if we used a two level scheme in the mini-hashes
   (8 entries top level, perhaps 16 in the second level).

   The only other problems this one might present is that a few extra
   dst cache lookups will occur when we need to do a lookup and we
   have no other reason (currently) to have the dst entry handy
   already.

For 3 and 4 the TCP bound hash would need to remain as I cannot think
of any other way to perform those operations more efficiently.

B tree's looks interesting for this application, but I still have to
study some of the worse/average case analysis for those, the problem
with socket lookups is that you hit the worse case during these high
stress situations.  This is why hashes are so appealing for socket
demultiplexing, "are the chains getting too long?  just make the table
bigger" etc.

Just a brain dump, continue hacking...

---------------------------------------------////
Yow! 11.26 MB/s remote host TCP bandwidth & ////
199 usec remote TCP latency over 100Mb/s   ////
ethernet.  Beat that!                     ////
-----------------------------------------////__________  o
David S. Miller, davem@caip.rutgers.edu /_____________/ / // /_/ ><





Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199704110845.EAA00936>