From owner-freebsd-hackers  Wed Jan 22 06:32:57 1997
Return-Path: <owner-hackers>
Received: (from root@localhost)
          by freefall.freebsd.org (8.8.5/8.8.5) id GAA06843
          for hackers-outgoing; Wed, 22 Jan 1997 06:32:57 -0800 (PST)
Received: from pdx1.world.net (pdx1.world.net [192.243.32.18])
          by freefall.freebsd.org (8.8.5/8.8.5) with ESMTP id GAA06837
          for <hackers@freebsd.org>; Wed, 22 Jan 1997 06:32:55 -0800 (PST)
From: proff@suburbia.net
Received: from suburbia.net (suburbia.net [203.4.184.1]) by pdx1.world.net (8.7.5/8.7.3) with SMTP id GAA04095 for <hackers@freebsd.org>; Wed, 22 Jan 1997 06:34:03 -0800 (PST)
Received: (qmail 16799 invoked by uid 110); 22 Jan 1997 14:32:31 -0000
Message-ID: <19970122143231.16798.qmail@suburbia.net>
Subject: Re: socket lookups: a proposal (fwd)
To: hackers@freebsd.org
Date: Thu, 23 Jan 1997 01:32:31 +1100 (EST)
X-Mailer: ELM [version 2.4ME+ PL28 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

>From owner-netdev@roxanne.nuclecu.unam.mx Wed Jan 22 10:23:26 1997
Return-Path: <owner-netdev@roxanne.nuclecu.unam.mx>
Delivered-To: proff@suburbia.net
Received: (qmail 10048 invoked from network); 22 Jan 1997 10:23:13 -0000
Received: from roxanne.nuclecu.unam.mx (132.248.29.2)
  by suburbia.net with SMTP; 22 Jan 1997 10:23:13 -0000
Received: (from root@localhost) by roxanne.nuclecu.unam.mx (8.6.12/8.6.11) id DAA19506 for netdev-outgoing; Wed, 22 Jan 1997 03:55:53 -0600
Received: from zwei.siemens.at (zwei.siemens.at [193.81.246.12]) by roxanne.nuclecu.unam.mx (8.6.12/8.6.11) with ESMTP id DAA19501 for <netdev@roxanne.nuclecu.unam.mx>; Wed, 22 Jan 1997 03:55:45 -0600
Received: from pc5829.hil.siemens.at (root@[10.1.143.100]) by zwei.siemens.at (8.7.5/8.7.3) with ESMTP id KAA01806; Wed, 22 Jan 1997 10:52:46 +0100 (MET)
Received: from localhost (mingo@localhost) by pc5829.hil.siemens.at (8.6.12/8.6.9) with SMTP id KAA11881; Wed, 22 Jan 1997 10:52:31 +0100
Date: Wed, 22 Jan 1997 10:52:31 +0100 (MET)
From: Ingo Molnar <mingo@pc5829.hil.siemens.at>
To: netdev@roxanne.nuclecu.unam.mx, Eric Schenk <erics@dna.lth.se>
cc: Pedro Roque <roque@di.fc.ul.pt>,
        "David S. Miller" <davem@caip.rutgers.edu>
Subject: Re: socket lookups: a proposal 
In-Reply-To: <199701212336.AAA23162@regin>
Message-ID: <Pine.LNX.3.95.970122103441.11631B-100000@pc5829.hil.siemens.at>
Sender: owner-netdev@roxanne.nuclecu.unam.mx
Precedence: bulk
Reply-To: netdev@roxanne.nuclecu.unam.mx,
        Ingo Molnar <mingo@pc5829.hil.siemens.at>


On Wed, 22 Jan 1997, Eric Schenk wrote:

> >I don't really know a thing on how a processor cache operates (URLs are
> >welcomed) but common sense would say that you are right on that...
> >Unless you use a slab allocator to take your memory from a contiguos
> >memory pool...
> 
> Hmm. I'm not up on how strong the guarantees provided by the slab
> allocator can be. David, you want to comment on this?

my problem with the SLAB allocator is this radix tree case is that it
doesnt 'compress' cache elements. ie. we will have a typical distance
between nodes elements (even though we have fix slab pools of tightly
packed node structures), due to the statistic pressure of creation and
deletition of sockets.

say, any workload that has a 'changing number of sockets', where we can
have any number of sockets between say .. N and 2*N [this is arbitary,
just for an example], and say N is ... 10000.

Now, in the above workload. If we have a temporary peak of 20000 sockets,
the SLAB cache is tightly filled. Now when we drop back to 10000 sockets,
we still have almost the same number of slabs allocated. [because the
deallocation happens random, and a slab can only be deallocated when we
have the slab >completely< free. slabs are static in the sense that you
cannot move cache elements]

Not only the total memory usage, but the 'cache footprint' in any workload
is dominated too by the maximum peak socket usage.

[ in the above example, N sockets will most probably use 90% cache lines
  of the 2*N case, because elements do not fill a whole cache line and
  elements were deallocated randomly ]


Thus to overcome this problem i propose to always choose the order of the
radix tree in such a way, that a whole cache line is filled by exactly one
node. Say:

  struct radix {
     struct radix * leaf[N];
     struct socket * sock_addr;
  }

N choosen in such a way that:

   sizeof(sock_addr) + sizeof(struct radix *) * N = cache_line_size

This leads to a bit more memory usage in the 'many leaf' case, but IMHO
utilizes caches much better.

Another thing ... i do not like O(long N) for EVERY packet lookup, when we
can have O(1) :). There aint as much workloads in the world, lets identify
and use them ... maybe someone finds a cool hash function.

-- mingo