From owner-freebsd-hackers Wed Mar 12 20:50:52 1997
Return-Path:
Received: (from root@localhost)
	by freefall.freebsd.org (8.8.5/8.8.5) id UAA04103
	for hackers-outgoing; Wed, 12 Mar 1997 20:50:52 -0800 (PST)
Received: from caipfs.rutgers.edu (root@caipfs.rutgers.edu [128.6.37.100])
	by freefall.freebsd.org (8.8.5/8.8.5) with ESMTP id UAA04076
	for ; Wed, 12 Mar 1997 20:50:22 -0800 (PST)
Received: from jenolan.caipgeneral (jenolan.rutgers.edu [128.6.111.5])
	by caipfs.rutgers.edu (8.8.5/8.8.5) with SMTP id XAA08031;
	Wed, 12 Mar 1997 23:50:11 -0500 (EST)
Received: by jenolan.caipgeneral (SMI-8.6/SMI-SVR4) id XAA21796;
	Wed, 12 Mar 1997 23:49:59 -0500
Date: Wed, 12 Mar 1997 23:49:59 -0500
Message-Id: <199703130449.XAA21796@jenolan.caipgeneral>
From: "David S. Miller" <davem@caip.rutgers.edu>
To: ccsanady@nyx.pr.mcs.net
CC: hackers@FreeBSD.ORG
In-reply-to: <199703130435.WAA11627@nyx.pr.mcs.net>
	(message from Chris Csanady on Wed, 12 Mar 1997 22:35:45 -0600)
Subject: Re: Solaris TPC-C benchmarks (with Oracle)
Sender: owner-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

   Date: Wed, 12 Mar 1997 22:35:45 -0600
   From: Chris Csanady

   Ok, this is pretty much as I thought.  But is it worth it to do more
   complicated memory management, or just eat the wasted space of fixed
   size buffers?  I mean, it won't waste any more space than mbuf
   clusters do for ethernet.  If you're using ATM or HIPPI, you can
   afford the extra memory. :)

Yes, it does pay off: this allows the driver (the one who "knows" best
how to manage the packets it services) to perform the management in the
most efficient way possible.  Nicely, if you notice (which you
undoubtedly will) that many drivers manage things in an extremely
similar fashion, you can write "pbuf generic" methods which the drivers
share when they don't do anything overly special with buffer management.

   I was curious about this--in his slides, he mentions that sockbufs
   go away. :\  Can you elaborate more on what's going on?

Watch the output control path: it just jams the data out to the driver.
The code in his slides doesn't check for things like congestion
avoidance (checking cong_window/ssthresh).  You get the idea.  It's not
this huge problem, you just have to remember to keep the normal TCP
outgoing flow control code in the path and not eliminate it entirely as
Jacobson seems to have done.
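Roughly, the check that has to stay looks like the sketch below.  This
is only an illustration of the standard send-window logic, not
Jacobson's code and not any particular stack's; the struct, field, and
function names (tcp_state, snd_una, tcp_sendable, and so on) are
invented for the example.  You may put on the wire at most the minimum
of the congestion window and the peer's advertised window, minus what
is already in flight, and cwnd itself opens differently depending on
whether you are below or above ssthresh.

/*
 * Sketch only: the usual TCP send-window / congestion-window checks
 * the output path must keep doing before handing data to the driver.
 * All names here are invented for illustration.
 */
struct tcp_state {
	unsigned long snd_una;      /* oldest unacknowledged byte       */
	unsigned long snd_nxt;      /* next byte we would transmit      */
	unsigned long snd_wnd;      /* window advertised by the peer    */
	unsigned long snd_cwnd;     /* congestion window                */
	unsigned long snd_ssthresh; /* slow start threshold             */
	unsigned long mss;          /* maximum segment size             */
};

/* How many of the 'queued' bytes may we transmit right now? */
static unsigned long
tcp_sendable(struct tcp_state *tp, unsigned long queued)
{
	unsigned long win = tp->snd_wnd < tp->snd_cwnd ?
			    tp->snd_wnd : tp->snd_cwnd;
	unsigned long in_flight = tp->snd_nxt - tp->snd_una;

	if (win <= in_flight)
		return 0;	/* window full: do not just jam it out */
	win -= in_flight;
	return queued < win ? queued : win;
}

/* On each ACK: open cwnd exponentially while below ssthresh (slow
 * start), linearly once above it (congestion avoidance). */
static void
tcp_ack_opens_cwnd(struct tcp_state *tp)
{
	if (tp->snd_cwnd < tp->snd_ssthresh)
		tp->snd_cwnd += tp->mss;
	else
		tp->snd_cwnd += (tp->mss * tp->mss) / tp->snd_cwnd;
}

None of this is expensive (a couple of compares and an add per
segment), which is the point: keeping the normal flow control in a
"fast" output path costs almost nothing.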
   >Secondly, his fast paths for input bank on the fact that you can get
   >right into user context when you detect a header prediction hit.  The
   >only way to do this effectively on a system you'd ever want anyone to
   >actually run is the following:

   I think that the header prediction code is called from a user
   context, so you would already be there.

Ok.  You still have the issue of pbuf loaning: you have to get to user
context somehow, and that can take time.  The real-time priority trick
is necessary to make that delay as small as humanly possible,
decreasing the chance of the driver running out of receive buffers and
dropping packets under high (or normal) load.

   pbufs would essentially be the same as mbufs to the drivers, I would
   think--except less complicated.  Right now, I think that the drivers
   just dma into an mbuf cluster.  I don't see why it can't loan them
   out for a while.

This would be a really nice scenario, and I once thought the world was
as simple as this.  Many people set me straight ;-)  You still have to
allow all the stupid devices to work; for example, I know many ethernet
controllers supported by both Linux and FreeBSD have one-packet fifos
and other strange things.  You have to provide a way to tell the
incoming packet code "this is a stupid device, it has a small amount of
buffering if any, so copy it now and do not optimize".

   >This is all nontrivial to pull off.  One nice effect is that you
   >actually then have a chance of doing real networking page flipping
   >with the device buffer method scheme.

   Does Van Jacobson's kernel do page flipping?

No, but my comments were there to point out that page flipping is
easier to construct into your grand scheme of things with the device
methods there.  It is simply more flexible.  (hint hint: think about
fast routing a la Cisco's with shared memory, direct dma from card to
card to forward the packet, and other tricks; pbufs can allow you to do
it)

   I thought he just did a checksum and copy to a user buffer.  I
   remember John saying something about it being more expensive to do
   this than a copy, although it was in a different context.  (with
   regard to the pipe code I think)..  I don't know.  If it would work,
   it would sure be nice, but my simple pbuf allocator would definitely
   not work..

Every example I have ever measured, either myself or done by someone
else, indicates that on just about any processor you get the checksum
for free when copy/checksum is done as a combined operation.  Now I
would not be surprised if the overhead of walking mbuf chains would
adversely affect the efficiency of this technique, but pbufs would make
that issue disappear. ;-)

   I should have said viability rather than scalability. :)

Yes, that does make sense then ;-)

   I'd like to finish volume 2 before I even think about the timers or
   such.. ;-)

----------------------------------------------////
Yow! 11.26 MB/s remote host TCP bandwidth &  ////
199 usec remote TCP latency over 100Mb/s    ////
ethernet.  Beat that!                      ////
-----------------------------------------////__________  o
David S. Miller, davem@caip.rutgers.edu /_____________/ / // /_/ ><