From owner-freebsd-smp  Thu Dec  5 19:03:42 1996
Return-Path: 
Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id TAA21925 for smp-outgoing; Thu, 5 Dec 1996 19:03:42 -0800 (PST)
Received: from spinner.DIALix.COM (root@spinner.DIALix.COM [192.203.228.67]) by freefall.freebsd.org (8.8.4/8.8.4) with ESMTP id TAA21860 for ; Thu, 5 Dec 1996 19:03:31 -0800 (PST)
Received: from spinner.DIALix.COM (peter@localhost.DIALix.oz.au [127.0.0.1]) by spinner.DIALix.COM (8.8.4/8.8.4) with ESMTP id LAA12276; Fri, 6 Dec 1996 11:02:55 +0800 (WST)
Message-Id: <199612060302.LAA12276@spinner.DIALix.COM>
To: Chris Csanady 
cc: freebsd-smp@freebsd.org
Subject: Re: make locking more generic? 
In-reply-to: Your message of "Thu, 05 Dec 1996 20:30:20 CST." <199612060230.UAA00386@friley216.res.iastate.edu> 
Date: Fri, 06 Dec 1996 11:02:54 +0800
From: Peter Wemm 
Sender: owner-smp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Chris Csanady wrote:
> 
> >Kevin Van Maren wrote:
> >> Yes, the reason you need finer grained locking is because the
> >> interrupts *should* go to the other processor.  If one
> >> processor is handling an interrupt and another int comes
> >> in, the other CPU should be able to handle it.  This
> >> would finally give parallel I/O!  Linux doesn't do this,
> >> and they do very poorly when not every process is CPU bound.
> >>
> >> Kevin
> >>
> >> ps: This will most likely mean fixing device drivers as well.
> >
> >Yes, it will most likely be one of two options for each driver..  We will
> >have to modify it to do fine-grain locking (this is a major problem for
> >the network cards due to the mbuf design), or have some way of running
> 
> s/design/stupidity/
> 
> Chris

Well, there are several things to consider before writing it off as
"stupid"; it's bloody quick compared to something like STREAMS, which is
much more MP-safe.

With the present "design", an mbuf is allocated in the network soft
interrupt layer, and that mbuf is passed all the way up to the socket
buffers, right through the ip and tcp layers, without being queued
anywhere.  The netisr handler literally "runs" the upward-direction
tcp/ip "engine".  It's pretty much the same in the other direction,
though from memory one direction has a single queueing stage and the
other doesn't.

STREAMS, on the other hand, has "modules", each with an incoming and an
outgoing queue.  The network driver allocates an mblk/dblk and queues it
in its upstream neighbor's queue.  The upstream module's service routine
is run, which dequeues it, processes it, and enqueues it on its own
upstream side... and so on, right up to the stream head.  On the plus
side, there's a lot of potential for putting in hooks for fine-grain
locking on each queue and data block, but the overheads are incredible.

I'm not familiar with Linux's design; I think it's a simplified (read:
the crap has been stripped out and optimised) version of the mbuf
cluster model, where large buffers are allocated and passed around.  I
do not know whether they queue on entry to each logical "component" of
the protocol stack, but I suspect not, since they are after speed.
Calling it a simplified version of mbuf clusters is probably not going
to be popular, but that's what a casual glance suggested to me.

We have two major types of mbufs: "standard" small ones, 128 bytes long
with about 106 bytes of data space or so, and "clusters", where the mbuf
points to a 2K or 4K page of data.
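For illustration, a rough sketch of the shape of those two flavours.
This is NOT the real <sys/mbuf.h>; the field names and the exact header
overhead here are made up, only the 128-byte/2K sizes come from the
description above.

/*
 * Sketch only: the layout of a "standard" mbuf vs. a "cluster" mbuf.
 * Field names and header overhead are illustrative, not the real ones.
 */
#define MSIZE     128           /* total size of a small mbuf */
#define MCLBYTES  2048          /* size of an external "cluster" page */

struct m_hdr_sketch {
        struct mbuf_sketch *mh_next;    /* next mbuf in the chain */
        char               *mh_data;    /* points at the active data */
        int                 mh_len;     /* amount of data in this mbuf */
        short               mh_type;    /* data, packet header, ... */
        short               mh_flags;   /* e.g. "data lives in a cluster" */
};

struct mbuf_sketch {
        struct m_hdr_sketch m_hdr;
        union {
                /* "standard" mbuf: the ~100-odd bytes of data live inline */
                char  m_dat[MSIZE - sizeof(struct m_hdr_sketch)];
                /* "cluster" mbuf: the data is an external 2K (or 4K) page */
                char *m_ext;
        } m_un;
};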
I think the Linux model is like the latter (the cluster style), where
there is either a separate header and data chunk, or the header is at
the start of the data "page".  If this is the case, their system
probably won't lend itself to multi-threading any better than ours.

I suspect that we're going to be stuck with a giant "networking lock" to
cover everything from the soft network interrupts, through the mbuf
code, through to the protocol engine and the high-level socket buffers.
There may be room for a decoupling layer between the network cards and
the network "protocol engine" as such, and the same at the top end.
This may allow us to get away with running the soft net processing for
the cards in parallel with the network "engine" itself.  It will require
a locked queueing stage to get data from one to the other.

Cheers,
-Peter
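A minimal sketch of what that locked queueing stage might look like,
written in user-space terms: the pthread mutex and every name below are
stand-ins for whatever kernel primitives end up being used, not any
existing FreeBSD API.  The point is only that the hand-off queue is the
single thing both sides lock; the per-card soft interrupts and the
protocol engine never touch each other's data structures directly.

#include <stddef.h>
#include <pthread.h>

struct pkt {
        struct pkt *p_next;
        /* ... mbuf chain, receiving interface, etc. ... */
};

struct handoff_queue {
        struct pkt      *hq_head;
        struct pkt      *hq_tail;
        pthread_mutex_t  hq_lock;       /* stand-in for a kernel lock */
};

/* One queue per hand-off point, e.g. for inbound traffic. */
struct handoff_queue rxq = { NULL, NULL, PTHREAD_MUTEX_INITIALIZER };

/* Producer side: called from the per-card soft interrupt path. */
void
handoff_enqueue(struct handoff_queue *q, struct pkt *p)
{
        p->p_next = NULL;
        pthread_mutex_lock(&q->hq_lock);
        if (q->hq_tail != NULL)
                q->hq_tail->p_next = p;
        else
                q->hq_head = p;
        q->hq_tail = p;
        pthread_mutex_unlock(&q->hq_lock);
}

/* Consumer side: called from the protocol engine; returns NULL if empty. */
struct pkt *
handoff_dequeue(struct handoff_queue *q)
{
        struct pkt *p;

        pthread_mutex_lock(&q->hq_lock);
        p = q->hq_head;
        if (p != NULL) {
                q->hq_head = p->p_next;
                if (q->hq_head == NULL)
                        q->hq_tail = NULL;
        }
        pthread_mutex_unlock(&q->hq_lock);
        return (p);
}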