From owner-freebsd-arch@FreeBSD.ORG Mon May 24 20:39:40 2004
Date: Mon, 24 May 2004 20:39:21 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200405250339.i4P3dLBX090505@apollo.backplane.com>
To: Robert Watson
cc: arch@FreeBSD.org
cc: Eivind Eklund
Subject: Re: Network Stack Locking

:On Mon, 24 May 2004, Eivind Eklund wrote:
:
:> On Fri, May 21, 2004 at 01:23:51PM -0400, Robert Watson wrote:
:> > The other concern I have is whether the message queues get deep or
:> > not: many of the benefits of message queues come when the queues
:> > allow coalescing of context switches to process multiple packets.
:> > If you're paying a context switch per packet passing through the
:> > stack each time you cross a boundary, there's a non-trivial
:> > operational cost to that.
:>
:> I don't know what Matt has done here, but at least with the design we
:> used for G2 (a private DFly-like project that John Dyson, I, and a few
:> other people who may or may not want to be named ran), this should not
:> be an issue.  We used thread context passing with an API that
:> contained putmsg_and_terminate() and message ports that could
:> automatically spawn new handler threads.  Effectively, a
:> message-related context switch turned into "assemble everything I care
:> about in a small package, reset the stack pointer, and go".  The
:> expectation was that this should end up with less overhead than
:> function calls, as we could drop the call frames for "higher levels in
:> the chain".  We never got to the point where we could measure whether
:> it worked out that way in practice, though.
:
:Sounds a lot like a lot of the Mach IPC optimizations, including their
:use of continuations during IPC to avoid a full context switch.
:
:Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
:robert@fledge.watson.org      Senior Research Scientist, McAfee Research

    Well, I like the performance aspects of a continuation mechanism, but
    I really dislike the memory overhead.  Even a minimal stack is
    expensive when you multiply it by potentially hundreds of thousands of
    'blocking' entities such as PCBs (say, a TCP output stream).  Because
    of this, the overhead and cache pollution generated by the
    continuation mechanism increases as system load increases rather than
    decreases.

    Deep message queues aren't necessarily a problem and, in fact, having
    one or two dozen messages backed up in a protocol thread's message
    port is actually good, because the thread can then process all the
    messages in a tight loop (cpu and cache locality of reference).
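    [A minimal illustrative sketch of that kind of drain loop, written
    against plain pthreads; this is not DragonFly's actual lwkt msgport
    API, and the struct and function names here are made up for the
    example.  The point is that the blocking primitive, and thus the
    context switch, is paid once per wakeup and then amortized across
    every message that accumulated while the thread slept.]

    /*
     * Illustrative sketch only; not the DragonFly LWKT msgport API.
     * A protocol thread blocks until its port is non-empty, then drains
     * the whole queue in a tight loop so one wakeup (one context switch)
     * is amortized across every queued message.
     */
    #include <pthread.h>
    #include <stddef.h>

    struct msg {
        struct msg *next;
        void      (*handler)(struct msg *);   /* the unit of work */
    };

    struct msgport {
        pthread_mutex_t lock;
        pthread_cond_t  nonempty;
        struct msg     *head, *tail;
    };

    static void
    msgport_put(struct msgport *port, struct msg *m)
    {
        pthread_mutex_lock(&port->lock);
        m->next = NULL;
        if (port->tail != NULL)
            port->tail->next = m;
        else
            port->head = m;
        port->tail = m;
        pthread_cond_signal(&port->nonempty);
        pthread_mutex_unlock(&port->lock);
    }

    static void *
    protocol_thread(void *arg)
    {
        struct msgport *port = arg;

        for (;;) {
            struct msg *batch;

            /* Sleep (context switch) only when the queue is empty. */
            pthread_mutex_lock(&port->lock);
            while (port->head == NULL)
                pthread_cond_wait(&port->nonempty, &port->lock);

            /* Detach the entire queue in one shot... */
            batch = port->head;
            port->head = port->tail = NULL;
            pthread_mutex_unlock(&port->lock);

            /* ...and process it in a tight, cache-friendly loop. */
            while (batch != NULL) {
                struct msg *m = batch;

                batch = m->next;
                m->handler(m);
            }
        }
        return (NULL);
    }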
    If designed properly, this directly mitigates the cost of a thread
    switch as system load increases, so message queueing has the opposite
    effect... per-unit handling overhead *decreases* as system load
    increases.  (Also, DragonFly's thread scheduler is a much
    lighter-weight mechanism than what you have in FBsd-4 or FBsd-5.)

    e.g. let's say you have a context switch overhead of 1uS and a
    message processing overhead of 100ns:

	light load:	100 messages/sec, 1 message in queue at context
			switch:
			1*100ns + 1uS   = 1.1uS / 1   = 1.1uS/msg

	medium load:	1000 messages/sec, average 10 messages in queue
			at context switch:
			10*100ns + 1uS  = 2uS / 10    = 200ns/msg

	heavy load:	10000 msgs/sec, average 100 msgs in queue at
			context switch:
			100*100ns + 1uS = 11uS / 100  = 110ns/msg

    The reason a deep message queue is not a problem versus other
    mechanisms is simple... a message represents a unit of work.  The
    work must be done regardless, and on the cpu it was told to be done
    on, no matter whether you use a message or a continuation or some
    other mechanism.  In other words, a deep message queue is an effect
    of the problem, not a cause of it.  Solving the problem (if it
    actually is a problem) does not involve dealing with the deep message
    queue; it involves dealing with the set of circumstances that are
    causing that deep message queue to occur.

    Now, certainly end-to-end latency is an issue.  But when one is
    talking about context switching one is talking about nanoseconds and
    microseconds.  Turn-around latency just isn't an issue most of the
    time, and in those extremely rare cases where it might be, one does
    the turn-around in the driver interrupt anyway.

						-Matt
						Matthew Dillon