From owner-freebsd-arch@FreeBSD.ORG Mon May 24 20:39:40 2004
Date: Mon, 24 May 2004 20:39:21 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200405250339.i4P3dLBX090505@apollo.backplane.com>
To: Robert Watson
cc: arch@FreeBSD.org
cc: Eivind Eklund
Subject: Re: Network Stack Locking

:On Mon, 24 May 2004, Eivind Eklund wrote:
:
:> On Fri, May 21, 2004 at 01:23:51PM -0400, Robert Watson wrote:
:> > The other concern I have is whether the message queues get deep or
:> > not: many of the benefits of message queues come when the queues
:> > allow coalescing of context switches to process multiple packets.
:> > If you're paying a context switch per packet passing through the
:> > stack each time you cross a boundary, there's a non-trivial
:> > operational cost to that.
:>
:> I don't know what Matt has done here, but at least with the design we
:> used for G2 (a private DFly-like project that John Dyson, I, and a few
:> other people who may or may not want to be named ran), this should not
:> be an issue.  We used thread context passing with an API that
:> contained putmsg_and_terminate() and message ports that could
:> automatically spawn new handler threads.  Effectively, a
:> message-related context switch turned into "assemble everything I care
:> about in a small package, reset the stack pointer, and go".  The
:> expectation was that this should end up with less overhead than
:> function calls, as we could drop the call frames for "higher levels in
:> the chain".  We never got to the point where we could measure whether
:> it worked out that way in practice, though.
:
:Sounds a lot like a lot of the Mach IPC optimizations, including their
:use of continuations during IPC to avoid a full context switch.
:
:Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
:robert@fledge.watson.org      Senior Research Scientist, McAfee Research

    Well, I like the performance aspects of a continuation mechanism, but
    I really dislike the memory overhead.  Even a minimal stack is
    expensive when you multiply it by potentially hundreds of thousands of
    'blocking' entities such as PCBs (say, a TCP output stream).  Because
    of this, the overhead and cache pollution generated by the
    continuation mechanism increases as system load increases rather than
    decreases.

    Deep message queues aren't necessarily a problem and, in fact, having
    one or two dozen messages backed up in a protocol thread's message
    port is actually good, because the thread can then process all the
    messages in a tight loop (cpu and cache locality of reference).
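    [A minimal illustrative sketch of that kind of drain loop, written
    against plain pthreads; this is not DragonFly's actual lwkt msgport
    API, and the struct and function names here are made up for the
    example.  The point is that the blocking primitive, and thus the
    context switch, is paid once per wakeup and then amortized across
    every message that accumulated while the thread slept.]

    /*
     * Illustrative sketch only; not the DragonFly LWKT msgport API.
     * A protocol thread blocks until its port is non-empty, then drains
     * the whole queue in a tight loop so one wakeup (one context switch)
     * is amortized across every queued message.
     */
    #include <pthread.h>
    #include <stddef.h>

    struct msg {
        struct msg *next;
        void      (*handler)(struct msg *);   /* the unit of work */
    };

    struct msgport {
        pthread_mutex_t lock;
        pthread_cond_t  nonempty;
        struct msg     *head, *tail;
    };

    static void
    msgport_put(struct msgport *port, struct msg *m)
    {
        pthread_mutex_lock(&port->lock);
        m->next = NULL;
        if (port->tail != NULL)
            port->tail->next = m;
        else
            port->head = m;
        port->tail = m;
        pthread_cond_signal(&port->nonempty);
        pthread_mutex_unlock(&port->lock);
    }

    static void *
    protocol_thread(void *arg)
    {
        struct msgport *port = arg;

        for (;;) {
            struct msg *batch;

            /* Sleep (context switch) only when the queue is empty. */
            pthread_mutex_lock(&port->lock);
            while (port->head == NULL)
                pthread_cond_wait(&port->nonempty, &port->lock);

            /* Detach the entire queue in one shot... */
            batch = port->head;
            port->head = port->tail = NULL;
            pthread_mutex_unlock(&port->lock);

            /* ...and process it in a tight, cache-friendly loop. */
            while (batch != NULL) {
                struct msg *m = batch;

                batch = m->next;
                m->handler(m);
            }
        }
        return (NULL);
    }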
    If designed properly, this directly mitigates the cost of a thread
    switch as system load increases, so message queueing has the opposite
    effect... per-unit handling overhead *decreases* as system load
    increases.  (Also, DragonFly's thread scheduler is a much
    lighter-weight mechanism than what you have in FBsd-4 or FBsd-5.)

    e.g. let's say you have a context switch overhead of 1uS and a
    message processing overhead of 100ns:

	light load:	100 messages/sec, 1 message in queue at context
			switch:
			1*100ns + 1uS   = 1.1uS / 1   = 1.1uS/msg

	medium load:	1000 messages/sec, average 10 messages in queue
			at context switch:
			10*100ns + 1uS  = 2uS / 10    = 200ns/msg

	heavy load:	10000 msgs/sec, average 100 msgs in queue at
			context switch:
			100*100ns + 1uS = 11uS / 100  = 110ns/msg

    The reason a deep message queue is not a problem versus other
    mechanisms is simple... a message represents a unit of work.  The
    work must be done regardless, and on the cpu it was told to be done
    on, no matter whether you use a message or a continuation or some
    other mechanism.  In other words, a deep message queue is an effect
    of the problem, not a cause of it.  Solving the problem (if it
    actually is a problem) does not involve dealing with the deep message
    queue; it involves dealing with the set of circumstances that are
    causing that deep message queue to occur.

    Now, certainly end-to-end latency is an issue.  But when one is
    talking about context switching one is talking about nanoseconds and
    microseconds.  Turn-around latency just isn't an issue most of the
    time, and in those extremely rare cases where it might be, one does
    the turn-around in the driver interrupt anyway.

						-Matt
						Matthew Dillon