Date: Fri, 1 Dec 2006 01:39:24 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
To: Ivan Voras
Cc: freebsd-arch@freebsd.org
Subject: Re: a proposed callout API
Message-ID: <20061201012221.J79653@fledge.watson.org>
References: <200611292147.kATLll4m048223@apollo.backplane.com> <11606.1164837711@critter.freebsd.dk>

On Thu, 30 Nov 2006, Ivan Voras wrote:

> Not trying to take sides here, but for those of us willing to learn, what
> exactly are the problems in Matt Dillon's suggestions? From a novice's POV,
> having per-cpu queues looks (emphasis: looks) very scalable and performant.

The implications of adopting the model Matt proposes are quite far-reaching:
callouts don't exist in isolation, but occur in the context of data structures
and work occurring in many threads.  If callouts are pinned to a particular
CPU, and can only be scheduled, rescheduled, and cancelled from that CPU, that
implies either that all work associated with that callout is also pinned to
that CPU, or that migration or message-passing is involved whenever the
requirement comes up in a thread on another CPU.

Consider the case of TCP timers: a number of TCP timers get regularly
rescheduled (delack, retransmit, etc.).  If they can only be manipulated from
cpu0 (i.e., protected by a synchronization primitive that can't be acquired
from another CPU -- critical sections instead of mutexes), how do you handle
the case where a TCP packet for that connection is processed on cpu1 and needs
to change the scheduling of the timer?  In a strict work/data structure
pinning model, you would pin the TCP connection to cpu0, and only process any
data leading to timer changes on that CPU.  Alternatively, you might pass a
message from cpu1 to cpu0 to change the scheduling.

Processing timers in multiple threads and pinning them to multiple CPUs
clearly isn't a bad idea: we could likely benefit from parallelism (and more
generally, concurrency) in timer processing.  One of the things we discussed
at the recent developer summit was subsystem callout threads (introducing the
opportunity for parallelism without committing to a particular CPU scheduling
model), as well as per-CPU callout threads protected using mutexes, so that
reschedule/cancel/etc. can still be performed from other CPUs.
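
To make the contrast concrete, here is a minimal userland sketch in plain C
with pthreads.  It is not the kernel callout(9) interface and not anyone's
proposed API; every name in it (pcpu_callout, callout_reschedule,
callout_reschedule_pinned) is hypothetical and exists only to illustrate the
trade-off between mutex-protected and strictly pinned per-CPU callouts.

	/*
	 * Hypothetical sketch of the two locking models discussed above;
	 * not the callout(9) API.
	 */
	#include <assert.h>
	#include <pthread.h>
	#include <stdbool.h>
	#include <stdio.h>

	struct pcpu_callout {
		pthread_mutex_t	lock;		/* protects the fields below */
		int		owner_cpu;	/* CPU whose timer thread fires it */
		int		ticks_left;
		bool		pending;
	};

	/*
	 * Mutex-protected model: any CPU may reschedule the callout, at the
	 * cost of a lock shared with the owning CPU's timer thread.  This is
	 * the "per-CPU callout threads, but protected using mutexes" variant.
	 */
	static void
	callout_reschedule(struct pcpu_callout *c, int ticks)
	{
		pthread_mutex_lock(&c->lock);
		c->ticks_left = ticks;
		c->pending = true;
		pthread_mutex_unlock(&c->lock);
	}

	/*
	 * Strictly pinned model: no lock is needed because only the owning
	 * CPU (inside a critical section there) may touch the callout -- but
	 * then a TCP input path on cpu1 cannot call this at all for a cpu0
	 * callout; it must either be pinned to cpu0 itself, or queue a
	 * message asking cpu0 to perform the reschedule.
	 */
	static void
	callout_reschedule_pinned(struct pcpu_callout *c, int curcpu, int ticks)
	{
		assert(curcpu == c->owner_cpu);
		c->ticks_left = ticks;
		c->pending = true;
	}

	int
	main(void)
	{
		struct pcpu_callout retransmit = { .owner_cpu = 0 };

		pthread_mutex_init(&retransmit.lock, NULL);

		/* Legal from any CPU in the mutex model. */
		callout_reschedule(&retransmit, 100);

		/* Only legal when already running on cpu0 in the pinned model. */
		callout_reschedule_pinned(&retransmit, 0, 100);

		printf("pending=%d ticks_left=%d\n", retransmit.pending,
		    retransmit.ticks_left);
		pthread_mutex_destroy(&retransmit.lock);
		return (0);
	}

In the pinned variant, the cpu1 packet-processing path from the example above
has no direct way to touch the cpu0 timer; it has to fall back either to
pinning the whole connection or to message-passing.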
Changing the API so that scheduling/rescheduling/etc. activities themselves
must occur on a particular CPU has serious implications and commits us to an
architectural approach for which there is little consensus.  If the goal is
simply parallelism, it's possible to accomplish that without embedding
assumptions about the synchronization model at this point.  Take a look at the
USENIX paper by Paul Willmann et al. at Rice for some rather interesting
experimentation, measurement, and discussion precisely along these lines:

http://www.ece.rice.edu/~willmann/pubs/paranet_tr06-872.pdf

Robert N M Watson
Computer Laboratory
University of Cambridge