From owner-freebsd-arch@FreeBSD.ORG  Wed Aug 10 12:19:11 2005
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: arch@freebsd.org
Delivered-To: freebsd-arch@FreeBSD.ORG
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6023116A41F;
	Wed, 10 Aug 2005 12:19:11 +0000 (GMT)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [204.156.12.53])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 101D743D48;
	Wed, 10 Aug 2005 12:19:11 +0000 (GMT)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [204.156.12.50])
	by cyrus.watson.org (Postfix) with ESMTP id AFD2D46B7D;
	Wed, 10 Aug 2005 08:19:10 -0400 (EDT)
Date: Wed, 10 Aug 2005 13:22:33 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Andre Oppermann <andre@freebsd.org>
In-Reply-To: <42F9ECF2.8080809@freebsd.org>
Message-ID: <20050810131805.C22763@fledge.watson.org>
References: <42F9ECF2.8080809@freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: arch@freebsd.org
Subject: Re: Special schedulers, one CPU only kernel, one only userland
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 10 Aug 2005 12:19:11 -0000

On Wed, 10 Aug 2005, Andre Oppermann wrote:

> When using FreeBSD as a high performance router there are some desirable 
> changes to the way multiple CPUs are handled.  Normally a second CPU 
> doesn't add much (if any) performance to routing because of locking 
> overhead and packets randomly being processed by the CPUs wasting cache 
> efficiency. On the other hand having just one CPU is not optimal in 
> running the routing daemon in userland.  When there are large changes to 
> the table (eg. BGP full feed flap) userland sucks time away from the 
> packet forwarding in the kernel.
>
> The idea is to combine both worlds by designating CPU0 exclusively for 
> all kernel processing (thus avoiding the expensive mutex synchronization 
> and bus locking instructions) and CPU1 exclusively for all userland 
> processing.  Whenever a userland program does a syscall the kernel CPU 
> will take over.  When it's done, the process gets run by the userland CPU 
> again.  That way we get a very good scalability out of two CPUs for this 
> particular task.
>
> Hence my question to the SMP and scheduler gurus: How well does the 
> current SMP and scheduler architecture lend itself to this kind of 
> special handling? Is it just a matter of modifying (or plugging in) the 
> scheduler or are there more involved things to consider?

You can get a subset of this behavior by isolating processing of, say, 
routing events to a specific thread, and then pinning that thread to a 
specific CPU.  In fact, a lot of our network processing is done 
thread-locally, so as to avoid hitting synchronized data structures -- 
when processing IP packet headers, the mbuf is kept in thread-local 
storage for the duration, requiring no synchronization.  By extending this 
approach, you get the reduced synchronization benefits.  The other aspect 
is the precedence for CPU use and avoiding bad caching behavior.  Pinning 
gets you part of the way there.  Right now I believe we don't have a way 
to say "Don't use CPU X for userland processes".  However, the kernel 
should preempt userland processes when it needs to.  You may find you get 
90% of what you're looking for by pinning the netisr and any related 
ithreads (if not using polling) to a specific CPU, then having a 
thread-local routing cache or store of some sort.
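
To make the "thread-local routing cache" idea a bit more concrete, here is a 
rough userland sketch in C.  It is not kernel code and not how the stack is 
implemented today; every name in it (rc_entry, rc_lookup, slow_lookup, 
RC_SLOTS) is made up for illustration, and slow_lookup() merely stands in 
for the real, lock-protected routing table lookup.  The point is only that 
a cache private to one thread can be read and filled with no locking at 
all.

```c
/*
 * Sketch only: a per-thread, direct-mapped route cache.
 * All identifiers are hypothetical; none of them are FreeBSD kernel APIs.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RC_SLOTS 256                    /* must stay a power of two */

struct rc_entry {
	uint32_t dst;                   /* destination IPv4 address, host order */
	uint32_t nexthop;               /* cached next hop for dst */
	int      valid;                 /* 0 until the slot has been filled */
};

/* One cache per thread: no other thread can reach it, so no locks. */
static _Thread_local struct rc_entry rc_cache[RC_SLOTS];
static _Thread_local unsigned long rc_hits, rc_misses;

/*
 * Stand-in for the shared routing table lookup (assumption: in a real
 * system this is the expensive, synchronized path).  It pretends the
 * gateway is always host .1 of the destination's /24.
 */
static uint32_t
slow_lookup(uint32_t dst)
{
	return ((dst & 0xffffff00u) | 1u);
}

/* Multiplicative hash; the top 8 bits select one of the 256 slots. */
static size_t
rc_hash(uint32_t dst)
{
	size_t idx = (uint32_t)(dst * UINT32_C(2654435761)) >> 24;

	assert(idx < RC_SLOTS);
	return (idx);
}

/* Look up dst in this thread's cache, filling the slot on a miss. */
uint32_t
rc_lookup(uint32_t dst)
{
	struct rc_entry *e = &rc_cache[rc_hash(dst)];

	if (e->valid && e->dst == dst) {
		rc_hits++;
		return (e->nexthop);
	}
	rc_misses++;
	e->dst = dst;
	e->nexthop = slow_lookup(dst);
	e->valid = 1;
	return (e->nexthop);
}
```

A thread pinned to the forwarding CPU would call rc_lookup() for each 
packet and only fall through to the shared table on a miss.  The sketch 
leaves out invalidation: after a routing table change the per-thread 
entries would have to be flushed or checked against some generation count, 
and that part does need thought.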

Ideally, the behavior you describe shouldn't require a specialized 
scheduler; it should be achievable using existing scheduler primitives.

Robert N M Watson