From owner-freebsd-net@FreeBSD.ORG Thu Apr 27 14:56:55 2006
Date: Thu, 27 Apr 2006 15:56:51 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
To: Jeremie Le Hen
Cc: Marcos Bedinelli, freebsd-net@freebsd.org
Subject: Re: [fbsd] Re: [fbsd] Network performance in a dual CPU system
Message-ID: <20060427154718.J75848@fledge.watson.org>
In-Reply-To: <20060427143814.GD84148@obiwan.tataz.chchile.org>

On Thu, 27 Apr 2006, Jeremie Le Hen wrote:

>> I missed the original thread, but in answer to the question: if you
>> set net.isr.direct=1, then FreeBSD 6.x will run the netisr code in
>> the ithread of the network device driver.  This allows the IP
>> forwarding and related paths to run in two threads instead of one,
>> potentially allowing greater parallelism.  Of course, you also
>> potentially contend more locks, you may increase the time it takes
>> for the ithread to respond to new interrupts, etc., so it's not
>> quite cut and dried, but with a workload like the one shown above,
>> it might make quite a difference.
>
> Actually you already replied in the original thread, explaining
> mostly the same thing. :-)
>
> BTW, what I understand is that net.isr.direct=1 keeps packets from
> being multiplexed through the netisr thread and instead makes the
> ithread do the job.  In that case, what happens to the netisr
> thread?  Does it still have some work to do, or is it removed?

Yes -- basically, what this setting does is turn a deferred dispatch
of the protocol-level processing into a direct function invocation.
So instead of inserting the new IP packet into an IP processing queue
from the ethernet code and waking up the netisr, which calls the IP
input routine, we call the IP input routine directly.  This has a
number of potentially positive effects:

- It avoids the queue/dequeue operation.

- It avoids a context switch.

- It allows greater parallelism, since protocol-layer processing is
  no longer limited to the netisr thread.

It also has some downsides:

- More work is performed in the ithread -- since any given thread is
  limited to a single CPU's worth of processing resources, if the
  link-layer and protocol-layer processing add up to more than one
  CPU, you slow them both down.

- It increases the time it takes to pull packets out of the card --
  we process each packet to completion rather than pulling packets
  out in sets and batching them.  This pushes drop-on-overload into
  the card instead of the IP queue, which has some benefits and some
  costs.
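To make the difference concrete, here is a rough user-space sketch of
the two dispatch models.  This is not the kernel code: the queue here
and names like ip_input_sketch() are made-up stand-ins for ipintrq,
ip_input(), and friends.

#include <stdio.h>

#define QLEN 8

struct pkt {
        int id;
};

/* Toy "IP input queue", standing in for the real thing. */
static struct pkt queue[QLEN];
static int qhead, qtail;

/* Protocol-layer processing, standing in for ip_input(). */
static void
ip_input_sketch(struct pkt *p)
{
        printf("IP-layer processing for packet %d\n", p->id);
}

/*
 * Deferred dispatch (net.isr.direct=0): the ethernet-layer code only
 * enqueues; the netisr drains the queue later, in its own thread.
 */
static void
dispatch_deferred(struct pkt p)
{
        queue[qtail++ % QLEN] = p;  /* in the kernel, also wake the netisr */
}

static void
netisr_run(void)
{
        while (qhead < qtail)
                ip_input_sketch(&queue[qhead++ % QLEN]);
}

/*
 * Direct dispatch (net.isr.direct=1): the protocol input routine is
 * invoked directly from the caller -- no queue, no context switch.
 */
static void
dispatch_direct(struct pkt p)
{
        ip_input_sketch(&p);
}

int
main(void)
{
        struct pkt a = { 1 }, b = { 2 };

        dispatch_deferred(a);   /* sits in the queue... */
        netisr_run();           /* ...until the "netisr" runs */
        dispatch_direct(b);     /* processed immediately */
        return (0);
}

The knob itself is an ordinary sysctl, so it can be toggled at
runtime with "sysctl net.isr.direct=1".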
The netisr is still there, and will still be used for certain sorts
of things.  In particular, we use the netisr when doing arbitrary
decapsulation, as this places an upper bound on thread stack use.
For example, with an IP in IP in IP in IP tunneled packet, always
using direct dispatch could produce a deeply nested stack.  By
looping the packet back into the queue and picking it up from the top
level of the netisr dispatch, we avoid nesting the stacks, which
could otherwise lead to stack overflow.  We don't context switch in
that loop, so we avoid the context-switch cost.  We also use the
netisr for loopback network traffic.  So, in short, the netisr is
still there; it just has less work scheduled in it.

Another potential model for increasing parallelism in the input path
is to have multiple netisr threads -- this raises an interesting
question relating to ordering.  Right now, we use source ordering --
that is, we order packets in the network subsystem essentially in the
order they come from a particular source.  So we guarantee that if
four packets come in on em0, they get processed in the order they are
received from em0.  They may arbitrarily interlace with packets
coming from other interfaces, such as em1, lo0, etc.  The reason for
the strong source ordering is that some protocols, TCP in particular,
respond really badly to misordering, which they detect as loss and
respond to with retransmission.

If we introduce multiple netisrs naively, by simply having the
different threads work from the same IP input queue, then we can pull
packets from the same source into different workers and process them
at different rates, introducing misordering.  While we'd process
packets with greater parallelism, and hence possibly faster, we'd
toast the end-to-end protocol properties and make everyone really
unhappy.

There are a few common ways people have addressed this -- it's
actually very similar to the link parallelism problem.  For example,
with bonded ethernet links, packets are assigned to a particular link
based on a hash of their source address, so that individual streams
from the same source remain in order with respect to themselves.  An
obvious approach would be to assign particular ifnets to particular
netisrs, since that would maintain our current source-ordering
assumptions while allowing the ithreads and netisrs to float to
different CPUs; a sketch of this hashing idea appears at the end of
this message.  A catch in this approach is load balancing: if two
ifnets are assigned to the same netisr, they can't run in parallel.
This line of thought can, and does, continue. :-)

The direct dispatch model maintains source ordering in a manner
similar to having a per-source netisr, which works pretty well, and
it also avoids context switches.  The main downside is reduced
parallelism between the ithread and the netisr, which for some
configurations can be a big deal (i.e., if the ithread uses 60% of a
CPU and the netisr uses 60% of a CPU, you've limited them both to 50%
by combining them in a single thread).
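For reference, the source-hashing idea mentioned above, in rough
sketch form.  Again, this is not code from the tree; NNETISR,
netisr_pick(), and the source key are all made-up names, and a real
implementation might key on the ifnet index or on a hash of the IP
source address.

#include <stdint.h>
#include <stdio.h>

#define NNETISR 4

/*
 * Pick a netisr worker by hashing the packet's source.  All packets
 * from one source land on the same worker, so they cannot be
 * reordered against each other; different sources may be processed
 * in parallel on different workers.
 */
static unsigned int
netisr_pick(uint32_t source_key)
{
        return (source_key % NNETISR);
}

int
main(void)
{
        /* e.g., key on ifnet index: em0 -> 1, em1 -> 2 */
        printf("em0 -> netisr %u\n", netisr_pick(1));
        printf("em1 -> netisr %u\n", netisr_pick(2));
        return (0);
}

Robert N M Watson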