From owner-freebsd-net@FreeBSD.ORG Thu Apr 27 14:56:55 2006
Date: Thu, 27 Apr 2006 15:56:51 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
To: Jeremie Le Hen
Cc: Marcos Bedinelli, freebsd-net@freebsd.org
Subject: Re: [fbsd] Re: [fbsd] Network performance in a dual CPU system
Message-ID: <20060427154718.J75848@fledge.watson.org>
In-Reply-To: <20060427143814.GD84148@obiwan.tataz.chchile.org>

On Thu, 27 Apr 2006, Jeremie Le Hen wrote:

>> I missed the original thread, but in answer to the question: if you
>> set net.isr.direct=1, then FreeBSD 6.x will run the netisr code in
>> the ithread of the network device driver.  This allows the IP
>> forwarding and related paths to run in two threads instead of one,
>> potentially allowing greater parallelism.  Of course, you also
>> potentially contend more locks, you may increase the time it takes
>> for the ithread to respond to new interrupts, etc., so it's not
>> quite cut and dried, but with a workload like the one shown above,
>> it might make quite a difference.
>
> Actually you already replied in the original thread, explaining
> mostly the same thing. :-)
>
> BTW, what I understand is that net.isr.direct=1 keeps packets from
> being multiplexed through the netisr thread and instead makes the
> ithread do the job.  In that case, what happens to the netisr
> thread?  Does it still have some work to do, or is it removed?

Yes -- basically, what this setting does is turn a deferred dispatch
of the protocol-level processing into a direct function invocation.
So instead of inserting the new IP packet into an IP processing queue
from the ethernet code and waking up the netisr, which calls the IP
input routine, we call the IP input routine directly.  This has a
number of potentially positive effects:

- It avoids the queue/dequeue operation.

- It avoids a context switch.

- It allows greater parallelism, since protocol-layer processing is
  no longer limited to the netisr thread.

It also has some downsides:

- More work is performed in the ithread -- since any given thread is
  limited to a single CPU's worth of processing resources, if the
  link-layer and protocol-layer processing add up to more than one
  CPU, you slow them both down.

- It increases the time it takes to pull packets out of the card --
  we process each packet to completion rather than pulling packets
  out in sets and batching them.  This pushes drop-on-overload into
  the card instead of the IP queue, which has some benefits and some
  costs.
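To make the difference concrete, here is a rough user-space sketch of
the two dispatch models.  This is not the kernel code: the queue here
and names like ip_input_sketch() are made-up stand-ins for ipintrq,
ip_input(), and friends.

#include <stdio.h>

#define QLEN 8

struct pkt {
        int id;
};

/* Toy "IP input queue", standing in for the real thing. */
static struct pkt queue[QLEN];
static int qhead, qtail;

/* Protocol-layer processing, standing in for ip_input(). */
static void
ip_input_sketch(struct pkt *p)
{
        printf("IP-layer processing for packet %d\n", p->id);
}

/*
 * Deferred dispatch (net.isr.direct=0): the ethernet-layer code only
 * enqueues; the netisr drains the queue later, in its own thread.
 */
static void
dispatch_deferred(struct pkt p)
{
        queue[qtail++ % QLEN] = p;  /* in the kernel, also wake the netisr */
}

static void
netisr_run(void)
{
        while (qhead < qtail)
                ip_input_sketch(&queue[qhead++ % QLEN]);
}

/*
 * Direct dispatch (net.isr.direct=1): the protocol input routine is
 * invoked directly from the caller -- no queue, no context switch.
 */
static void
dispatch_direct(struct pkt p)
{
        ip_input_sketch(&p);
}

int
main(void)
{
        struct pkt a = { 1 }, b = { 2 };

        dispatch_deferred(a);   /* sits in the queue... */
        netisr_run();           /* ...until the "netisr" runs */
        dispatch_direct(b);     /* processed immediately */
        return (0);
}

The knob itself is an ordinary sysctl, so it can be toggled at
runtime with "sysctl net.isr.direct=1".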
The netisr is still there, and will still be used for certain sorts
of things.  In particular, we use the netisr when doing arbitrary
decapsulation, as this places an upper bound on thread stack use.
For example, with an IP in IP in IP in IP tunneled packet, always
using direct dispatch could produce a deeply nested stack.  By
looping the packet back into the queue and picking it up from the top
level of the netisr dispatch, we avoid nesting the stacks, which
could otherwise lead to stack overflow.  We don't context switch in
that loop, so we avoid the context-switch cost.  We also use the
netisr for loopback network traffic.  So, in short, the netisr is
still there; it just has less work scheduled in it.

Another potential model for increasing parallelism in the input path
is to have multiple netisr threads -- this raises an interesting
question relating to ordering.  Right now, we use source ordering --
that is, we order packets in the network subsystem essentially in the
order they come from a particular source.  So we guarantee that if
four packets come in on em0, they get processed in the order they are
received from em0.  They may arbitrarily interlace with packets
coming from other interfaces, such as em1, lo0, etc.  The reason for
the strong source ordering is that some protocols, TCP in particular,
respond really badly to misordering, which they detect as loss and
respond to with retransmission.

If we introduce multiple netisrs naively, by simply having the
different threads work from the same IP input queue, then we can pull
packets from the same source into different workers and process them
at different rates, introducing misordering.  While we'd process
packets with greater parallelism, and hence possibly faster, we'd
toast the end-to-end protocol properties and make everyone really
unhappy.

There are a few common ways people have addressed this -- it's
actually very similar to the link parallelism problem.  For example,
with bonded ethernet links, packets are assigned to a particular link
based on a hash of their source address, so that individual streams
from the same source remain in order with respect to themselves.  An
obvious approach would be to assign particular ifnets to particular
netisrs, since that would maintain our current source-ordering
assumptions while allowing the ithreads and netisrs to float to
different CPUs; a sketch of this hashing idea appears at the end of
this message.  A catch in this approach is load balancing: if two
ifnets are assigned to the same netisr, they can't run in parallel.
This line of thought can, and does, continue. :-)

The direct dispatch model maintains source ordering in a manner
similar to having a per-source netisr, which works pretty well, and
it also avoids context switches.  The main downside is reduced
parallelism between the ithread and the netisr, which for some
configurations can be a big deal (i.e., if the ithread uses 60% of a
CPU and the netisr uses 60% of a CPU, you've limited them both to 50%
by combining them in a single thread).
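For reference, the source-hashing idea mentioned above, in rough
sketch form.  Again, this is not code from the tree; NNETISR,
netisr_pick(), and the source key are all made-up names, and a real
implementation might key on the ifnet index or on a hash of the IP
source address.

#include <stdint.h>
#include <stdio.h>

#define NNETISR 4

/*
 * Pick a netisr worker by hashing the packet's source.  All packets
 * from one source land on the same worker, so they cannot be
 * reordered against each other; different sources may be processed
 * in parallel on different workers.
 */
static unsigned int
netisr_pick(uint32_t source_key)
{
        return (source_key % NNETISR);
}

int
main(void)
{
        /* e.g., key on ifnet index: em0 -> 1, em1 -> 2 */
        printf("em0 -> netisr %u\n", netisr_pick(1));
        printf("em1 -> netisr %u\n", netisr_pick(2));
        return (0);
}

Robert N M Watson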