From owner-freebsd-performance@FreeBSD.ORG Tue Oct 11 14:01:13 2005
Date: Tue, 11 Oct 2005 15:01:11 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
To: performance@FreeBSD.org
Cc: net@FreeBSD.org
Message-ID: <20051011145923.B92528@fledge.watson.org>
In-Reply-To: <20051005133730.R87201@fledge.watson.org>
Subject: Re: Call for performance evaluation: net.isr.direct

On Wed, 5 Oct 2005, Robert Watson wrote:

> In 2003, Jonathan Lemon added initial support for direct dispatch of
> netisr handlers from the calling thread, as part of his DARPA/NAI Labs
> contract in the DARPA CHATS research program. Over the two years since
> then, Sam Leffler and I have worked to refine this implementation,
> removing a number of ordering-related issues, opportunities for
> excessive parallelism, and recursion issues, and testing with a broad
> range of network components. There has also been a significant effort
> to complete MPSAFE locking work throughout the network stack. Combined
> with the earlier move to ithreads and a functional direct dispatch
> ("process to completion") implementation, there are a number of
> exciting possible benefits.

If I don't hear anything back in the near future, I will commit a change
to 7.x to make direct dispatch the default, in order to let a broader
community do the testing. :-)

If you are set up to easily test stability and performance relating to
direct dispatch, I would appreciate any help (a quick example of flipping
the setting follows the quoted list below). As of 6.0-RC1 and recent 7.x,
the name of the sysctl is "net.isr.direct"; it was previously named
"net.isr.enable", but its use is not recommended in versions that do not
use the new name.

Thanks,

Robert N M Watson

> - Possible parallelism by packet source -- ithreads can dispatch
>   simultaneously into the higher-level network stack layers. Since
>   ithreads can execute in parallel on different CPUs, so can the code
>   they invoke directly.
>
> - Elimination of context switches in the network receive path -- rather
>   than context switching to the netisr thread from the ithread, we can
>   now execute netisr code directly from the ithread.
>
> - A CPU-bound netisr thread on a multi-processor system will no longer
>   rate-limit traffic to the resources available on one CPU.
>
> - Eliminating the additional queueing in the handoff reduces the
>   opportunity for queues to overfill as a result of scheduling delays.
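
For anyone who wants to try this quickly, the commands below are a minimal
sketch of toggling the setting; they assume 6.0-RC1 or later, where the
sysctl is named net.isr.direct, and must be run as root:

    # Check the current mode (0 = queue packets to the netisr thread).
    sysctl net.isr.direct

    # Enable direct dispatch at runtime; switching modes may briefly
    # reorder in-flight packets, so don't benchmark across the change.
    sysctl net.isr.direct=1

    # Revert to queued netisr processing.
    sysctl net.isr.direct=0

    # To apply the setting at every boot, add the following line to
    # /etc/sysctl.conf:
    #   net.isr.direct=1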
>
> There are, however, some possible downsides and/or trade-offs:
>
> - Higher-level network processing will now compete with the interrupt
>   handler for CPU resources available to the ithread. This means less
>   time for the interrupt code to execute in the thread if the thread is
>   CPU-bound.
>
> - Lower levels of parallelism between portions of the inbound packet
>   processing path. Without direct dispatch, there is possible
>   parallelism between receive network driver execution and higher-level
>   stack layers, whereas with direct dispatch they can no longer execute
>   in parallel.
>
> - Re-queued packets from tunnel and encapsulation processing will now
>   require a context switch to process, since they will be processed in
>   the netisr proper rather than in the ithread, whereas before the
>   netisr thread would pick them up immediately after completing the
>   current processing, without a context switch.
>
> - Code that previously ran in the SWI at a SWI priority now runs in the
>   ithread at an ithread priority, elevating the general priority at
>   which network processing takes place.
>
> And there are a few mixed changes, which have both good and bad
> elements:
>
> - Less queueing takes place in the network stack in inbound processing:
>   packets are taken directly from the driver and processed to completion
>   one by one, rather than queued for batch processing. Packets will be
>   dropped before the link layer, rather than on the boundary between the
>   link and protocol layers. This is good in that we invest less work in
>   packets we were going to drop anyway, but bad in that less queueing
>   means less room to absorb scheduling delays.
>
> In previous FreeBSD releases, such as several 5.x series releases,
> net.isr.enable could not be turned on by default because there was
> insufficient synchronization in the network stack. As of 5.5 and 6.0, I
> believe there is sufficient synchronization, especially given that we
> force non-MPSAFE protocol handlers to run in the netisr without direct
> dispatch. As such, there has been an ongoing conversation about making
> direct dispatch the default behavior in the 7.x development series, and
> about more publicly documenting and supporting the use of direct
> dispatch in the 6.x release engineering series.
>
> Obviously, this is about two things: performance and stability. Many of
> us have been running with direct dispatch on by default for quite some
> time, so it passes some of the basic "does it run" tests. However, since
> it significantly increases the opportunity for parallelism in the
> receive path of the network stack, it will likely cause otherwise latent
> or infrequent races and bugs to show up more often. The second aspect is
> performance: many results suggest that direct dispatch has a significant
> performance benefit. However, evaluating the impact across a broad range
> of workloads is required before we go ahead with what is effectively a
> significant architectural change in how we perform network stack
> processing.
>
> To give you a sense of some of the performance effects I've measured
> recently using the netperf measurement tool (with -DHISTOGRAM removed
> from the FreeBSD port build), here are some results. In each case, I've
> put parentheses around "host" or "router" to indicate the host where the
> configuration change is being tested.
> These tests were performed using dual Xeon systems, with back-to-back
> gigabit Ethernet cards and the if_em driver:
>
> TCP round-trip benchmark (TCP_RR), host-(host):
>
>     7.x UP:  0.9% performance improvement
>     7.x SMP: 0.7% performance improvement
>
> TCP round-trip benchmark (TCP_RR), host-(router)-host:
>
>     7.x UP:  2.4% performance improvement
>     7.x SMP: 2.9% performance improvement
>
> UDP round-trip benchmark (UDP_RR), host-(host):
>
>     7.x UP:  0.7% performance improvement
>     7.x SMP: 0.6% performance improvement
>
> UDP round-trip benchmark (UDP_RR), host-(router)-host:
>
>     7.x UP:  2.2% performance improvement
>     7.x SMP: 3.0% performance improvement
>
> TCP stream benchmark (TCP_STREAM), host-(host):
>
>     7.x UP:  0.8% performance improvement
>     7.x SMP: 1.8% performance improvement
>
> TCP stream benchmark (TCP_STREAM), host-(router)-host:
>
>     7.x UP:  13.6% performance improvement
>     7.x SMP: 15.7% performance improvement
>
> UDP stream benchmark (UDP_STREAM), host-(host):
>
>     7.x UP:  none
>     7.x SMP: none
>
> UDP stream benchmark (UDP_STREAM), host-(router)-host:
>
>     7.x UP:  none
>     7.x SMP: none
>
> TCP connect benchmark (src/tools/tools/netrate/tcpconnect):
>
>     7.x UP:  7.90383% +/- 0.553773%
>     7.x SMP: 12.2391% +/- 0.500561%
>
> So in some cases the impact is negligible -- in other places, it is
> quite significant. So far I've not measured a case where performance has
> gotten worse, but that's probably because I've only been measuring a
> limited number of cases, with a fairly limited set of configurations. In
> particular, the hardware I have is pushing the limits of what the wire
> supports, so minor changes in latency are possible, but not large
> changes in throughput.
>
> Beyond being a summary of the status quo, this is also a call to action.
> I would like to get more widespread benchmarking of the impact of direct
> dispatch on network-related workloads. This means a variety of things:
>
> (1) Performance of low-level network services, such as routing,
>     bridging, and filtering.
>
> (2) Performance of high-level application services, such as web and
>     database.
>
> (3) Performance of integrated kernel network services, such as the NFS
>     client and server.
>
> (4) Performance of user space distributed file systems, such as Samba
>     and AFS.
>
> All you need to do to switch to direct dispatch mode is set the sysctl
> or tunable "net.isr.direct" to 1. To disable it again, remove the
> setting, or set it to 0. It can be modified at run time, although during
> the transition from one mode to the other there may be a small amount of
> packet reordering, so benchmarking across the transition is discouraged.
> FYI: as of 6.0-RC1 and recent 7.x, net.isr.direct is the name of the
> variable. In earlier releases, the name of this variable was
> net.isr.enable.
>
> Some important details:
>
> - Only non-local protocol traffic is affected: loopback traffic still
>   goes via the netisr to avoid issues of recursion and lock order.
>
> - In the general case, only inbound traffic is directly affected by this
>   change. As such, send-only benchmarks may reveal little change. They
>   are still interesting, however.
>
> - However, the send path is indirectly affected due to changes in
>   scheduling, workload, interrupt handling, and so on.
>
> - Because network benchmarks, especially micro-benchmarks, are very
>   sensitive to minor perturbations, I highly recommend running in a
>   minimal multi-user or, ideally, single-user environment, and suggest
>   isolating undesired sources of network traffic from the segments where
>   testing is occurring. For macro-benchmarks this is less important, but
>   it should still be kept in mind.
>
> - Please make sure debugging features are turned off when running tests
>   -- especially WITNESS, INVARIANTS, INVARIANT_SUPPORT, and user space
>   malloc debugging. These can have a significant impact on performance,
>   both potentially overshadowing changes and, in some cases, actually
>   reversing results (due to higher overhead under locks, for example).
>
> - Do not use net.isr.enable in the 5.x line unless you know what you are
>   doing. While it is reasonably safe from 5.4 onwards, it is not a
>   supported configuration, and may cause stability issues with specific
>   workloads.
>
> - What we're particularly interested in is a statistically meaningful
>   comparison of the "before" and "after" cases. When doing measurements,
>   I like to run 10-12 samples, and usually discard the first one or two,
>   depending on the details of the benchmark. I'll then use
>   src/tools/tools/ministat to compare the data sets. Running a number of
>   samples is quite important: the variance in many tests can be
>   significant, and if the two sample sets overlap, a small number of
>   measurements can easily lead to entirely the wrong conclusion.
>
> Assuming you have a fixed-width font, typical output from ministat looks
> something like the following, and may even be human-readable:
>
> x 7SMP/tcpconnect_queue
> + 7SMP/tcpconnect_direct
> +--------------------------------------------------------------------------+
> |x xx                                                              +  +    |
> |xxxxx xx                                                       ++ +++++  +|
> ||__A__|                                                          |___A__| |
> +--------------------------------------------------------------------------+
>     N           Min           Max        Median           Avg        Stddev
> x  10          5425          5503          5460        5456.3     26.284977
> +  10          6074          6169          6126        6124.1     31.606785
> Difference at 95.0% confidence
>         667.8 +/- 27.3121
>         12.2391% +/- 0.500561%
>         (Student's t, pooled s = 29.0679)
>
> Of particular interest is whether changing to direct dispatch hurts
> performance in your environment, and understanding why that is.
>
> Thanks,
>
> Robert N M Watson
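
To make the before/after comparison concrete, here is roughly the kind of
script that produces ministat-friendly data files. It is only a sketch:
the remote host name is a placeholder, and the awk field index assumes the
classic single-line TCP_STREAM output that netperf prints with -P 0, so it
may need adjusting for other netperf versions or test types.

    #!/bin/sh
    # Collect 12 TCP_STREAM samples with queued netisr processing and 12
    # with direct dispatch, then compare the two sample sets with ministat.

    REMOTE=testhost    # placeholder: a host on the test segment running netserver
    SAMPLES=12

    run_samples()
    {
            outfile=$1
            : > "$outfile"
            i=0
            while [ $i -lt $SAMPLES ]; do
                    # -P 0 suppresses the banner and headers; field 5 of the
                    # remaining line is throughput in netperf's classic
                    # TCP_STREAM layout -- adjust if your netperf differs.
                    netperf -H "$REMOTE" -t TCP_STREAM -l 30 -P 0 | \
                        awk 'NF >= 5 { print $5 }' >> "$outfile"
                    i=$((i + 1))
            done
    }

    sysctl net.isr.direct=0
    run_samples tcpstream_queued.txt

    sysctl net.isr.direct=1
    run_samples tcpstream_direct.txt

    # ministat (src/tools/tools/ministat) reports whether the difference
    # between the two sets is significant at 95% confidence.
    ministat tcpstream_queued.txt tcpstream_direct.txt

If the two sample sets overlap in the ministat output, gather more samples
before drawing any conclusions; dropping the first run or two from each
file, as described above, also helps avoid warm-up effects.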