Date: Wed, 13 Jun 2007 15:01:46 +1000 (EST)
From: Bruce Evans
To: Stephan Koenig
Cc: freebsd-net@freebsd.org
Subject: Re: Tuning if_bge for high packet rates (receive descriptors/transmit descriptors?)
Message-ID: <20070613134445.H19019@besplex.bde.org>

On Tue, 12 Jun 2007, Stephan Koenig wrote:

> We have some servers that have a very high packet rate in a normal
> production mode, and require polling to keep the CPU load reasonable.
>
> We have Kern.HZ=4000, and even still, have some dropped packets.
>
> On our servers that use the intel "em" driver, we have tuned the
> drivers as following, by default, if_em.h has:
>
> #define EM_MIN_TXD              80
> #define EM_MAX_TXD_82543        256
> #define EM_MAX_TXD              4096
> #define EM_DEFAULT_TXD          EM_MAX_TXD_82543
>
> #define EM_MIN_RXD              80
> #define EM_MAX_RXD_82543        256
> #define EM_MAX_RXD              4096
> #define EM_DEFAULT_RXD          EM_MAX_RXD_82543
>
>
> We have changed EM_DEFAULT_TXD and EM_DEFAULT_RXD to 4096 -- This
> solved the problem on these servers.
>
> The question is now what to do on our servers with Broadcom "bge"
> series cards.  Does anyone know how to tune this driver in a similar
> matter?

1) Change BGE_SSLOTS from 256 to 512.  This corresponds to changing
   EM_DEFAULT_RXD from 256 to 4096, except the max is much smaller.

2) Don't use polling.  Polling "works" to reduce CPU by dropping packets
   on input and by reducing throughput on output.  It works particularly
   badly for bge.

3) When not using polling, change the interrupt coalescing parameters.
   These can be set to give similar behaviour to polling, without most
   of polling's losses or features.  E.g., setting interrupt moderation
   timeouts to 250 uS gives behaviour similar to polling at 4000 Hz.
   For input, the main difference is that the interrupts are high
   priority so dropping packets is less likely.  For output, the
   throttling behaviour of polling is more useful and without it bge
   interfaces may use more CPU so as to actually send packets as fast
   as possible.

I use the following simple coalescing tuning in RELENG_6.  This
essentially restores the old tuning.  The current default tuning is
essentially Linux's and is not good for FreeBSD.

% Index: if_bge.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/dev/bge/if_bge.c,v
% retrieving revision 1.91.2.23
% diff -u -2 -r1.91.2.23 if_bge.c
% --- if_bge.c  8 May 2007 16:18:21 -0000       1.91.2.23
% +++ if_bge.c  9 May 2007 10:09:55 -0000
% @@ -2391,7 +2391,7 @@
%       sc->bge_stat_ticks = BGE_TICKS_PER_SEC;
%       sc->bge_rx_coal_ticks = 150;
% -     sc->bge_tx_coal_ticks = 150;
% -     sc->bge_rx_max_coal_bds = 10;
% -     sc->bge_tx_max_coal_bds = 10;
% +     sc->bge_tx_coal_ticks = 1000000;
% +     sc->bge_rx_max_coal_bds = 64;
% +     sc->bge_tx_max_coal_bds = 384;
%
%       /* Set up ifnet structure */

The parameters here are:

sc->bge_rx_coal_ticks:
        Set this to 1000000/N to give essentially the same behaviour as
        polling at N Hz.  The default of 150 gives polling at 6667 Hz.
        This is a good default.

sc->bge_tx_coal_ticks:
        Like the rx coal_ticks, but not really needed since tx can be
        controlled better by coal_bds.  I set it to 1000000 (1 second)
        since it is only used to free inactive tx descriptors if the
        device becomes idle.

sc->bge_rx_max_coal_bds:
        Set this to 1 to give minimum latency and maximum CPU use.  Set
        it to 0 (infinity) or large to give bad latency but less CPU use.
        The old default of 64 gave a good tradeoff.  The current and
        RELENG_6 value of 10 is too small.  For 1500-byte packets, the
        regression in this parameter has little effect when the rx ticks
        timeout is 150 uS, since the timeout fires first for either
        value, but for tiny packets a value of 10 for this parameter
        asks for 150k interrupts per second, which is too many.

sc->bge_tx_max_coal_bds:
        Set this to almost the maximum possible to give minimum CPU use
        at almost no cost to latency.  The maximum possible is 511
        (496?), but that is too aggressive.  I use 384.  The old value
        of 128 works OK too (costs ~10% more CPU than 512).  The current
        and RELENG_6 value of 10 is not good.  It costs about 100% more
        CPU than a value of 384 and has no observable good effects.
        (Above 384, there are some observable bad effects due to bus
        contention with rx.  I only tested on PCI 33MHz buses.  The
        tradeoffs might be a little different on faster buses.)

I use dynamic tuning of bge rx coalescing (by rate-limiting interrupts)
in -current.  em does this in hardware.

Bruce