Date:      Sat, 17 Nov 2007 21:13:50 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Igor Sysoev <is@rambler-co.ru>
Cc:        freebsd-net@FreeBSD.org
Subject:   Re: bge loader tunables
Message-ID:  <20071117194615.L67319@delplex.bde.org>
In-Reply-To: <20071117071053.GA18091@rambler-co.ru>
References:  <20071116154019.GE93422@rambler-co.ru> <20071117065908.T65479@delplex.bde.org> <20071117071053.GA18091@rambler-co.ru>

On Sat, 17 Nov 2007, Igor Sysoev wrote:

> On Sat, Nov 17, 2007 at 08:30:58AM +1100, Bruce Evans wrote:
>
>> On Fri, 16 Nov 2007, Igor Sysoev wrote:
>>
>>> The attached patch creates the following bge loader tunables:
>>
>> I plan to commit old work to do this using sysctls.  Tunables are
>> harder to use and aren't needed since changes to the defaults aren't
>> needed for booting.  I also implemented dynamic tuning for rx coal
>> parameters so that the sysctls are mostly not needed.  Ask for patches
>> if you want to test this extensively.
>
> Yes, I can test your patches on 6.2 and 7.0.
> Now bge set the coalescing parameters at attach time.
> Do the sysctl's allow to change them on-the-fly ?
> How does rx dynamic tuning work ?
> Could it be turned off ?

OK, the patch is enclosed at the end, in 2 versions:
- all my patches for bge (with lots of debugging cruft and half-baked
   fixes for 5705+ sysctls).
- edited version with only the coalescing parameter changes.

I haven't used it under 6.2, but have used a similar version in ~5.2,
and it should work in 6.2 except for the 5705+ sysctl fixes.

bge actually sets parameters at init time, and it initializes whenever the
link is brought back up, so the parameters can be changed using
"ifconfig bgeN down up".  Several network drivers have interrupt moderation
parameters that can be changed in this way, but it is painful to change
the link status like that, so I have a sysctl dev.bge.N.program_coal to
apply the current parameters to the hardware.  The other sysctls to change
the parameters don't apply immediately, except the one for the rx tuning
max interrupt rate, since applying the changed parameters to the hardware
takes more code than a SYSCTL_INT(), and it is sometimes necessary to
change all the parameters together atomically.

Dynamic tuning works by monitoring the current rx packet rate and
increasing the active rx_max_coal_bds so that the ratio <rx packet
rate> / rx_max_coal_bds is usually <= the specified max rx interrupt
rate.  rx_coal_ticks is set to the constant value of the inverse of
the specified max rx interrupt rate (in ticks) on transition to dynamic
mode but IIRC is not changed when the dynamic rate is changed (not
always changing it automatically allows adjusting it independently of
the rate but is often not what is wanted).  The transition has some
bias towards lower latency over too many interrupts, so that short
bursts don't increase the latency.  I think this simple algorithm is
good enough provided the load (in rx packets/second) doesn't oscillate
rapidly.
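
A minimal sketch of the steady-state calculation described above, in plain
C rather than driver code.  The helper names dyncoal_pick_bds() and
dyncoal_ticks() are hypothetical and do not appear in the patch; the patch
itself does the same job with 24-bit fixed-point arithmetic in the
interrupt handler.  The tick rate passed to dyncoal_ticks() is an
assumption about what BGE_TICKS_PER_SEC stands for.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): choose the active
 * rx_max_coal_bds for a measured rx packet rate so that
 * rx_pps / bds stays at or below max_intr_freq, without exceeding
 * the static rx_max_coal_bds limit.
 */
static uint32_t
dyncoal_pick_bds(uint64_t rx_pps, uint32_t max_intr_freq,
    uint32_t static_max_bds)
{
	uint64_t bds;

	assert(max_intr_freq != 0);	/* 0 means dynamic tuning is off */
	assert(static_max_bds != 0);

	/* The +1 rounds up and keeps the value at 1 under light load. */
	bds = rx_pps / max_intr_freq + 1;
	if (bds > static_max_bds)
		bds = static_max_bds;
	return ((uint32_t)bds);
}

/*
 * Hypothetical helper: rx_coal_ticks for dynamic mode is the period of
 * the max interrupt rate, expressed in coalescing ticks.
 */
static uint32_t
dyncoal_ticks(uint32_t ticks_per_sec, uint32_t max_intr_freq)
{
	assert(max_intr_freq != 0);
	return (ticks_per_sec / max_intr_freq);
}
```

With the defaults in the patch (max interrupt rate 10000, static
rx_max_coal_bds 128), 100000 packets/second gives 11 descriptors per
interrupt, and anything below 10000 packets/second gives 1, which is the
low-latency case.  If the tick rate is 1000000 per second, the
corresponding rx_coal_ticks is 100.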

Dynamic tuning requires efficient reprogramming of at least one of the
hardware coal registers so that the tuning can respond rapidly to changes.
I have 2 methods for this:
- bge_careful_coal = 1 avoids using a potentially very long
   busy-wait loop in the interrupt handler by giving up on reprogramming
   the host coalescing engine (HCE) if the HCE seems to be busy.  Docs
   seem to require waiting for up to several milliseconds for the HCE
   to stabilize, and it is not clear if it is possible for the HCE to
   never stabilize because packets are streaming in.  (I don't have
   proper docs.)  This seems to always work (the HCE is never busy)
   for rx_max_coal_bds, but something near here didn't work for
   changing rx_coal_ticks in an old version.
- bge_careful_coal = 0 avoids the loop by writing to the rx_max_coal_bds
   register without waiting for the HCE.  This seems to work too.  It
   isn't critical for the HCE to see the change immediately or even
   for it to be seen at all (missed changes should do no more than give a
   huge interrupt rate for too long), but it is important for the
   change to not break the engine.
There is no sysctl for this or for some other hackish parameters.  The
source must be edited to change this from 1 to 0.
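
The difference between the two methods can be sketched against a simulated
engine.  Everything here is made up for illustration: struct fake_hce, the
HCE_ENABLE bit value and the "busy" flag stand in for the real registers,
and the real hardware's behaviour while busy is exactly the part that is
not known from the docs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HCE_ENABLE	0x2u	/* made-up enable bit, not the real value */

/* Simulated host coalescing engine; not a real register layout. */
struct fake_hce {
	bool		busy;	/* simulated: disable requests are ignored */
	uint32_t	mode;
	uint32_t	rx_max_coal_bds;
};

static void
hce_write_mode(struct fake_hce *h, uint32_t val)
{
	/* Assumption: a busy engine keeps its enable bit set for a while. */
	if (h->busy && (val & HCE_ENABLE) == 0)
		return;
	h->mode = val;
}

/* Returns true if the new rx_max_coal_bds value was written. */
static bool
coal_write(struct fake_hce *h, uint32_t bds, bool careful)
{
	bool ok = true;

	if (careful) {
		/* Ask the engine to stop, check once, never busy-wait. */
		hce_write_mode(h, 0);
		ok = (h->mode & HCE_ENABLE) == 0;
		if (ok)
			h->rx_max_coal_bds = bds;
		/* Re-enable whether or not the write happened. */
		hce_write_mode(h, HCE_ENABLE);
	} else {
		/* Write without waiting for the engine at all. */
		h->rx_max_coal_bds = bds;
	}
	return (ok);
}
```

In the careful variant a skipped write is harmless because the next tuning
pass will simply try again; the old value stays active in the meantime.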

Dynamic tuning is turned off by setting the dynamic max interrupt
frequency to 0.  Then rx_coal_ticks is reset to 150, and the active
rx_max_coal_bds is restored to the static value.

>>> hw.bge.tx_coal_desc=128
>>>
>>> This value delays the generation of transmit interrupts until specified
>>> number of packets will be transmited. The default value is 10.
>>
>> 128 is a good default.  I use 384.  There are few latency issues here, so
>> the default of 10 mainly costs efficiency.
>
> Does 384 not delay tx if there is shortage of free tx descriptors ?

No, it just increases the risk of the tx running dry by possibly not
interrupting until there are only a few tx descriptors remaining in
the hardware tx queue.  Under load, the interrupt handler and/or
bge_start() normally refills the queue to length 496 (512 less 16 for
safety), and an interrupt arrives 384 descriptors later when the queue
length has been reduced to 112.  (My debugging sysctls show this
behaviour clearly.)  Then the interrupt must be handled (at least
partially) within 112 descriptor times to avoid the tx running dry.
This handling is usually possible.  Even 480 works OK, but the throughput
drops noticeably near that value.  Under lighter loads, the queue is
not completely refilled, but there is little chance of the tx running
dry since OACTIVE is not set (the queue could only run dry despite there
being data to be sent if unrelated system load prevents threads from
running enough to top up the queue).
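
The queue arithmetic above can be written out as a small helper.  This is
only a sketch of the numbers in the previous paragraph; tx_slack() is a
hypothetical name and nothing like it exists in the driver.

```c
#include <assert.h>

/*
 * Hypothetical helper: number of descriptors still queued in the
 * hardware when the tx interrupt fires, assuming the queue was refilled
 * to (ring_cnt - safety) and the interrupt arrives after
 * tx_max_coal_bds descriptors have been sent.
 */
static int
tx_slack(int ring_cnt, int safety, int tx_max_coal_bds)
{
	int full = ring_cnt - safety;

	assert(tx_max_coal_bds > 0 && tx_max_coal_bds <= full);
	return (full - tx_max_coal_bds);
}
```

For a 512-entry ring with 16 held back, tx_slack(512, 16, 384) is 112 and
tx_slack(512, 16, 480) is 16, which shows how little time is left to
handle the interrupt near 480.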

Complete patch:
---
% Index: if_bge.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/dev/bge/if_bge.c,v
% retrieving revision 1.198
% diff -u -2 -r1.198 if_bge.c
% --- if_bge.c	30 Sep 2007 11:05:14 -0000	1.198
% +++ if_bge.c	8 Nov 2007 16:01:49 -0000
% @@ -1,2 +1,10 @@
% +int bge_careful_coal = 1;
% +int bge_qlen = 1;
% +int bge_errsrc = 0x17;
% +int bge_rx_repl = 64;
% +int bge_coal_writes;
% +int bge_coal_write_fails;
% +int bge_polling_trust_statusword = 0;
% +
%  /*-
%   * Copyright (c) 2001 Wind River Systems
% @@ -386,4 +394,5 @@
%   * traps on certain architectures.
%   */
% +#define BGE_REGISTER_DEBUG
%  #ifdef BGE_REGISTER_DEBUG
%  static int bge_sysctl_debug_info(SYSCTL_HANDLER_ARGS);
% @@ -427,4 +436,5 @@
% 
%  static int bge_allow_asf = 1;
% +static int bge_return_ring_cnt = BGE_RETURN_RING_CNT;	/* XXX global. */
% 
%  TUNABLE_INT("hw.bge.allow_asf", &bge_allow_asf);
% @@ -867,10 +877,4 @@
%  }
% 
% -/*
% - * The standard receive ring has 512 entries in it. At 2K per mbuf cluster,
% - * that's 1MB or memory, which is a lot. For now, we fill only the first
% - * 256 ring entries and hope that our CPU is fast enough to keep up with
% - * the NIC.
% - */
%  static int
%  bge_init_rx_ring_std(struct bge_softc *sc)
% @@ -878,8 +882,8 @@
%  	int i;
% 
% -	for (i = 0; i < BGE_SSLOTS; i++) {
% +	for (i = 0; i < BGE_STD_RX_RING_CNT; i++) {
%  		if (bge_newbuf_std(sc, i, NULL) == ENOBUFS)
%  			return (ENOBUFS);
% -	};
% +	}
% 
%  	bus_dmamap_sync(sc->bge_cdata.bge_rx_std_ring_tag,
% @@ -922,5 +926,5 @@
%  		if (bge_newbuf_jumbo(sc, i, NULL) == ENOBUFS)
%  			return (ENOBUFS);
% -	};
% +	}
% 
%  	bus_dmamap_sync(sc->bge_cdata.bge_rx_jumbo_ring_tag,
% @@ -1426,5 +1430,5 @@
%  		val = 8;
%  	else
% -		val = BGE_STD_RX_RING_CNT / 8;
% +		val = BGE_STD_RX_RING_CNT / 8, bge_rx_repl;
%  	CSR_WRITE_4(sc, BGE_RBDI_STD_REPL_THRESH, val);
%  	CSR_WRITE_4(sc, BGE_RBDI_JUMBO_REPL_THRESH, BGE_JUMBO_RX_RING_CNT/8);
% @@ -1530,4 +1534,11 @@
% 
%  	/* Set up host coalescing defaults */
% +	if (sc->bge_dyncoal_max_intr_freq != 0) {
% +		sc->bge_dyncoal_scale = ((uint64_t)1 << 24) /
% +		    sc->bge_dyncoal_max_intr_freq;
% +		sc->bge_rx_coal_ticks = BGE_TICKS_PER_SEC /
% +		    sc->bge_dyncoal_max_intr_freq;
% +	} else
% +		sc->bge_rx_coal_ticks = 150;
%  	CSR_WRITE_4(sc, BGE_HCC_RX_COAL_TICKS, sc->bge_rx_coal_ticks);
%  	CSR_WRITE_4(sc, BGE_HCC_TX_COAL_TICKS, sc->bge_tx_coal_ticks);
% @@ -2226,4 +2237,53 @@
% 
%  static int
% +bge_sysctl_program_coal(SYSCTL_HANDLER_ARGS)
% +{
% +	struct bge_softc *sc;
% +	int error, i, val;
% +
% +	val = 0;
% +	error = sysctl_handle_int(oidp, &val, 0, req);
% +	if (error != 0 || req->newptr == NULL)
% +		return (error);
% +        sc = arg1;
% +	BGE_LOCK(sc);
% +
% +	/* XXX cut from bge_blockinit(): */
% +
% +	/* Disable host coalescing until we get it set up */
% +	CSR_WRITE_4(sc, BGE_HCC_MODE, 0x00000000);
% +
% +	/* Poll to make sure it's shut down. */
% +	for (i = 0; i < BGE_TIMEOUT; i++) {
% +		if (!(CSR_READ_4(sc, BGE_HCC_MODE) & BGE_HCCMODE_ENABLE))
% +			break;
% +		DELAY(10);
% +	}
% +
% +	if (i == BGE_TIMEOUT) {
% +		device_printf(sc->bge_dev,
% +		    "host coalescing engine failed to idle\n");
% +		CSR_WRITE_4(sc, BGE_HCC_MODE, BGE_HCCMODE_ENABLE);
% +		BGE_UNLOCK(sc);
% +		return (ENXIO);
% +	}
% +
% +	/* Set up host coalescing defaults */
% +	if (sc->bge_dyncoal_max_intr_freq != 0)
% +		sc->bge_dyncoal_scale = ((uint64_t)1 << 24) /
% +		    sc->bge_dyncoal_max_intr_freq;
% +	CSR_WRITE_4(sc, BGE_HCC_RX_COAL_TICKS, sc->bge_rx_coal_ticks);
% +	CSR_WRITE_4(sc, BGE_HCC_TX_COAL_TICKS, sc->bge_tx_coal_ticks);
% +	CSR_WRITE_4(sc, BGE_HCC_RX_MAX_COAL_BDS, sc->bge_rx_max_coal_bds);
% +	CSR_WRITE_4(sc, BGE_HCC_TX_MAX_COAL_BDS, sc->bge_tx_max_coal_bds);
% +
% +	/* Turn on host coalescing state machine */
% +	CSR_WRITE_4(sc, BGE_HCC_MODE, BGE_HCCMODE_ENABLE);
% +
% +	BGE_UNLOCK(sc);
% +	return (0);
% +}
% +
% +static int
%  bge_attach(device_t dev)
%  {
% @@ -2444,4 +2504,5 @@
%  	else
%  		sc->bge_return_ring_cnt = BGE_RETURN_RING_CNT;
% +	bge_return_ring_cnt = sc->bge_return_ring_cnt;	/* XXX */
% 
%  	if (bge_dma_alloc(dev)) {
% @@ -2454,8 +2515,8 @@
%  	/* Set default tuneable values. */
%  	sc->bge_stat_ticks = BGE_TICKS_PER_SEC;
% -	sc->bge_rx_coal_ticks = 150;
% -	sc->bge_tx_coal_ticks = 150;
% -	sc->bge_rx_max_coal_bds = 10;
% -	sc->bge_tx_max_coal_bds = 10;
% +	sc->bge_dyncoal_max_intr_freq = 10000;
% +	sc->bge_tx_coal_ticks = 1000000;
% +	sc->bge_rx_max_coal_bds = 128;
% +	sc->bge_tx_max_coal_bds = BGE_TX_RING_CNT * 3 / 4;
% 
%  	/* Set up ifnet structure */
% @@ -2473,5 +2534,9 @@
%  	ifp->if_init = bge_init;
%  	ifp->if_mtu = ETHERMTU;
% -	ifp->if_snd.ifq_drv_maxlen = BGE_TX_RING_CNT - 1;
% +	if (bge_qlen & 1)
% +		ifp->if_snd.ifq_drv_maxlen = BGE_TX_RING_CNT +
% +		    imax(2 * tick, 10000) / 4;
% +	else
% +		ifp->if_snd.ifq_drv_maxlen = BGE_TX_RING_CNT - 1;
%  	IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
%  	IFQ_SET_READY(&ifp->if_snd);
% @@ -2861,4 +2926,55 @@
%  }
% 
% +struct bgrstats {
% +	struct timeval enter;
% +	struct timeval exit;
% +	int	cnt0;
% +	int	cnt1;
% +};
% +
% +/* XXX globals without global locking, so don't enable for multiple bge's. */
% +
% +static struct bgrstats bgrs[1024];
% +
% +static int bgrse;
% +SYSCTL_INT(_debug, OID_AUTO, bgrse, CTLFLAG_RW,
% +    &bgrse, 0, "bge rx stats enable");
% +
% +static int bgrso;
% +SYSCTL_INT(_debug, OID_AUTO, bgrso, CTLFLAG_RW,
% +    &bgrso, 0, "bge rx stats offset");
% +
% +static int
% +sysctl_bgrs(SYSCTL_HANDLER_ARGS)
% +{
% +	size_t len;
% +	int error, i, max;
% +	char buf[256];
% +
% +	for (i = 1, max = sizeof(bgrs) / sizeof(bgrs[0]); i < max; i++) {
% +		len = sprintf(buf,
% +		    "%4ld %10ld.%06ld %3d %3ld %3d %10ld.%06ld %3d\n",
% +		    (bgrs[i].enter.tv_sec - bgrs[i - 1].exit.tv_sec) * 1000000 +
% +		    bgrs[i].enter.tv_usec - bgrs[i - 1].exit.tv_usec,
% +		    (long)bgrs[i].enter.tv_sec, bgrs[i].enter.tv_usec,
% +		    bgrs[i].cnt0,
% +		    (bgrs[i].exit.tv_sec - bgrs[i].enter.tv_sec) * 1000000 +
% +		    bgrs[i].exit.tv_usec - bgrs[i].enter.tv_usec,
% +		    (bgrs[i].cnt1 - bgrs[i].cnt0 + bge_return_ring_cnt) %
% +		    bge_return_ring_cnt,
% +		    (long)bgrs[i].exit.tv_sec, bgrs[i].exit.tv_usec,
% +		    bgrs[i].cnt1);
% +		if (i == max - 1)
% +			buf[len - 1] = '\0';
% +		error = SYSCTL_OUT(req, buf, len);
% +		if (error != 0)
% +			return (error);
% +	}
% +	return (0);
% +}
% +
% +SYSCTL_PROC(_debug, OID_AUTO, bgrs, CTLTYPE_STRING | CTLFLAG_RD,
% +    0, 0, sysctl_bgrs, "A", "bge rx stats");
% +
%  /*
%   * Frame reception handling. This is called if there's a frame
% @@ -2883,4 +2999,9 @@
%  		return;
% 
% +	if (bgrse) {
% +		microtime(&bgrs[bgrso].enter);
% +		bgrs[bgrso].cnt0 = sc->bge_rx_saved_considx;
% +	}
% +
%  	ifp = sc->bge_ifp;
% 
% @@ -2953,5 +3074,8 @@
%  			stdcnt++;
%  			if (cur_rx->bge_flags & BGE_RXBDFLAG_ERROR) {
% +				if (bge_errsrc & 1)
%  				ifp->if_ierrors++;
% +				if (bge_errsrc & 8)
% +				printf("errflag %#x\n", cur_rx->bge_error_flag);
%  				bge_newbuf_std(sc, sc->bge_std, m);
%  				continue;
% @@ -2959,4 +3083,5 @@
%  			if (bge_newbuf_std(sc, sc->bge_std,
%  			    NULL) == ENOBUFS) {
% +				if (bge_errsrc & 2)
%  				ifp->if_ierrors++;
%  				bge_newbuf_std(sc, sc->bge_std, m);
% @@ -3036,6 +3161,60 @@
%  		ifp->if_ierrors += CSR_READ_4(sc, BGE_RXLP_LOCSTAT_IFIN_DROPS);
%  #endif
% +
% +	if (bgrse) {
% +		bgrs[bgrso].cnt1 = sc->bge_rx_saved_considx;
% +		microtime(&bgrs[bgrso].exit);
% +		bgrso = (bgrso + 1) % (sizeof(bgrs) / sizeof(bgrs[0]));
% +	}
%  }
% 
% +struct bgtstats {
% +	struct timeval enter;
% +	struct timeval exit;
% +	int	cnt0;
% +	int	cnt1;
% +};
% +
% +static struct bgtstats bgts[1024];
% +
% +static int bgtse;
% +SYSCTL_INT(_debug, OID_AUTO, bgtse, CTLFLAG_RW,
% +    &bgtse, 0, "bge tx stats enable");
% +
% +static int bgtso;
% +SYSCTL_INT(_debug, OID_AUTO, bgtso, CTLFLAG_RW,
% +    &bgtso, 0, "bge tx stats offset");
% +
% +static int
% +sysctl_bgts(SYSCTL_HANDLER_ARGS)
% +{
% +	size_t len;
% +	int error, i, max;
% +	char buf[256];
% +
% +	for (i = 1, max = sizeof(bgts) / sizeof(bgts[0]); i < max; i++) {
% +		len = sprintf(buf,
% +		    "%4ld %10ld.%06ld %3d %3ld %3d %10ld.%06ld %3d\n",
% +		    (bgts[i].enter.tv_sec - bgts[i - 1].exit.tv_sec) * 1000000 +
% +		    bgts[i].enter.tv_usec - bgts[i - 1].exit.tv_usec,
% +		    (long)bgts[i].enter.tv_sec, bgts[i].enter.tv_usec,
% +		    bgts[i].cnt0,
% +		    (bgts[i].exit.tv_sec - bgts[i].enter.tv_sec) * 1000000 +
% +		    bgts[i].exit.tv_usec - bgts[i].enter.tv_usec,
% +		    bgts[i].cnt0 - bgts[i].cnt1,
% +		    (long)bgts[i].exit.tv_sec, bgts[i].exit.tv_usec,
% +		    bgts[i].cnt1);
% +		if (i == max - 1)
% +			buf[len - 1] = '\0';
% +		error = SYSCTL_OUT(req, buf, len);
% +		if (error != 0)
% +			return (error);
% +	}
% +	return (0);
% +}
% +
% +SYSCTL_PROC(_debug, OID_AUTO, bgts, CTLTYPE_STRING | CTLFLAG_RD,
% +    0, 0, sysctl_bgts, "A", "bge tx stats");
% +
%  static void
%  bge_txeof(struct bge_softc *sc)
% @@ -3051,4 +3230,9 @@
%  		return;
% 
% +	if (bgtse) {
% +		microtime(&bgts[bgtso].enter);
% +		bgts[bgtso].cnt0 = sc->bge_txcnt;
% +	}
% +
%  	ifp = sc->bge_ifp;
% 
% @@ -3085,4 +3269,10 @@
%  	if (sc->bge_txcnt == 0)
%  		sc->bge_timer = 0;
% +
% +	if (bgtse) {
% +		bgts[bgtso].cnt1 = sc->bge_txcnt;
% +		microtime(&bgts[bgtso].exit);
% +		bgtso = (bgtso + 1) % (sizeof(bgts) / sizeof(bgts[0]));
% +	}
%  }
% 
% @@ -3103,6 +3293,12 @@
%  	    sc->bge_cdata.bge_status_map, BUS_DMASYNC_POSTREAD);
% 
% +	/* XXX possible race on switching from interrupt mode. */
%  	statusword = atomic_readandclear_32(
%  	    &sc->bge_ldata.bge_status_block->bge_status);
% +	if (cmd != POLL_AND_CHECK_STATUS && bge_polling_trust_statusword &&
% +	    (statusword & BGE_STATFLAG_UPDATED) == 0) {
% +		BGE_UNLOCK(sc);
% +		return;
% +	}
% 
%  	bus_dmamap_sync(sc->bge_cdata.bge_status_tag,
% @@ -3134,8 +3330,24 @@
%  	struct bge_softc *sc;
%  	struct ifnet *ifp;
% -	uint32_t statusword;
% +	uint32_t macstatus, statusword;
% 
%  	sc = xsc;
% 
% +	/*
% +	 * Quick check without locking or syncing.  Since we don't ack the
% +	 * interrupt when we return early, the hardware will repeat the
% +	 * interrupt if we lose a race here.  Later we will clear the
% +	 * status, and that needs at least the lock.
% +	 *
% +	 * XXX sc->bge_link_evt and maybe the BCM5700 errata are not handled.
% +	 *
% +	 * XXX there is no good order for this check relative to the
% +	 * IFCAP_POLLING one.  Since I don't believe in polling, I optimized
% +	 * for !polling.
% +	 */
% +	statusword = sc->bge_ldata.bge_status_block->bge_status;
% +	if ((statusword & BGE_STATFLAG_UPDATED) == 0)
% +		return;
% +
%  	BGE_LOCK(sc);
% 
% @@ -3174,5 +3386,5 @@
%  	 * Do the mandatory PCI flush as well as get the link status.
%  	 */
% -	statusword = CSR_READ_4(sc, BGE_MAC_STS) & BGE_MACSTAT_LINK_CHANGED;
% +	macstatus = CSR_READ_4(sc, BGE_MAC_STS);
% 
%  	/* Make sure the descriptor ring indexes are coherent. */
% @@ -3184,13 +3396,56 @@
%  	if ((sc->bge_asicrev == BGE_ASICREV_BCM5700 &&
%  	    sc->bge_chipid != BGE_CHIPID_BCM5700_B2) ||
% -	    statusword || sc->bge_link_evt)
% +	    (macstatus & BGE_MACSTAT_LINK_CHANGED) || sc->bge_link_evt)
%  		bge_link_upd(sc);
% 
%  	if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
% -		/* Check RX return ring producer/consumer. */
%  		bge_rxeof(sc);
% -
% -		/* Check TX ring producer/consumer. */
%  		bge_txeof(sc);
% +		if (sc->bge_dyncoal_max_intr_freq != 0 &&
% +		    ++sc->bge_dyncoal_intrcnt == 16) {
% +			struct bintime bt;
% +			uint32_t dpi, pfrac, tfrac, xtime;
% +
% +			binuptime(&bt);
% +			xtime = (bt.sec << 24) | (bt.frac >> 40);
% +			pfrac = (ifp->if_ipackets - sc->bge_dyncoal_ipackets) *
% +			    sc->bge_dyncoal_scale;
% +			tfrac = xtime - sc->bge_dyncoal_xtime;
% +			sc->bge_dyncoal_rx_pps =
% +			    (ifp->if_ipackets - sc->bge_dyncoal_ipackets) *
% +			    ((uint64_t)1 << 24) / tfrac;
% +			dpi = pfrac / (tfrac | 2) + 1;
% +			if (dpi > sc->bge_rx_max_coal_bds)
% +				dpi = sc->bge_rx_max_coal_bds;
% +			if (dpi != sc->bge_dyncoal_rx_max_coal_bds) {
% +				if (bge_careful_coal) {
% +				CSR_WRITE_4(sc, BGE_HCC_MODE, 0);
% +				CSR_READ_4(sc, BGE_HCC_MODE);
% +				if ((CSR_READ_4(sc, BGE_HCC_MODE) &
% +				    BGE_HCCMODE_ENABLE) == 0) {
% +					CSR_WRITE_4(sc, BGE_HCC_RX_MAX_COAL_BDS,
% +					    dpi);
% +					sc->bge_dyncoal_rx_max_coal_bds = dpi;
% +					bge_coal_writes++;
% +				} else
% +					bge_coal_write_fails++;
% +				CSR_WRITE_4(sc, BGE_HCC_MODE,
% +				    BGE_HCCMODE_ENABLE);
% +				} else {
% +				/*
% +				 * XXX not waiting for the engine is needed
% +				 * for efficiency since we reprogram it a
% +				 * lot so as to react fast, and this seems
% +				 * to work.  However, similar reprogramming
% +				 * of RX_COAL_TICKS doesn't work.
% +				 */
% +				CSR_WRITE_4(sc, BGE_HCC_RX_MAX_COAL_BDS, dpi);
% +				sc->bge_dyncoal_rx_max_coal_bds = dpi;
% +				}
% +			}
% +			sc->bge_dyncoal_xtime = xtime;
% +			sc->bge_dyncoal_intrcnt = 0;
% +			sc->bge_dyncoal_ipackets = ifp->if_ipackets;
% +		}
%  	}
% 
% @@ -3241,7 +3496,15 @@
%  	if ((sc->bge_flags & BGE_FLAG_TBI) == 0) {
%  		mii = device_get_softc(sc->bge_miibus);
% -		/* Don't mess with the PHY in IPMI/ASF mode */
% -		if (!((sc->bge_asf_mode & ASF_STACKUP) && (sc->bge_link)))
% +		/* Don't mess with the PHY unless link is down. */
% +		if (!sc->bge_link) {
% +			if (bge_errsrc & 0x20)
% +				microtime(&bgrs[bgrso].enter);
% +			if (bge_errsrc & 0x10)
%  			mii_tick(mii);
% +			if (bge_errsrc & 0x20) {
% +			microtime(&bgrs[bgrso].exit);
% +			bgrso = (bgrso + 1) % (sizeof(bgrs) / sizeof(bgrs[0]));
% +			}
% +		}
%  	} else {
%  		/*
% @@ -3276,4 +3539,5 @@
%  	    offsetof(struct bge_mac_stats_regs, etherStatsCollisions));
% 
% +	if (bge_errsrc & 4)
%  	ifp->if_ierrors += CSR_READ_4(sc, BGE_RXLP_LOCSTAT_IFIN_DROPS);
%  }
% @@ -3298,4 +3562,5 @@
% 
%  	cnt = READ_STAT(sc, stats, ifInDiscards.bge_addr_lo);
% +	if (bge_errsrc & 4)
%  	ifp->if_ierrors += (uint32_t)(cnt - sc->bge_rx_discards);
%  	sc->bge_rx_discards = cnt;
% @@ -4266,5 +4531,6 @@
%  }
% 
% -#define BGE_SYSCTL_STAT(sc, ctx, desc, parent, node, oid) \
% +/* XXX move this down and fix style bugs in it. */
% +#define BGE_SYSCTL_STAT_GEN(sc, ctx, desc, parent, node, oid) \
%  	SYSCTL_ADD_PROC(ctx, parent, OID_AUTO, oid, CTLTYPE_UINT|CTLFLAG_RD, \
%  	    sc, offsetof(struct bge_stats, node), bge_sysctl_stats, "IU", \
% @@ -4281,4 +4547,27 @@
%  	children = SYSCTL_CHILDREN(device_get_sysctl_tree(sc->bge_dev));
% 
% +	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "program_coal",
% +	    CTLTYPE_INT | CTLFLAG_RW,
% +	    sc, 0, bge_sysctl_program_coal, "I",
% +	    "program bge coalescence values");
% +	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "rx_coal_ticks", CTLFLAG_RW,
% +	    &sc->bge_rx_coal_ticks, 0, "");
% +	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "tx_coal_ticks", CTLFLAG_RW,
% +	    &sc->bge_tx_coal_ticks, 0, "");
% +	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "rx_max_coal_bds", CTLFLAG_RW,
% +	    &sc->bge_rx_max_coal_bds, 0, "");
% +	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "tx_max_coal_bds", CTLFLAG_RW,
% +	    &sc->bge_tx_max_coal_bds, 0, "");
% +	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "dyncoal_max_intr_freq",
% +	    CTLFLAG_RW,
% +	    &sc->bge_dyncoal_max_intr_freq, 0, "");
% +	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "dyncoal_rx_max_coal_bds",
% +	    CTLFLAG_RD,
% +	    &sc->bge_dyncoal_rx_max_coal_bds, 0, "");
% +	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "dyncoal_rx_pps", CTLFLAG_RD,
% +	    &sc->bge_dyncoal_rx_pps, 0, "");
% +	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "dyncoal_scale", CTLFLAG_RD,
% +	    &sc->bge_dyncoal_scale, 0, "");
% +
%  #ifdef BGE_REGISTER_DEBUG
%  	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "debug_info",
% @@ -4299,4 +4588,7 @@
%  	    NULL, "BGE Statistics");
%  	schildren = children = SYSCTL_CHILDREN(tree);
% +	/* Most of these seem to be unavailable on 5705+. */
% +if (!BGE_IS_5705_PLUS(sc)) {
% +#define BGE_SYSCTL_STAT		BGE_SYSCTL_STAT_GEN
%  	BGE_SYSCTL_STAT(sc, ctx, "Frames Dropped Due To Filters",
%  	    children, COSFramesDroppedDueToFilters,
% @@ -4308,4 +4600,8 @@
%  	BGE_SYSCTL_STAT(sc, ctx, "NIC No More RX Buffer Descriptors",
%  	    children, nicNoMoreRxBDs, "NoMoreRxBDs");
% +	/*
% +	 * The next one seems to be in BGE_RXLP_LOCSTAT_IFIN_DROPS for
% +	 * the 5705+ case -- bge_stats_update_regs() uses this.
% +	 */
%  	BGE_SYSCTL_STAT(sc, ctx, "Discarded Input Frames",
%  	    children, ifInDiscards, "InputDiscards");
% @@ -4330,86 +4626,126 @@
%  	BGE_SYSCTL_STAT(sc, ctx, "NIC Send Threshold Hit",
%  	    children, nicSendThresholdHit, "SendThresholdHit");
% +}
% 
%  	tree = SYSCTL_ADD_NODE(ctx, schildren, OID_AUTO, "rx", CTLFLAG_RD,
%  	    NULL, "BGE RX Statistics");
%  	children = SYSCTL_CHILDREN(tree);
% +	__asm("# label for testing ifHCInOctets");
% +	/*
% +	 * Most rx stats are available for the 5705+case, but in a
% +	 * different layout and with different semantics (32 bit registers
% +	 * holding 12 (?) bit values which are reset on write instead of
% +	 * 64-bit registers).  We only handle the layout differences, and
% +	 * do that using extremely ugly macros.  Resetting of the registers
% +	 * currently makes this sysctl almost useless for the 5705+ ase.
% +	 *
% +	 * The mapping of registers into structs mostly just gets in the
% +	 * way here.
% +	 */
% +#define BGE_SYSCTL_STAT_RX(sc, ctx, desc, parent, node, oid)		\
% +	SYSCTL_ADD_PROC(ctx, parent, OID_AUTO, oid,			\
% +	    CTLTYPE_UINT | CTLFLAG_RD, sc,				\
% +	    BGE_IS_5705_PLUS(sc) ?					\
% +	    offsetof(struct bge_mac_stats_regs, node) :			\
% +	    offsetof(struct bge_stats, rxstats.node),			\
% +	    bge_sysctl_stats, "IU", desc)
% +#undef BGE_SYSCTL_STAT
% +#define	BGE_SYSCTL_STAT		BGE_SYSCTL_STAT_RX
% +
%  	BGE_SYSCTL_STAT(sc, ctx, "Inbound Octets",
% -	    children, rxstats.ifHCInOctets, "Octets");
% +	    children, ifHCInOctets, "Octets");
%  	BGE_SYSCTL_STAT(sc, ctx, "Fragments",
% -	    children, rxstats.etherStatsFragments, "Fragments");
% +	    children, etherStatsFragments, "Fragments");
%  	BGE_SYSCTL_STAT(sc, ctx, "Inbound Unicast Packets",
% -	    children, rxstats.ifHCInUcastPkts, "UcastPkts");
% +	    children, ifHCInUcastPkts, "UcastPkts");
%  	BGE_SYSCTL_STAT(sc, ctx, "Inbound Multicast Packets",
% -	    children, rxstats.ifHCInMulticastPkts, "MulticastPkts");
% +	    children, ifHCInMulticastPkts, "MulticastPkts");
%  	BGE_SYSCTL_STAT(sc, ctx, "FCS Errors",
% -	    children, rxstats.dot3StatsFCSErrors, "FCSErrors");
% +	    children, dot3StatsFCSErrors, "FCSErrors");
%  	BGE_SYSCTL_STAT(sc, ctx, "Alignment Errors",
% -	    children, rxstats.dot3StatsAlignmentErrors, "AlignmentErrors");
% +	    children, dot3StatsAlignmentErrors, "AlignmentErrors");
%  	BGE_SYSCTL_STAT(sc, ctx, "XON Pause Frames Received",
% -	    children, rxstats.xonPauseFramesReceived, "xonPauseFramesReceived");
% +	    children, xonPauseFramesReceived, "xonPauseFramesReceived");
%  	BGE_SYSCTL_STAT(sc, ctx, "XOFF Pause Frames Received",
% -	    children, rxstats.xoffPauseFramesReceived,
% -	    "xoffPauseFramesReceived");
% +	    children, xoffPauseFramesReceived, "xoffPauseFramesReceived");
%  	BGE_SYSCTL_STAT(sc, ctx, "MAC Control Frames Received",
% -	    children, rxstats.macControlFramesReceived,
% -	    "ControlFramesReceived");
% +	    children, macControlFramesReceived, "ControlFramesReceived");
%  	BGE_SYSCTL_STAT(sc, ctx, "XOFF State Entered",
% -	    children, rxstats.xoffStateEntered, "xoffStateEntered");
% +	    children, xoffStateEntered, "xoffStateEntered");
%  	BGE_SYSCTL_STAT(sc, ctx, "Frames Too Long",
% -	    children, rxstats.dot3StatsFramesTooLong, "FramesTooLong");
% +	    children, dot3StatsFramesTooLong, "FramesTooLong");
%  	BGE_SYSCTL_STAT(sc, ctx, "Jabbers",
% -	    children, rxstats.etherStatsJabbers, "Jabbers");
% +	    children, etherStatsJabbers, "Jabbers");
%  	BGE_SYSCTL_STAT(sc, ctx, "Undersized Packets",
% -	    children, rxstats.etherStatsUndersizePkts, "UndersizePkts");
% -	BGE_SYSCTL_STAT(sc, ctx, "Inbound Range Length Errors",
% +	    children, etherStatsUndersizePkts, "UndersizePkts");
% +	/* The next 2 seem to be unavailable for the 5705 case. */
% +if (!BGE_IS_5705_PLUS(sc)) {
% +	BGE_SYSCTL_STAT_GEN(sc, ctx, "Inbound Range Length Errors",
%  	    children, rxstats.inRangeLengthError, "inRangeLengthError");
% -	BGE_SYSCTL_STAT(sc, ctx, "Outbound Range Length Errors",
% +	BGE_SYSCTL_STAT_GEN(sc, ctx, "Outbound Range Length Errors",
%  	    children, rxstats.outRangeLengthError, "outRangeLengthError");
% +}
% 
%  	tree = SYSCTL_ADD_NODE(ctx, schildren, OID_AUTO, "tx", CTLFLAG_RD,
%  	    NULL, "BGE TX Statistics");
%  	children = SYSCTL_CHILDREN(tree);
% +	__asm("# label for testing ifHCOutOctets");
% +	/*
% +	 * tx is like rx except the macro needs "txstats." instead of
% +	 * ".rxstats" for the non-5705+ variant.  Redefine it again
% +	 * to get this.
% +	 */
% +#define BGE_SYSCTL_STAT_TX(sc, ctx, desc, parent, node, oid)		\
% +	SYSCTL_ADD_PROC(ctx, parent, OID_AUTO, oid,			\
% +	    CTLTYPE_UINT | CTLFLAG_RD, sc,				\
% +	    BGE_IS_5705_PLUS(sc) ?					\
% +	    offsetof(struct bge_mac_stats_regs, node) :			\
% +	    offsetof(struct bge_stats, txstats.node),			\
% +	    bge_sysctl_stats, "IU", desc)
% +#undef BGE_SYSCTL_STAT
% +#define	BGE_SYSCTL_STAT		BGE_SYSCTL_STAT_TX
% +
%  	BGE_SYSCTL_STAT(sc, ctx, "Outbound Octets",
% -	    children, txstats.ifHCOutOctets, "Octets");
% +	    children, ifHCOutOctets, "Octets");
%  	BGE_SYSCTL_STAT(sc, ctx, "TX Collisions",
% -	    children, txstats.etherStatsCollisions, "Collisions");
% +	    children, etherStatsCollisions, "Collisions");
%  	BGE_SYSCTL_STAT(sc, ctx, "XON Sent",
% -	    children, txstats.outXonSent, "XonSent");
% +	    children, outXonSent, "XonSent");
%  	BGE_SYSCTL_STAT(sc, ctx, "XOFF Sent",
% -	    children, txstats.outXoffSent, "XoffSent");
% -	BGE_SYSCTL_STAT(sc, ctx, "Flow Control Done",
% +	    children, outXoffSent, "XoffSent");
% +if (!BGE_IS_5705_PLUS(sc)) {
% +	BGE_SYSCTL_STAT_GEN(sc, ctx, "Flow Control Done",
%  	    children, txstats.flowControlDone, "flowControlDone");
% +}
%  	BGE_SYSCTL_STAT(sc, ctx, "Internal MAC TX errors",
% -	    children, txstats.dot3StatsInternalMacTransmitErrors,
% +	    children, dot3StatsInternalMacTransmitErrors,
%  	    "InternalMacTransmitErrors");
%  	BGE_SYSCTL_STAT(sc, ctx, "Single Collision Frames",
% -	    children, txstats.dot3StatsSingleCollisionFrames,
% -	    "SingleCollisionFrames");
% +	    children, dot3StatsSingleCollisionFrames, "SingleCollisionFrames");
%  	BGE_SYSCTL_STAT(sc, ctx, "Multiple Collision Frames",
% -	    children, txstats.dot3StatsMultipleCollisionFrames,
% +	    children, dot3StatsMultipleCollisionFrames,
%  	    "MultipleCollisionFrames");
%  	BGE_SYSCTL_STAT(sc, ctx, "Deferred Transmissions", 
% -	    children, txstats.dot3StatsDeferredTransmissions,
% -	    "DeferredTransmissions");
% +	    children, dot3StatsDeferredTransmissions, "DeferredTransmissions");
%  	BGE_SYSCTL_STAT(sc, ctx, "Excessive Collisions",
% -	    children, txstats.dot3StatsExcessiveCollisions,
% -	    "ExcessiveCollisions");
% +	    children, dot3StatsExcessiveCollisions, "ExcessiveCollisions");
%  	BGE_SYSCTL_STAT(sc, ctx, "Late Collisions",
% -	    children, txstats.dot3StatsLateCollisions,
% -	    "LateCollisions");
% +	    children, dot3StatsLateCollisions, "LateCollisions");
%  	BGE_SYSCTL_STAT(sc, ctx, "Outbound Unicast Packets", 
% -	    children, txstats.ifHCOutUcastPkts, "UcastPkts");
% +	    children, ifHCOutUcastPkts, "UcastPkts");
%  	BGE_SYSCTL_STAT(sc, ctx, "Outbound Multicast Packets",
% -	    children, txstats.ifHCOutMulticastPkts, "MulticastPkts");
% +	    children, ifHCOutMulticastPkts, "MulticastPkts");
%  	BGE_SYSCTL_STAT(sc, ctx, "Outbound Broadcast Packets",
% -	    children, txstats.ifHCOutBroadcastPkts, "BroadcastPkts");
% -	BGE_SYSCTL_STAT(sc, ctx, "Carrier Sense Errors",
% +	    children, ifHCOutBroadcastPkts, "BroadcastPkts");
% +if (!BGE_IS_5705_PLUS(sc)) {
% +	BGE_SYSCTL_STAT_GEN(sc, ctx, "Carrier Sense Errors",
%  	    children, txstats.dot3StatsCarrierSenseErrors,
%  	    "CarrierSenseErrors");
% -	BGE_SYSCTL_STAT(sc, ctx, "Outbound Discards",
% +	BGE_SYSCTL_STAT_GEN(sc, ctx, "Outbound Discards",
%  	    children, txstats.ifOutDiscards, "Discards");
% -	BGE_SYSCTL_STAT(sc, ctx, "Outbound Errors",
% +	BGE_SYSCTL_STAT_GEN(sc, ctx, "Outbound Errors",
%  	    children, txstats.ifOutErrors, "Errors");
%  }
% +}
% 
%  static int
% @@ -4422,10 +4758,13 @@
%  	sc = (struct bge_softc *)arg1;
%  	offset = arg2;
% -	if (BGE_IS_5705_PLUS(sc))
% +	if (BGE_IS_5705_PLUS(sc)) {
%  		base = BGE_MAC_STATS;
% -	else
% +		result = CSR_READ_4(sc, base + offset);
% +	}
% +	else {
%  		base = BGE_MEMWIN_START + BGE_STATS_BLOCK;
% -	result = CSR_READ_4(sc, base + offset + offsetof(bge_hostaddr,
% -	    bge_addr_lo));
% +		result = CSR_READ_4(sc, base + offset + offsetof(bge_hostaddr,
% +		    bge_addr_lo));
% +	}
%  	return (sysctl_handle_int(oidp, &result, 0, req));
%  }
% Index: if_bgereg.h
% ===================================================================
% RCS file: /home/ncvs/src/sys/dev/bge/if_bgereg.h,v
% retrieving revision 1.73
% diff -u -2 -r1.73 if_bgereg.h
% --- if_bgereg.h	22 May 2007 19:22:58 -0000	1.73
% +++ if_bgereg.h	23 May 2007 09:12:50 -0000
% @@ -2338,13 +2338,7 @@
% 
%  /*
% - * Memory management stuff. Note: the SSLOTS, MSLOTS and JSLOTS
% - * values are tuneable. They control the actual amount of buffers
% - * allocated for the standard, mini and jumbo receive rings.
% + * Memory management stuff.
%   */
% 
% -#define	BGE_SSLOTS	256
% -#define	BGE_MSLOTS	256
% -#define	BGE_JSLOTS	384
% -
%  #define	BGE_JRAWLEN (BGE_JUMBO_FRAMELEN + ETHER_ALIGN)
%  #define	BGE_JLEN (BGE_JRAWLEN + (sizeof(uint64_t) - \
% @@ -2504,4 +2498,11 @@
%  	uint32_t		bge_tx_discards;
%  	uint32_t		bge_tx_collisions;
% +	int			bge_dyncoal_intrcnt;
% +	u_long			bge_dyncoal_ipackets;
% +	uint32_t		bge_dyncoal_max_intr_freq;
% +	uint32_t		bge_dyncoal_rx_max_coal_bds;
% +	uint32_t		bge_dyncoal_rx_pps;
% +	uint32_t		bge_dyncoal_scale;
% +	uint32_t		bge_dyncoal_xtime;
%  #ifdef DEVICE_POLLING
%  	int			rxcycles;
---

Edited version (may have deleted too much or too little):
---
% Index: if_bge.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/dev/bge/if_bge.c,v
% retrieving revision 1.198
% diff -u -2 -r1.198 if_bge.c
% --- if_bge.c	30 Sep 2007 11:05:14 -0000	1.198
% +++ if_bge.c	8 Nov 2007 16:01:49 -0000
% @@ -1,2 +1,10 @@
% +int bge_careful_coal = 1;
% +int bge_qlen = 1;
% +int bge_errsrc = 0x17;
% +int bge_rx_repl = 64;
% +int bge_coal_writes;
% +int bge_coal_write_fails;
% +int bge_polling_trust_statusword = 0;
% +
%  /*-
%   * Copyright (c) 2001 Wind River Systems
% @@ -867,10 +877,4 @@
%  }
% 
% -/*
% - * The standard receive ring has 512 entries in it. At 2K per mbuf cluster,
% - * that's 1MB or memory, which is a lot. For now, we fill only the first
% - * 256 ring entries and hope that our CPU is fast enough to keep up with
% - * the NIC.
% - */
%  static int
%  bge_init_rx_ring_std(struct bge_softc *sc)
% @@ -878,8 +882,8 @@
%  	int i;
% 
% -	for (i = 0; i < BGE_SSLOTS; i++) {
% +	for (i = 0; i < BGE_STD_RX_RING_CNT; i++) {
%  		if (bge_newbuf_std(sc, i, NULL) == ENOBUFS)
%  			return (ENOBUFS);
% -	};
% +	}
% 
%  	bus_dmamap_sync(sc->bge_cdata.bge_rx_std_ring_tag,
% @@ -1530,4 +1534,11 @@
% 
%  	/* Set up host coalescing defaults */
% +	if (sc->bge_dyncoal_max_intr_freq != 0) {
% +		sc->bge_dyncoal_scale = ((uint64_t)1 << 24) /
% +		    sc->bge_dyncoal_max_intr_freq;
% +		sc->bge_rx_coal_ticks = BGE_TICKS_PER_SEC /
% +		    sc->bge_dyncoal_max_intr_freq;
% +	} else
% +		sc->bge_rx_coal_ticks = 150;
%  	CSR_WRITE_4(sc, BGE_HCC_RX_COAL_TICKS, sc->bge_rx_coal_ticks);
%  	CSR_WRITE_4(sc, BGE_HCC_TX_COAL_TICKS, sc->bge_tx_coal_ticks);
% @@ -2226,4 +2237,53 @@
% 
%  static int
% +bge_sysctl_program_coal(SYSCTL_HANDLER_ARGS)
% +{
% +	struct bge_softc *sc;
% +	int error, i, val;
% +
% +	val = 0;
% +	error = sysctl_handle_int(oidp, &val, 0, req);
% +	if (error != 0 || req->newptr == NULL)
% +		return (error);
% +	sc = arg1;
% +	BGE_LOCK(sc);
% +
% +	/* XXX cut from bge_blockinit(): */
% +
% +	/* Disable host coalescing until we get it set up */
% +	CSR_WRITE_4(sc, BGE_HCC_MODE, 0x00000000);
% +
% +	/* Poll to make sure it's shut down. */
% +	for (i = 0; i < BGE_TIMEOUT; i++) {
% +		if (!(CSR_READ_4(sc, BGE_HCC_MODE) & BGE_HCCMODE_ENABLE))
% +			break;
% +		DELAY(10);
% +	}
% +
% +	if (i == BGE_TIMEOUT) {
% +		device_printf(sc->bge_dev,
% +		    "host coalescing engine failed to idle\n");
% +		CSR_WRITE_4(sc, BGE_HCC_MODE, BGE_HCCMODE_ENABLE);
% +		BGE_UNLOCK(sc);
% +		return (ENXIO);
% +	}
% +
% +	/* Set up host coalescing defaults */
% +	if (sc->bge_dyncoal_max_intr_freq != 0)
% +		sc->bge_dyncoal_scale = ((uint64_t)1 << 24) /
% +		    sc->bge_dyncoal_max_intr_freq;
% +	CSR_WRITE_4(sc, BGE_HCC_RX_COAL_TICKS, sc->bge_rx_coal_ticks);
% +	CSR_WRITE_4(sc, BGE_HCC_TX_COAL_TICKS, sc->bge_tx_coal_ticks);
% +	CSR_WRITE_4(sc, BGE_HCC_RX_MAX_COAL_BDS, sc->bge_rx_max_coal_bds);
% +	CSR_WRITE_4(sc, BGE_HCC_TX_MAX_COAL_BDS, sc->bge_tx_max_coal_bds);
% +
% +	/* Turn on host coalescing state machine */
% +	CSR_WRITE_4(sc, BGE_HCC_MODE, BGE_HCCMODE_ENABLE);
% +
% +	BGE_UNLOCK(sc);
% +	return (0);
% +}
% +
% +static int
%  bge_attach(device_t dev)
%  {
% @@ -2454,8 +2515,8 @@
%  	/* Set default tuneable values. */
%  	sc->bge_stat_ticks = BGE_TICKS_PER_SEC;
% -	sc->bge_rx_coal_ticks = 150;
% -	sc->bge_tx_coal_ticks = 150;
% -	sc->bge_rx_max_coal_bds = 10;
% -	sc->bge_tx_max_coal_bds = 10;
% +	sc->bge_dyncoal_max_intr_freq = 10000;
% +	sc->bge_tx_coal_ticks = 1000000;
% +	sc->bge_rx_max_coal_bds = 128;
% +	sc->bge_tx_max_coal_bds = BGE_TX_RING_CNT * 3 / 4;
% 
%  	/* Set up ifnet structure */
% @@ -3184,13 +3396,56 @@
%  	if ((sc->bge_asicrev == BGE_ASICREV_BCM5700 &&
%  	    sc->bge_chipid != BGE_CHIPID_BCM5700_B2) ||
% -	    statusword || sc->bge_link_evt)
% +	    statusword || sc->bge_link_evt)
%  		bge_link_upd(sc);
% 
%  	if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
% -		/* Check RX return ring producer/consumer. */
%  		bge_rxeof(sc);
% -
% -		/* Check TX ring producer/consumer. */
%  		bge_txeof(sc);
% +		if (sc->bge_dyncoal_max_intr_freq != 0 &&
% +		    ++sc->bge_dyncoal_intrcnt == 16) {
% +			struct bintime bt;
% +			uint32_t dpi, pfrac, tfrac, xtime;
% +
% +			binuptime(&bt);
% +			xtime = (bt.sec << 24) | (bt.frac >> 40);
% +			pfrac = (ifp->if_ipackets - sc->bge_dyncoal_ipackets) *
% +			    sc->bge_dyncoal_scale;
% +			tfrac = xtime - sc->bge_dyncoal_xtime;
% +			sc->bge_dyncoal_rx_pps =
% +			    (ifp->if_ipackets - sc->bge_dyncoal_ipackets) *
% +			    ((uint64_t)1 << 24) / tfrac;
% +			dpi = pfrac / (tfrac | 2) + 1;
% +			if (dpi > sc->bge_rx_max_coal_bds)
% +				dpi = sc->bge_rx_max_coal_bds;
% +			if (dpi != sc->bge_dyncoal_rx_max_coal_bds) {
% +				if (bge_careful_coal) {
% +				CSR_WRITE_4(sc, BGE_HCC_MODE, 0);
% +				CSR_READ_4(sc, BGE_HCC_MODE);
% +				if ((CSR_READ_4(sc, BGE_HCC_MODE) &
% +				    BGE_HCCMODE_ENABLE) == 0) {
% +					CSR_WRITE_4(sc, BGE_HCC_RX_MAX_COAL_BDS,
% +					    dpi);
% +					sc->bge_dyncoal_rx_max_coal_bds = dpi;
% +					bge_coal_writes++;
% +				} else
% +					bge_coal_write_fails++;
% +				CSR_WRITE_4(sc, BGE_HCC_MODE,
% +				    BGE_HCCMODE_ENABLE);
% +				} else {
% +				/*
% +				 * XXX not waiting for the engine is needed
% +				 * for efficiency since we reprogram it a
% +				 * lot so as to react fast, and this seems
% +				 * to work.  However, similar reprogramming
% +				 * of RX_COAL_TICKS doesn't work.
% +				 */
% +				CSR_WRITE_4(sc, BGE_HCC_RX_MAX_COAL_BDS, dpi);
% +				sc->bge_dyncoal_rx_max_coal_bds = dpi;
% +				}
% +			}
% +			sc->bge_dyncoal_xtime = xtime;
% +			sc->bge_dyncoal_intrcnt = 0;
% +			sc->bge_dyncoal_ipackets = ifp->if_ipackets;
% +		}
%  	}
% 
% @@ -4281,4 +4547,27 @@
%  	children = SYSCTL_CHILDREN(device_get_sysctl_tree(sc->bge_dev));
% 
% +	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "program_coal",
% +	    CTLTYPE_INT | CTLFLAG_RW,
% +	    sc, 0, bge_sysctl_program_coal, "I",
% +	    "program bge coalescence values");
% +	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "rx_coal_ticks", CTLFLAG_RW,
% +	    &sc->bge_rx_coal_ticks, 0, "");
% +	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "tx_coal_ticks", CTLFLAG_RW,
% +	    &sc->bge_tx_coal_ticks, 0, "");
% +	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "rx_max_coal_bds", CTLFLAG_RW,
% +	    &sc->bge_rx_max_coal_bds, 0, "");
% +	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "tx_max_coal_bds", CTLFLAG_RW,
% +	    &sc->bge_tx_max_coal_bds, 0, "");
% +	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "dyncoal_max_intr_freq",
% +	    CTLFLAG_RW,
% +	    &sc->bge_dyncoal_max_intr_freq, 0, "");
% +	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "dyncoal_rx_max_coal_bds",
% +	    CTLFLAG_RD,
% +	    &sc->bge_dyncoal_rx_max_coal_bds, 0, "");
% +	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "dyncoal_rx_pps", CTLFLAG_RD,
% +	    &sc->bge_dyncoal_rx_pps, 0, "");
% +	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "dyncoal_scale", CTLFLAG_RD,
% +	    &sc->bge_dyncoal_scale, 0, "");
% +
%  #ifdef BGE_REGISTER_DEBUG
%  	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "debug_info",
% Index: if_bgereg.h
% ===================================================================
% RCS file: /home/ncvs/src/sys/dev/bge/if_bgereg.h,v
% retrieving revision 1.73
% diff -u -2 -r1.73 if_bgereg.h
% --- if_bgereg.h	22 May 2007 19:22:58 -0000	1.73
% +++ if_bgereg.h	23 May 2007 09:12:50 -0000
% @@ -2338,13 +2338,7 @@
% 
%  /*
% - * Memory management stuff. Note: the SSLOTS, MSLOTS and JSLOTS
% - * values are tuneable. They control the actual amount of buffers
% - * allocated for the standard, mini and jumbo receive rings.
% + * Memory management stuff.
%   */
% 
% -#define	BGE_SSLOTS	256
% -#define	BGE_MSLOTS	256
% -#define	BGE_JSLOTS	384
% -
%  #define	BGE_JRAWLEN (BGE_JUMBO_FRAMELEN + ETHER_ALIGN)
%  #define	BGE_JLEN (BGE_JRAWLEN + (sizeof(uint64_t) - \
% @@ -2504,4 +2498,11 @@
%  	uint32_t		bge_tx_discards;
%  	uint32_t		bge_tx_collisions;
% +	int			bge_dyncoal_intrcnt;
% +	u_long			bge_dyncoal_ipackets;
% +	uint32_t		bge_dyncoal_max_intr_freq;
% +	uint32_t		bge_dyncoal_rx_max_coal_bds;
% +	uint32_t		bge_dyncoal_rx_pps;
% +	uint32_t		bge_dyncoal_scale;
% +	uint32_t		bge_dyncoal_xtime;
%  #ifdef DEVICE_POLLING
%  	int			rxcycles;
---
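
The fixed-point arithmetic behind the dynamic rx tuning in the interrupt-path
hunk above can be sketched as standalone C.  This is only an illustration
under stated assumptions, not driver code: the function names are
hypothetical, TICKS_PER_SEC is assumed to equal BGE_TICKS_PER_SEC (taken
here to be 1000000, i.e. microsecond ticks), and the inputs stand in for
the values the patch reads from binuptime() and if_ipackets.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: same value as BGE_TICKS_PER_SEC (microsecond ticks). */
#define	TICKS_PER_SEC	1000000u

/* Uptime as 8.24 fixed point, like (bt.sec << 24) | (bt.frac >> 40). */
static uint32_t
xtime24(uint64_t sec, uint64_t frac)
{
	return ((uint32_t)((sec << 24) | (frac >> 40)));
}

/* 2^24 / max_intr_freq, the value stored in bge_dyncoal_scale. */
static uint32_t
dyncoal_scale(uint32_t max_intr_freq)
{
	assert(max_intr_freq != 0);
	return ((uint32_t)(((uint64_t)1 << 24) / max_intr_freq));
}

/* RX coalescing timer that caps the interrupt rate at max_intr_freq. */
static uint32_t
dyncoal_rx_coal_ticks(uint32_t max_intr_freq)
{
	assert(max_intr_freq != 0);
	return (TICKS_PER_SEC / max_intr_freq);
}

/*
 * Buffer descriptors to coalesce per interrupt.  dpackets is the number
 * of packets received since the last sample and tfrac is the elapsed
 * time in units of 2^-24 seconds, so pfrac / tfrac is roughly
 * (packets per second) / max_intr_freq.  The "| 2" keeps the divisor
 * nonzero and the "+ 1" keeps the result at least 1.
 */
static uint32_t
dyncoal_bds(uint32_t dpackets, uint32_t tfrac, uint32_t scale,
    uint32_t max_bds)
{
	uint32_t dpi, pfrac;

	pfrac = dpackets * scale;
	dpi = pfrac / (tfrac | 2) + 1;
	return (dpi > max_bds ? max_bds : dpi);
}
```

With dyncoal_max_intr_freq = 10000 this gives a scale of 1677 and an rx
timer of 100 ticks.  A steady 100000 packets/s sampled over one second
(tfrac = 1 << 24) then yields 10 BDs per interrupt, an idle link yields 1,
and very high rates are clamped to rx_max_coal_bds.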

Simple shell program for micro-adjusting parameters interactively (would
be easier using a mouse, but I don't like GUI programming, and the
parameter space is really too large to investigate manually):
---
#!/bin/sh

netstat=netstat
rx_coal_ticks=$(sysctl -n dev.bge.0.rx_coal_ticks)
rx_max_coal_bds=$(sysctl -n dev.bge.0.rx_max_coal_bds)
tx_coal_ticks=$(sysctl -n dev.bge.0.tx_coal_ticks)
tx_max_coal_bds=$(sysctl -n dev.bge.0.tx_max_coal_bds)
max_intr_freq=$(sysctl -n dev.bge.0.dyncoal_max_intr_freq)
drxbds=0
drxticks=0
dtxbds=0
dtxticks=0
while :
do
 	printf \
"rx ticks %d, rx bds %d, tx ticks %d, tx bds %d, freq %d, dyn bds %d\n"  \
 	    $rx_coal_ticks $rx_max_coal_bds $tx_coal_ticks $tx_max_coal_bds \
 	    $max_intr_freq $(sysctl -n dev.bge.0.dyncoal_rx_max_coal_bds)
 	# ($netstat -I bge0 1 | head -3 | tail -1) 2>/dev/null
 	sysctl dev.bge.0.rx_coal_ticks=$rx_coal_ticks >/dev/null
 	sysctl dev.bge.0.tx_coal_ticks=$tx_coal_ticks >/dev/null
 	sysctl dev.bge.0.rx_max_coal_bds=$rx_max_coal_bds >/dev/null
 	sysctl dev.bge.0.tx_max_coal_bds=$tx_max_coal_bds >/dev/null
 	sysctl dev.bge.0.program_coal=0 >/dev/null
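 	# Keys change the per-iteration deltas, not the values themselves:
 	# H/L rx ticks, J/K rx bds, h/l tx ticks, j/k tx bds, 0 resets the
 	# deltas, n shows one netstat sample.  Stop on EOF instead of spinning.
 	read x || break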
 	case "$x" in
 	0) drxticks=0; drxbds=0; dtxticks=0; dtxbds=0 ;;
 	H) drxticks=$(($drxticks - 1)) ;;
 	J) drxbds=$(($drxbds - 1)) ;;
 	K) drxbds=$(($drxbds + 1)) ;;
 	L) drxticks=$(($drxticks + 1)) ;;
 	h) dtxticks=$(($dtxticks - 1)) ;;
 	j) dtxbds=$(($dtxbds - 1)) ;;
 	k) dtxbds=$(($dtxbds + 1)) ;;
 	l) dtxticks=$(($dtxticks + 1)) ;;
 	n) ($netstat -I bge0 1 | head -3 | tail -1) 2>/dev/null
 	esac
 	rx_coal_ticks=$(($rx_coal_ticks + $drxticks))
 	rx_max_coal_bds=$(($rx_max_coal_bds + $drxbds))
 	tx_coal_ticks=$(($tx_coal_ticks + $dtxticks))
 	tx_max_coal_bds=$(($tx_max_coal_bds + $dtxbds))
done
---
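
How the deltas accumulate in the script above can be modelled in a few
lines of C.  This is a hypothetical model for illustration only (the
struct and function names are not part of the patch or the script); like
the script, it does no clamping, so values can run negative if a delta is
left nonzero for long enough.

```c
#include <assert.h>
#include <stddef.h>

struct coal_tuner {
	long rx_ticks, rx_bds, tx_ticks, tx_bds;	/* current values */
	long drx_ticks, drx_bds, dtx_ticks, dtx_bds;	/* per-step deltas */
};

/* One loop iteration: apply the key to the deltas, then step the values. */
static void
coal_tuner_step(struct coal_tuner *t, char key)
{
	assert(t != NULL);
	switch (key) {
	case '0':
		t->drx_ticks = t->drx_bds = t->dtx_ticks = t->dtx_bds = 0;
		break;
	case 'H': t->drx_ticks--; break;
	case 'J': t->drx_bds--; break;
	case 'K': t->drx_bds++; break;
	case 'L': t->drx_ticks++; break;
	case 'h': t->dtx_ticks--; break;
	case 'j': t->dtx_bds--; break;
	case 'k': t->dtx_bds++; break;
	case 'l': t->dtx_ticks++; break;
	default:
		break;		/* any other input keeps the deltas as-is */
	}
	t->rx_ticks += t->drx_ticks;
	t->rx_bds += t->drx_bds;
	t->tx_ticks += t->dtx_ticks;
	t->tx_bds += t->dtx_bds;
}
```

Starting from rx ticks 150, entering L, L and then an empty line moves
rx ticks to 151, 153 and 155: each L raises the step size, and every
later line keeps applying it until 0 is entered.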

Bruce


