From owner-freebsd-net@FreeBSD.ORG Fri Apr 19 19:03:32 2013 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 95E18ABD; Fri, 19 Apr 2013 19:03:32 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx07.syd.optusnet.com.au (fallbackmx07.syd.optusnet.com.au [211.29.132.9]) by mx1.freebsd.org (Postfix) with ESMTP id 02FFB173E; Fri, 19 Apr 2013 19:03:31 +0000 (UTC) Received: from mail17.syd.optusnet.com.au (mail17.syd.optusnet.com.au [211.29.132.198]) by fallbackmx07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r3JBWnBd030164; Fri, 19 Apr 2013 21:32:49 +1000 Received: from c211-30-173-106.carlnfd1.nsw.optusnet.com.au (c211-30-173-106.carlnfd1.nsw.optusnet.com.au [211.30.173.106]) by mail17.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r3JBWcE4012696 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 19 Apr 2013 21:32:40 +1000 Date: Fri, 19 Apr 2013 21:32:38 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: sbruno@FreeBSD.org Subject: Re: bge(4) sysctl tuneables -- a blast from the past. more knobs! MORE! In-Reply-To: <1366348388.1389.11.camel@localhost> Message-ID: <20130419203629.N1028@besplex.bde.org> References: <1365781568.1418.1.camel@localhost> <20130413200512.G1165@besplex.bde.org> <1366065356.1350.7.camel@localhost> <20130416152121.G904@besplex.bde.org> <1366348388.1389.11.camel@localhost> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=Ov0XUFDt c=1 sm=1 a=mFpRME-EEqMA:10 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=gmUmqqDpSwAA:10 a=6I5d2MoRAAAA:8 a=xB1JHitMHlILzK4byPMA:9 a=CjuIK1q_8ugA:10 a=02v0E20eI2-Xn35J:21 a=zPCK-PPU6FJlAUc6:21 a=TEtd8y5WR3g2ypngnwZWYw==:117 Cc: "pyunyh@gmail.com" , "freebsd-net@freebsd.org" , bde , Bruce Evans X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Apr 2013 19:03:32 -0000 On Thu, 18 Apr 2013, Sean Bruno wrote: > Version 0.2 of patches to bge(4). I'm not totally happy with it, but > comments welcome. I need better explanations of usage for the man page. > > I've dropped bge_rxd completely here as it was suggested to not even > bother adjusting it. > > http://people.freebsd.org/~sbruno/bge_config_update_1.txt I dislike this. It is very bloated to have a SYSCTL_PROC() for every setting. Yet the SYSCTL_PROC()'s aren't even correct. They write directly to the device registers with no locking. Further bloat is given by a tunable backing every sysctl. Style bugs include verbose descriptions and obfuscation the strings for these. I prefer my version of course. It allows setting all the values using SYSCTL_INT()s. It is a feature that these settings are to memory variables only. This allows building up a complicated set of changes without having to ensure that all intermediate steps give a valid and/ or usable hardware setting at every stage. Then there is a SYSCTL_PROC() for the operation of transferring the values to the hardware. This sysctl isn't missing locking, and doesn't just write to the device registers, but reprograms the coal hardware like a device reset would. Direct writing to the registers would probably work except possibly for transient problems, and but the hardware doesn't seem to guarantee this and there are no benefits from current corners here. The little SYSCTL_PROC()s could be used to do range checks, but range checks would not be very useful and none are done now. Range checks of individual values as they are set would defeat building up complicated sets of changes in another way. Many combinations of values are invalid, but it is hard to say which, and it would be wrong to disallow intermediate invalid combinations that won't ever be used. All that can be done easily is to range check each parameter from its min to its max. This is not very useful, since combinations with everything min or everything max are probably just as dysfunctional as some slightly out of bounds values. >> There are only 2 bge tunables now, and they only have half of these bugs: >> - hw.bge.allow_asf is in the wrong namespace >> - hw.bge.allow_asf is too global > Maintainer disagrees with this. I think he agreed that this were wrong too. It is at least missing the verboseness (5 lines for this; 26 lines for each of 4 new sysctls). Anyway, the old bugs should be fixed separately. Here are my old coal patches for bge. They have been edited from a larger patch and may have editing errors. @ diff -c2 ./dev/bge/if_bge.c~ ./dev/bge/if_bge.c @ *** ./dev/bge/if_bge.c~ Fri May 16 16:39:01 2008 @ --- ./dev/bge/if_bge.c Thu Sep 30 19:51:38 2010 @ *************** @ *** 1,2 **** @ --- 1,10 ---- @ + int bge_careful_coal = 1; @ + int bge_qlen = 1; @ + int bge_errsrc = 0x17; @ + int bge_rx_repl = 64; @ + int bge_coal_writes; @ + int bge_coal_write_fails; @ + int bge_polling_trust_statusword = 0; @ + @ /*- @ * Copyright (c) 2001 Wind River Systems Debugging variables. bge_careful_coal selects a way of directly writing rx_max_coal_bds that is not completely sloppy though it doesn't busy-wait to synchronize. @ *************** @ *** 1661,1664 **** @ --- 1665,1675 ---- @ @ /* Set up host coalescing defaults */ @ + if (sc->bge_dyncoal_max_intr_freq != 0) { @ + sc->bge_dyncoal_scale = ((uint64_t)1 << 24) / @ + sc->bge_dyncoal_max_intr_freq; @ + sc->bge_rx_coal_ticks = BGE_TICKS_PER_SEC / @ + sc->bge_dyncoal_max_intr_freq; @ + } else @ + sc->bge_rx_coal_ticks = 150; @ CSR_WRITE_4(sc, BGE_HCC_RX_COAL_TICKS, sc->bge_rx_coal_ticks); @ CSR_WRITE_4(sc, BGE_HCC_TX_COAL_TICKS, sc->bge_tx_coal_ticks); @ *************** @ *** 2351,2354 **** @ --- 2362,2414 ---- @ @ static int @ + bge_sysctl_program_coal(SYSCTL_HANDLER_ARGS) @ + { @ + struct bge_softc *sc; @ + int error, i, val; @ + @ + val = 0; @ + error = sysctl_handle_int(oidp, &val, 0, req); @ + if (error != 0 || req->newptr == NULL) @ + return (error); @ + sc = arg1; @ + BGE_LOCK(sc); @ + @ + /* XXX cut from bge_blockinit(): */ @ + @ + /* Disable host coalescing until we get it set up */ @ + CSR_WRITE_4(sc, BGE_HCC_MODE, 0x00000000); @ + @ + /* Poll to make sure it's shut down. */ @ + for (i = 0; i < BGE_TIMEOUT; i++) { @ + if (!(CSR_READ_4(sc, BGE_HCC_MODE) & BGE_HCCMODE_ENABLE)) @ + break; @ + DELAY(10); @ + } @ + @ + if (i == BGE_TIMEOUT) { @ + device_printf(sc->bge_dev, @ + "host coalescing engine failed to idle\n"); @ + CSR_WRITE_4(sc, BGE_HCC_MODE, BGE_HCCMODE_ENABLE); @ + BGE_UNLOCK(sc); @ + return (ENXIO); @ + } @ + @ + /* Set up host coalescing defaults */ @ + if (sc->bge_dyncoal_max_intr_freq != 0) @ + sc->bge_dyncoal_scale = ((uint64_t)1 << 24) / @ + sc->bge_dyncoal_max_intr_freq; @ + CSR_WRITE_4(sc, BGE_HCC_RX_COAL_TICKS, sc->bge_rx_coal_ticks); @ + CSR_WRITE_4(sc, BGE_HCC_TX_COAL_TICKS, sc->bge_tx_coal_ticks); @ + CSR_WRITE_4(sc, BGE_HCC_RX_MAX_COAL_BDS, sc->bge_rx_max_coal_bds); @ + CSR_WRITE_4(sc, BGE_HCC_TX_MAX_COAL_BDS, sc->bge_tx_max_coal_bds); @ + @ + /* Turn on host coalescing state machine */ @ + CSR_WRITE_4(sc, BGE_HCC_MODE, BGE_HCCMODE_ENABLE); @ + @ + BGE_UNLOCK(sc); @ + return (0); @ + } @ + @ + static int @ bge_attach(device_t dev) @ { @ *************** @ *** 2584,2591 **** @ /* Set default tuneable values. */ @ sc->bge_stat_ticks = BGE_TICKS_PER_SEC; @ ! sc->bge_rx_coal_ticks = 150; @ ! sc->bge_tx_coal_ticks = 150; @ ! sc->bge_rx_max_coal_bds = 10; @ ! sc->bge_tx_max_coal_bds = 10; @ @ /* Set up ifnet structure */ @ --- 2645,2652 ---- @ /* Set default tuneable values. */ @ sc->bge_stat_ticks = BGE_TICKS_PER_SEC; @ ! sc->bge_dyncoal_max_intr_freq = 10000; @ ! sc->bge_tx_coal_ticks = 1000000; @ ! sc->bge_rx_max_coal_bds = 128; @ ! sc->bge_tx_max_coal_bds = BGE_TX_RING_CNT * 3 / 4; @ @ /* Set up ifnet structure */ @ *************** @ *** 3331,3343 **** @ if ((sc->bge_asicrev == BGE_ASICREV_BCM5700 && @ sc->bge_chipid != BGE_CHIPID_BCM5700_B2) || @ ! statusword || sc->bge_link_evt) @ bge_link_upd(sc); @ @ if (ifp->if_drv_flags & IFF_DRV_RUNNING) { @ - /* Check RX return ring producer/consumer. */ @ bge_rxeof(sc); @ - @ - /* Check TX ring producer/consumer. */ @ bge_txeof(sc); @ } @ @ --- 3583,3638 ---- @ if ((sc->bge_asicrev == BGE_ASICREV_BCM5700 && @ sc->bge_chipid != BGE_CHIPID_BCM5700_B2) || @ ! (macstatus & BGE_MACSTAT_LINK_CHANGED) || sc->bge_link_evt) @ bge_link_upd(sc); @ @ if (ifp->if_drv_flags & IFF_DRV_RUNNING) { @ bge_rxeof(sc); @ bge_txeof(sc); @ + if (sc->bge_dyncoal_max_intr_freq != 0 && @ + ++sc->bge_dyncoal_intrcnt == 16) { @ + struct bintime bt; @ + uint32_t dpi, pfrac, tfrac, xtime; @ + @ + binuptime(&bt); @ + xtime = (bt.sec << 24) | (bt.frac >> 40); @ + pfrac = (ifp->if_ipackets - sc->bge_dyncoal_ipackets) * @ + sc->bge_dyncoal_scale; @ + tfrac = xtime - sc->bge_dyncoal_xtime; @ + sc->bge_dyncoal_rx_pps = @ + (ifp->if_ipackets - sc->bge_dyncoal_ipackets) * @ + ((uint64_t)1 << 24) / tfrac; @ + dpi = pfrac / (tfrac | 2) + 1; @ + if (dpi > sc->bge_rx_max_coal_bds) @ + dpi = sc->bge_rx_max_coal_bds; @ + if (dpi != sc->bge_dyncoal_rx_max_coal_bds) { @ + if (bge_careful_coal) { @ + CSR_WRITE_4(sc, BGE_HCC_MODE, 0); @ + CSR_READ_4(sc, BGE_HCC_MODE); @ + if ((CSR_READ_4(sc, BGE_HCC_MODE) & @ + BGE_HCCMODE_ENABLE) == 0) { @ + CSR_WRITE_4(sc, BGE_HCC_RX_MAX_COAL_BDS, @ + dpi); @ + sc->bge_dyncoal_rx_max_coal_bds = dpi; @ + bge_coal_writes++; @ + } else @ + bge_coal_write_fails++; @ + CSR_WRITE_4(sc, BGE_HCC_MODE, @ + BGE_HCCMODE_ENABLE); @ + } else { @ + /* @ + * XXX not waiting for the engine is needed @ + * for efficiency since we reprogram it a @ + * lot so as to react fast, and this seems @ + * to work. However, similar reprogramming @ + * of RX_COAL_TICKS doesn't work. @ + */ @ + CSR_WRITE_4(sc, BGE_HCC_RX_MAX_COAL_BDS, dpi); @ + sc->bge_dyncoal_rx_max_coal_bds = dpi; @ + } @ + } @ + sc->bge_dyncoal_xtime = xtime; @ + sc->bge_dyncoal_intrcnt = 0; @ + sc->bge_dyncoal_ipackets = ifp->if_ipackets; @ + } @ } @ @ *************** @ *** 4450,4453 **** @ --- 4756,4782 ---- @ children = SYSCTL_CHILDREN(device_get_sysctl_tree(sc->bge_dev)); @ @ + SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "program_coal", @ + CTLTYPE_INT | CTLFLAG_RW, @ + sc, 0, bge_sysctl_program_coal, "I", @ + "program bge coalescence values"); @ + SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "rx_coal_ticks", CTLFLAG_RW, @ + &sc->bge_rx_coal_ticks, 0, ""); @ + SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "tx_coal_ticks", CTLFLAG_RW, @ + &sc->bge_tx_coal_ticks, 0, ""); @ + SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "rx_max_coal_bds", CTLFLAG_RW, @ + &sc->bge_rx_max_coal_bds, 0, ""); @ + SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "tx_max_coal_bds", CTLFLAG_RW, @ + &sc->bge_tx_max_coal_bds, 0, ""); @ + SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "dyncoal_max_intr_freq", @ + CTLFLAG_RW, @ + &sc->bge_dyncoal_max_intr_freq, 0, ""); @ + SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "dyncoal_rx_max_coal_bds", @ + CTLFLAG_RD, @ + &sc->bge_dyncoal_rx_max_coal_bds, 0, ""); @ + SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "dyncoal_rx_pps", CTLFLAG_RD, @ + &sc->bge_dyncoal_rx_pps, 0, ""); @ + SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "dyncoal_scale", CTLFLAG_RD, @ + &sc->bge_dyncoal_scale, 0, ""); @ + @ #ifdef BGE_REGISTER_DEBUG @ SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "debug_info", @ diff -c2 ./dev/bge/if_bgereg.h~ ./dev/bge/if_bgereg.h @ *** ./dev/bge/if_bgereg.h~ Fri May 16 16:39:21 2008 @ --- ./dev/bge/if_bgereg.h Fri May 16 16:39:23 2008 @ *************** @ *** 2579,2582 **** @ --- 2573,2583 ---- @ uint32_t bge_tx_discards; @ uint32_t bge_tx_collisions; @ + int bge_dyncoal_intrcnt; @ + u_long bge_dyncoal_ipackets; @ + uint32_t bge_dyncoal_max_intr_freq; @ + uint32_t bge_dyncoal_rx_max_coal_bds; @ + uint32_t bge_dyncoal_rx_pps; @ + uint32_t bge_dyncoal_scale; @ + uint32_t bge_dyncoal_xtime; @ #ifdef DEVICE_POLLING @ int rxcycles; The dynamical coal programming hasn't been mentioned recently, but really, it is the only useful thing in the above. Once the system does correct dynamic coal programming, there is no need for static sysctls except to reprove that they can't do such a good job as dynamic programming. I don't bother with dynamic coal program for tx. It would be possible to switch to more conservative tx settings under heavy rx load, and if you know too much about the device internals then balance all the resource uses so that they compete with each other as little as possible, but the possible gains from that are small. Bruce