From owner-svn-src-all@FreeBSD.ORG Mon Dec 1 14:48:39 2014 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B472EC0F; Mon, 1 Dec 2014 14:48:39 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 89F74883; Mon, 1 Dec 2014 14:48:39 +0000 (UTC) Received: from ralph.baldwin.cx (pool-173-70-85-31.nwrknj.fios.verizon.net [173.70.85.31]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 04DB7B923; Mon, 1 Dec 2014 09:48:38 -0500 (EST) From: John Baldwin To: Alfred Perlstein Subject: Re: svn commit: r275136 - in head/sys: dev/e1000 dev/ixgbe kern sys Date: Mon, 01 Dec 2014 09:32:42 -0500 Message-ID: <39377603.10OyiSzjWY@ralph.baldwin.cx> User-Agent: KMail/4.14.2 (FreeBSD/10.1-STABLE; KDE/4.14.2; amd64; ; ) In-Reply-To: <201411262019.sAQKJaw4043557@svn.freebsd.org> References: <201411262019.sAQKJaw4043557@svn.freebsd.org> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 01 Dec 2014 09:48:38 -0500 (EST) Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 01 Dec 2014 14:48:39 -0000 On Wednesday, November 26, 2014 08:19:36 PM Alfred Perlstein wrote: > Author: alfred > Date: Wed Nov 26 20:19:36 2014 > New Revision: 275136 > URL: https://svnweb.freebsd.org/changeset/base/275136 > > Log: > Make igb and ixgbe check tunables at probe time. > > This allows one to make a kernel module to tune the > number of queues before the driver loads. > > This is needed so that a module at SI_SUB_CPU can set > tunables for these drivers to take. Otherwise getenv > is called too early by the TUNABLE macros. > > Reviewed by: smh > Phabric: https://reviews.freebsd.org/D1149 The explicit TUNABLE_INT_FETCH strikes me as the wrong approach and hackish. That is, each time you want to frob a tunable like this you will have to go add a bunch of code to explicitly re-read the tunable, etc. Instead, you could have simply changed your helper module to use kernel_sysctlbyname() to set hw.igb.num_queues. That would have required only a single character change to make the SYSCTL read/write instead of read/only for each tunable in question. To be completely future proof (i.e. to handle loading the module in question post-boot), your module could both do setenv() and kernel_sysctlbyname(). That seems to be a more extensible approach in terms of allowing more of these to be changed in the future without having to do a manual bypass of the existing tunable infrastructure for each tunable. I would much prefer that you revert this part and change the relevant tunables to CTLFLAG_RWTUN and update your out-of-tree module to use kernel_sysctlbyname(). Also, if you didn't run this by the Intel folks (e.g. jfv@) that might have been nice as a courtesy. Also, please use the existing resource_int_value() that uses 'hint.igb.0.foo' instead of inventing a different wheel (device_get_int()). This is what all the other drivers in the tree that provide per-instance tunables use. You could perhaps have device_get_int() as a wrapper around resource_int_value(), but we should use the existing convention, not invent a conflicting one. > Modified: > head/sys/dev/e1000/if_igb.c > head/sys/dev/ixgbe/ixgbe.c > head/sys/kern/subr_bus.c > head/sys/sys/bus.h > > Modified: head/sys/dev/e1000/if_igb.c > ============================================================================ > == --- head/sys/dev/e1000/if_igb.c Wed Nov 26 18:03:25 2014 (r275135) +++ > head/sys/dev/e1000/if_igb.c Wed Nov 26 20:19:36 2014 (r275136) @@ -188,6 > +188,7 @@ static char *igb_strings[] = { > /********************************************************************* > * Function prototypes > *********************************************************************/ > +static int igb_per_unit_num_queues(SYSCTL_HANDLER_ARGS); > static int igb_probe(device_t); > static int igb_attach(device_t); > static int igb_detach(device_t); > @@ -493,6 +494,11 @@ igb_attach(device_t dev) > OID_AUTO, "nvm", CTLTYPE_INT|CTLFLAG_RW, adapter, 0, > igb_sysctl_nvm_info, "I", "NVM Information"); > > + SYSCTL_ADD_PROC(device_get_sysctl_ctx(dev), > + SYSCTL_CHILDREN(device_get_sysctl_tree(dev)), > + OID_AUTO, "num_queues", CTLTYPE_INT | CTLFLAG_RD, > + adapter, 0, igb_per_unit_num_queues, "I", "Number of Queues"); > + You don't need igb_per_unit_num_queues(). SYSCTL_ADD_INT(..., &adapter->num_queues) should have been used instead. > @@ -2831,6 +2837,7 @@ igb_setup_msix(struct adapter *adapter) > { > device_t dev = adapter->dev; > int bar, want, queues, msgs, maxqueues; > + int n_queues; > > /* tuneable override */ > if (igb_enable_msix == 0) > @@ -2858,8 +2865,18 @@ igb_setup_msix(struct adapter *adapter) > goto msi; > } > > - /* Figure out a reasonable auto config value */ > - queues = (mp_ncpus > (msgs-1)) ? (msgs-1) : mp_ncpus; > + n_queues = 0; > + /* try more specific tunable, then global, then finally default to boot > time tunable if set. */ + if (device_getenv_int(dev, "num_queues", > &n_queues) != 0) { > + device_printf(dev, "using specific tunable num_queues=%d", n_queues); > + } else if (TUNABLE_INT_FETCH("hw.igb.num_queues", &n_queues) != 0) { > + if (igb_num_queues != n_queues) { > + device_printf(dev, "using global tunable hw.igb.num_queues=%d", > n_queues); + igb_num_queues = n_queues; > + } > + } else { > + n_queues = igb_num_queues; > + } > > #ifdef RSS > /* If we're doing RSS, clamp at the number of RSS buckets */ > @@ -2867,10 +2884,12 @@ igb_setup_msix(struct adapter *adapter) > queues = rss_getnumbuckets(); > #endif > > - > - /* Manual override */ > - if (igb_num_queues != 0) > - queues = igb_num_queues; > + if (n_queues != 0) { > + queues = n_queues; > + } else { > + /* Figure out a reasonable auto config value */ > + queues = (mp_ncpus > (msgs-1)) ? (msgs-1) : mp_ncpus; > + } This moves the mp_ncpus auto config below the RSS bits, so now you are ignoring the RSS value. You probably should have left the mp_ncpus line where it was. One old bug here is that igb checked the tunable value twice. It should only check it once at the end of all the auto-tuning. > /* Sanity check based on HW */ > switch (adapter->hw.mac.type) { > @@ -2893,12 +2912,17 @@ igb_setup_msix(struct adapter *adapter) > maxqueues = 1; > break; > } > - if (queues > maxqueues) > + if (queues > maxqueues) { > + device_printf(adapter->dev, "requested %d queues, but max for this > adapter is %d\n", + queues, maxqueues); > queues = maxqueues; > - > - /* Manual override */ > - if (igb_num_queues != 0) > - queues = igb_num_queues; > + } else if (queues == 0) { > + queues = 1; > + } else if (queues < 0) { > + device_printf(adapter->dev, "requested %d queues, but min for this > adapter is %d\n", + queues, 1); > + queues = 1; > + } So here is where I would check the tunable to allow it to override. Assuming you revert the TUNABLE_INT_FETCH business then this only needs: /* Manual override */ if (device_get_int(&n_queues) != 0) n_queues = igb_num_queues; if (n_queues != 0) { if (n_queues > maxqueues || n_queues < 0) /* whine */ else queues = n_queues; } -- John Baldwin