Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 01 Dec 2014 11:39:07 -0500
From:      John Baldwin <jhb@freebsd.org>
To:        Alfred Perlstein <bright@mu.org>
Cc:        "svn-src-head@freebsd.org" <svn-src-head@freebsd.org>, "svn-src-all@freebsd.org" <svn-src-all@freebsd.org>, Alfred Perlstein <alfred@freebsd.org>, "src-committers@freebsd.org" <src-committers@freebsd.org>
Subject:   Re: svn commit: r275136 - in head/sys: dev/e1000 dev/ixgbe kern sys
Message-ID:  <1558372.GUDICe5l9S@ralph.baldwin.cx>
In-Reply-To: <6753416A-D6FD-47F4-A62E-99A6ABB9B4B2@mu.org>
References:  <201411262019.sAQKJaw4043557@svn.freebsd.org> <1762770.yx1cv63jp7@ralph.baldwin.cx> <6753416A-D6FD-47F4-A62E-99A6ABB9B4B2@mu.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Monday, December 01, 2014 07:55:57 AM Alfred Perlstein wrote:
> > On Dec 1, 2014, at 7:49 AM, John Baldwin <jhb@freebsd.org> wrote:
> >> On Monday, December 01, 2014 07:19:13 AM Alfred Perlstein wrote:
> >> John,
> >> 
> >> Will work on a new revision based on feedback.
> >> 
> >> Two things to note however:
> >> 
> >> Already explored the idea of using kernel_sysctlbyname but rejected due
> >> to
> >> following:
> >> 
> >> It makes little sense to have a rw sysctl that only takes effect "some
> >> times". This violates POLA at the expense of making code appear cleaner.
> >> Expectation is that writable sysctls take effect or are read only. They
> >> are
> >> not to be "write sometimes" unless we are to introduce a new flag.
> >> Instead
> >> of going to a confusing model we consider some form of rw sysctl that can
> >> set itself ro somehow. Otherwise people will be confused as to why nic
> >> queues says N while actually M.  What the rw->ro api would look like I
> >> have
> >> no idea. Suggestions?
> > 
> > This is only somewhat true.  In the near distant future we will have a
> > devctl tool which would let you do 'devctl detach igb0 && devctl attach
> > igb0' which would honor your post-boot setting of hw.igb.num_queues. 
> > Instead what is important to understand about this particular sysctl node
> > is that it only takes affect when a device is attached.  However, there
> > are other control knobs that also only affect future operations and not
> > existing instances of objects, so I don't think this is that big of a
> > leap.
> 
> Strongly disagree here.  If I were not able to grok the c code I would be
> very confused by such a thing. In fact even with the fact that I do grok c
> code I would be very discouraged to find such behavior and strongly object
> to writable sysctls that do nothing.

It's not hard to document this in the way that that sysctls are documented:

Index: if_igb.c
===================================================================
--- if_igb.c	(revision 275032)
+++ if_igb.c	(working copy)
@@ -390,7 +390,7 @@ SYSCTL_INT(_hw_igb, OID_AUTO, header_split, CTLFLA
 */
 static int igb_num_queues = 0;
 SYSCTL_INT(_hw_igb, OID_AUTO, num_queues, CTLFLAG_RDTUN, &igb_num_queues, 0,
-    "Number of queues to configure, 0 indicates autoconfigure");
+    "Number of queues to configure during attach, 0 indicates autoconfigure");
 
 /*
 ** Global variable to store last used CPU when binding queues

That is fairly clear from 'sysctl -D' (what I'd use first to discern what a
sysctl does).  The fact that the newly added 'dev.igb.0.num_queues' (which I
think is fine to add would be read-only) would make it clear you can't change
this at runtime on an existing interface.

On the other hand, I do think it makes sense to figure out what meta thing you
are trying to do.  I assume you simply want a way to disable multiqueue at
runtime and you don't want to do this in the loader?  (Given you are using an
in-house module for this, adding a change to the loader to set the tunable
might be even less work and would require no kernel changes at all, even for
future drivers for which you may want to set a tunable.)

The other option you have for your use case of course is to simply wait to load
igb.ko until after you've booted.  If you aren't using an NFS root over your
igb interfaces, then that seems far simpler than any of the suggested changes.
You can even do the work in userland with an rc.d script that uses kenv to set
tunables before you load igb.ko.  Actually, you can handle even NFS root for
this case.  Simply load igb from the loader using igb.ko.  You can then have
the SYSINIT in your special module run at SI_SUB_TUNABLES, SI_ORDER_FIRST.  It
won't actually run until closer to SI_SUB_KLD, but what happens is that after
SI_SUB_KLD runs linker_preload(), all the SYSINITs in the various modules
are sorted into the global list and then run in the resulting order.  So you
can insert a sysinit in your module that runs at SI_SUB_TUNABLES,
SI_ORDER_FIRST (the one you currently have at SI_SUB_CPU) and it will run
before igb.ko fetches any of its tunables.  This will work for any driver so
long as you ship it as a module without needing any code changes to any driver,
and works for any tunable.

-- 
John Baldwin



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1558372.GUDICe5l9S>