Date:      Thu, 19 Feb 2015 20:49:34 +0300
From:      Slawa Olhovchenkov <slw@zxy.spb.ru>
To:        John Baldwin <jhb@freebsd.org>
Cc:        arch@freebsd.org
Subject:   Re: RFC: bus_get_cpus(9)
Message-ID:  <20150219174934.GB46228@zxy.spb.ru>
In-Reply-To: <1848011.eGOHhpCEMm@ralph.baldwin.cx>
References:  <1848011.eGOHhpCEMm@ralph.baldwin.cx>

On Thu, Feb 19, 2015 at 09:46:35AM -0500, John Baldwin wrote:

> One of the next steps for NUMA device-awareness is a way to let drivers know 
> which CPUs are ideal to use for interrupts (and in particular this is targeted 
> at multiqueue NICs that want to create a TX/RX ring pair per CPU).  However, 
> for modern Intel systems at least, it is usually best to use CPUs from the 
> physical processor package that contains the I/O hub that a device connects to 
> (e.g. to allow DDIO to work).
> 
> The PoC API I came up with is a new bus method called bus_get_cpus() that 
> returns a requested cpuset for a given device.  It accepts an enum for the 
> second parameter that says the type of cpuset being requested.  Currently two 
> values are supported:
> 
>  - LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to the
>    device when NUMA is enabled)
>  - INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core)
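
To make the two request types concrete, here is a small userland model. It is not kernel code and not the patch's implementation. It assumes at most 64 CPUs held in a plain bitmask (the kernel uses cpuset_t) and a hypothetical per-CPU topology table giving each CPU's package and core. LOCAL_CPUS keeps every CPU in the device's package. INTR_CPUS keeps the same CPUs but only the lowest-numbered SMT thread of each core.

```c
#include <assert.h>
#include <stdint.h>

/* Userland stand-in for cpuset_t: bit N set means CPU N is in the set. */
typedef uint64_t cpumask_t;

/* The two request types described in the RFC. */
enum cpu_sets { LOCAL_CPUS, INTR_CPUS };

/* Hypothetical topology entry for one CPU. */
struct cpu_topo {
	int package;	/* physical package (socket) */
	int core;	/* core index within the package */
};

/*
 * Return the requested set for a device attached to package 'dev_pkg'.
 * LOCAL_CPUS: every CPU in that package, SMT threads included.
 * INTR_CPUS:  the same CPUs, keeping only the lowest-numbered thread of
 *             each core.
 */
static cpumask_t
model_get_cpus(const struct cpu_topo *topo, int ncpus, int dev_pkg,
    enum cpu_sets op)
{
	cpumask_t set = 0;

	assert(ncpus > 0 && ncpus <= 64);
	for (int cpu = 0; cpu < ncpus; cpu++) {
		if (topo[cpu].package != dev_pkg)
			continue;
		if (op == INTR_CPUS) {
			int sibling_seen = 0;

			/* Skip this CPU if a lower-numbered CPU shares its core. */
			for (int j = 0; j < cpu; j++)
				if (topo[j].package == topo[cpu].package &&
				    topo[j].core == topo[cpu].core)
					sibling_seen = 1;
			if (sibling_seen)
				continue;
		}
		set |= (cpumask_t)1 << cpu;
	}
	return set;
}
```

With two packages of two cores and two threads each, and SMT siblings numbered adjacently (CPUs 4-7 in package 1), a device on package 1 would get LOCAL_CPUS = 0xF0 and INTR_CPUS = 0x50 (CPUs 4 and 6).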
> 
> For a NIC driver the expectation is that the driver will call 
> 'bus_get_cpus(dev, INTR_CPUS, &set)' and create queues for each of the CPUs in 
> 'set'.  (In my current patchset I have updated igb(4) to use this approach.)
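
The driver-side loop can be sketched in userland as follows. The names model_setup_queues and struct queue_pair are hypothetical, the 64-bit mask stands in for cpuset_t, and the maxq cap is an assumption (NICs have a hardware queue limit) rather than something the RFC specifies. In the kernel the set would come from bus_get_cpus(dev, INTR_CPUS, &set) and the membership test would use CPU_ISSET().

```c
#include <assert.h>
#include <stdint.h>

/* Userland stand-in for cpuset_t: bit N set means CPU N is in the set. */
typedef uint64_t cpumask_t;

/* Hypothetical per-queue state; a real driver keeps far more. */
struct queue_pair {
	int id;		/* queue index within the NIC */
	int cpu;	/* CPU that services this TX/RX pair */
};

/*
 * Create one TX/RX pair per CPU in 'intr_set', lowest CPU first, stopping
 * at 'maxq' pairs (assumed hardware limit).  Returns the number created.
 */
static int
model_setup_queues(cpumask_t intr_set, struct queue_pair *q, int maxq)
{
	int n = 0;

	assert(maxq >= 0);
	for (int cpu = 0; cpu < 64 && n < maxq; cpu++) {
		if ((intr_set & ((cpumask_t)1 << cpu)) == 0)
			continue;	/* CPU not in the interrupt set */
		q[n].id = n;
		q[n].cpu = cpu;
		/* A real driver would allocate rings and bind the interrupt here. */
		n++;
	}
	return n;
}
```

For a set of 0x50 (CPUs 4 and 6) this creates queue 0 on CPU 4 and queue 1 on CPU 6.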
> 
> For systems that do not support NUMA (or if it is not enabled in the kernel 
> config), LOCAL_CPUS is mapped to 'all_cpus' by default in the 'root_bus' 
> driver.  INTR_CPUS is also mapped to 'all_cpus' by default.
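
The default path can be modeled as a walk up a bus tree. This is a sketch with a hypothetical struct bus_node, assuming that a bus with no NUMA information leaves get_cpus unset and defers to its parent, and that the root answers both request types with every CPU. The 0x0F mask (four CPUs) stands in for the kernel's all_cpus.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t cpumask_t;
enum cpu_sets { LOCAL_CPUS, INTR_CPUS };

/* Hypothetical bus node: 'get_cpus' is NULL when the bus has no opinion. */
struct bus_node {
	const struct bus_node *parent;
	int (*get_cpus)(const struct bus_node *, enum cpu_sets, cpumask_t *);
};

/* Stand-in for the kernel's all_cpus set: 4 CPUs online. */
static const cpumask_t model_all_cpus = 0x0F;

/* Root default: without NUMA information both requests map to all CPUs. */
static int
root_get_cpus(const struct bus_node *bus, enum cpu_sets op, cpumask_t *set)
{
	(void)bus;
	assert(set != NULL);
	switch (op) {
	case LOCAL_CPUS:
	case INTR_CPUS:
		*set = model_all_cpus;
		return 0;
	}
	return -1;	/* unknown request type */
}

/* Walk up from a device's parent bus until some bus answers. */
static int
model_bus_get_cpus(const struct bus_node *bus, enum cpu_sets op,
    cpumask_t *set)
{
	for (; bus != NULL; bus = bus->parent)
		if (bus->get_cpus != NULL)
			return bus->get_cpus(bus, op, set);
	return -1;
}
```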
> 
> The x86 interrupt code maintains its own set of interrupt CPUs which this 
> patch now exposes via INTR_CPUS in the x86 nexus driver.
> 
> The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable 
> LOCAL_CPUS set when _PXM exists and NUMA is enabled.  They also intersect the global
> INTR_CPUS set from the nexus driver with the per-domain set from _PXM to 
> generate a local INTR_CPUS set for child devices.
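
The _PXM handling amounts to two set operations, which the following userland sketch models with hypothetical names. Here pxm_valid stands for "_PXM exists and NUMA is enabled", domain_cpus for the CPUs of the domain that _PXM names, and nexus_intr for the global INTR_CPUS set. Returning -1 models deferring to the parent bus.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t cpumask_t;
enum cpu_sets { LOCAL_CPUS, INTR_CPUS };

/*
 * Hypothetical model of an ACPI/PCI bridge bus with a _PXM value.
 * pxm_valid:   nonzero when _PXM exists and NUMA is enabled.
 * domain_cpus: CPUs in the NUMA domain that _PXM refers to.
 * nexus_intr:  global INTR_CPUS set as reported by the nexus.
 * Returns 0 on success, or -1 to mean "no answer here, ask the parent".
 */
static int
model_pxm_get_cpus(int pxm_valid, cpumask_t domain_cpus, cpumask_t nexus_intr,
    enum cpu_sets op, cpumask_t *set)
{
	assert(set != NULL);
	if (!pxm_valid)
		return -1;
	switch (op) {
	case LOCAL_CPUS:
		*set = domain_cpus;
		return 0;
	case INTR_CPUS:
		/* Interrupt-eligible CPUs that are also local to this domain. */
		*set = nexus_intr & domain_cpus;
		return 0;
	}
	return -1;
}
```

For example, if the domain holds CPUs 4-7 (0xF0) and the nexus INTR_CPUS set is 0x55 (one thread per core), the local INTR_CPUS set is 0x50.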
> 
> The current patch can be found here:
> 
> https://github.com/bsdjhb/freebsd/compare/bsdjhb:master...numa_bus_get_cpus
> 
> It includes a few other fixes besides the implementation of bus_get_cpus() (and 
> some things have already been committed such as 
> taskqueue_start_threads_cpuset() and CPU_COUNT()):
> 
>  - It fixes the x86 interrupt code to exclude modern SMT threads from the
>    default interrupt set.  (Previously only Pentium 4-era HTT threads were
>    excluded.)
>  - It has a sample conversion of igb(4) to this interface (albeit an ugly
>    one, using #if's).
> 
> Longer term I think I would like to make the INTR_CPUS thing a bit more 
> formal.  In particular, Solaris allows you to alter the set of CPUs that 
> handle interrupts via prctl (or a tool named something close to that).  I 
> think I would like to have a dedicated global cpuset for that (but not named 
> "2", it would be a new WHICH level).  That would allow userland to use cpuset 
> to alter the set of CPUs that handle interrupts in case you wanted to use SMT 
> for example.  I think if we do this, all ithreads would have their cpusets 
> hang off of this set instead of the root set (which would also remove some of 
> the recent special case handling for ithreads I believe).  The one uglier part 
> about this is that we should probably then have a way to notify drivers that 
> INTR_CPUS changed so that they could try to cope gracefully.  I think that's a 
> bit of a longer horizon thing, but for now I think bus_get_cpus() is a good 
> next step.
> 
> What do other folks think?  (And yes, I know it needs a manpage before it goes 
> in, but I'd rather get the API agreed on before polishing that.)

I already do this manually using cpuset.
Some setups need one CPU set dedicated to interrupt handling and
another CPU set for a particular application. Because the application
may not allow modification, we need cpuset-aware arithmetic, i.e. a
utility that can answer questions like 'which CPUs are not used by the
interrupt handlers of devices ix0 and ix1?'
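
The question above reduces to a union followed by a set difference. A minimal userland sketch, assuming at most 64 CPUs in a plain bitmask and that each interrupt handler's CPU mask has already been collected (for example with cpuset -g -x <irq>); the function names are hypothetical and this is not an existing tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t cpumask_t;

/*
 * CPUs in 'all' not used by any of the 'n' interrupt masks.  Each entry of
 * irq_masks is the CPU mask of one handler (for example one per queue of
 * ix0 and ix1), collected beforehand.
 */
static cpumask_t
cpus_free_of_irqs(cpumask_t all, const cpumask_t *irq_masks, size_t n)
{
	cpumask_t used = 0;

	assert(n == 0 || irq_masks != NULL);
	for (size_t i = 0; i < n; i++)
		used |= irq_masks[i];	/* union of every handler's CPUs */
	return all & ~used;		/* set difference: all minus used */
}

/*
 * Write 'm' as a comma-separated CPU list (e.g. "4,5,6,7"), a form that
 * cpuset -l accepts.  Returns the length written; the list is cut short
 * at an entry boundary if 'buf' is too small.
 */
static size_t
mask_to_list(cpumask_t m, char *buf, size_t len)
{
	size_t off = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (int cpu = 0; cpu < 64; cpu++) {
		if ((m & ((cpumask_t)1 << cpu)) == 0)
			continue;
		int w = snprintf(buf + off, len - off, "%s%d",
		    off > 0 ? "," : "", cpu);
		if (w < 0 || (size_t)w >= len - off) {
			buf[off] = '\0';	/* drop the partial entry */
			break;
		}
		off += (size_t)w;
	}
	return off;
}
```

With eight CPUs and the ix0/ix1 handlers on CPUs 0-3, the free set is 0xF0, printed as "4,5,6,7", which could then be handed to cpuset -l when starting the application.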


