Date: Thu, 9 Oct 2014 21:53:52 -0600
From: Warner Losh <imp@bsdimp.com>
To: Adrian Chadd <adrian@FreeBSD.org>
Cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: [rfc] enumerating device / bus domain information
Message-ID: <838B58B2-22D6-4AA4-97D5-62E87101F234@bsdimp.com>
In-Reply-To: <CAJ-VmonbGW1JbEiKXJ0sQCFr0+CRphVrSuBhFnh1gq6-X1CFdQ@mail.gmail.com>
References: <CAJ-VmokF7Ey0fxaQ7EMBJpCbgFnyOteiL2497Z4AFovc+QRkTA@mail.gmail.com>
 <2975E3D3-0335-4739-9242-5733CCEE726C@bsdimp.com>
 <CAJ-VmonbGW1JbEiKXJ0sQCFr0+CRphVrSuBhFnh1gq6-X1CFdQ@mail.gmail.com>
On Oct 8, 2014, at 5:12 PM, Adrian Chadd <adrian@FreeBSD.org> wrote:

> On 8 October 2014 12:07, Warner Losh <imp@bsdimp.com> wrote:
>>
>> On Oct 7, 2014, at 7:37 PM, Adrian Chadd <adrian@FreeBSD.org> wrote:
>>
>>> Hi,
>>>
>>> Right now we're not enumerating any NUMA domain information about devices.
>>>
>>> The more recent intel NUMA stuff has some extra affinity information
>>> for devices that (eventually) will allow us to bind kernel/user
>>> threads and/or memory allocation to devices to keep access local.
>>> There's a penalty for DMAing in/out of remote memory, so we'll want to
>>> figure out what counts as "Local" for memory allocation and perhaps
>>> constrain the CPU set that worker threads for a device run on.
>>>
>>> This patch adds a few things:
>>>
>>> * it adds a bus_if.m method for fetching the VM domain ID of a given
>>> device; or ENOENT if it's not in a VM domain;
>>
>> Maybe a default VM domain. All devices are in VM domains :) By default
>> today, we have only one VM domain, and that's the model that most of the
>> code expects...
>
> Right, and that doesn't change until you compile in with num domains > 1.

The first part of the statement doesn't change when the number of domains is more than one: all devices are in a VM domain.

> Then, CPUs and memory have VM domains, but devices may or may not have
> a VM domain. There's no "default" VM domain defined if num domains > 1.

Please explain how a device cannot have a VM domain. In the terminology I'm familiar with, to even get cycles to the device you have to have a memory address (or an I/O port). That memory address necessarily maps to some domain, even if that domain is equally sucky to get to from all CPUs (as is the case with I/O ports). While there may not be a "default" domain, by virtue of its physical location the device has to have one.

> The devices themselves don't know about VM domains right now, so
> there's nothing constraining things like IRQ routing, CPU set, memory
> allocation, etc. The isilon team is working on extending the cpuset
> and allocators to "know" about numa and I'm sure this stuff will fall
> out of whatever they're working on.

Why would the device need to know the domain? Why aren't the IRQs, for example, steered to the appropriate CPU? Why doesn't the bus handle allocating memory for the device in the appropriate place? How does this "domain" tie into memory allocation and thread creation?

> So when I go to add sysctl and other tree knowledge for device -> vm
> domain mapping I'm going to make them return -1 for "no domain."

Seems like there are too many things lumped together here. First off, how can there be no domain? That just hurts my brain. The device has to be in some domain, or it can't be seen. Maybe this domain is one that sucks for everybody to access, maybe it is one that's fast for some CPU or package of CPUs to access, but it has to have a domain.

> (Things will get pretty hilarious later on if we have devices that are
> "local" to two or more VM domains ..)

Well, devices aren't local to domains, per se. Devices can communicate with other components in a system at a given cost. One NUMA model is "near" vs. "far", where a single near domain exists and all the "far" resources are quite costly. Other NUMA models may have a wider range of costs, so that some resources are cheap, others a little less cheap, and others downright expensive, depending on how far across the fabric of interconnects the messages need to travel.
While one can model this as a full 1-1 partitioning, that doesn't match all of the extant implementations, even today. It is easy, but an imperfect match to the underlying realities in many cases (though a very good match to x86, which is mostly what we care about).

Warner
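[Editor's note: for reference, a minimal kernel-style C sketch of how a driver might consume the per-device domain query Adrian proposes at the top of the thread, including the "no specific domain" (ENOENT / -1) case Warner objects to. The call name bus_get_domain() and the attach routine foo_attach() follow the thread's wording and are hypothetical here, not a committed interface.]

	/*
	 * Sketch of a consumer of the proposed bus_if.m method: ask the
	 * parent bus for the device's VM domain and fall back to the
	 * default (domain-agnostic) policy when the bus cannot say.
	 */
	#include <sys/param.h>
	#include <sys/bus.h>

	static int
	foo_attach(device_t dev)
	{
		int domain, error;

		/* Hypothetical query per the thread: 0 on success, ENOENT otherwise. */
		error = bus_get_domain(dev, &domain);
		if (error != 0)
			domain = -1;		/* no specific domain reported */

		if (domain >= 0) {
			/*
			 * Prefer resources local to the device: e.g. allocate
			 * descriptor rings from this domain and pin interrupt
			 * or taskqueue threads to CPUs in it.
			 */
			device_printf(dev, "local to VM domain %d\n", domain);
		} else {
			device_printf(dev, "no domain affinity reported\n");
		}
		return (0);
	}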
