Date: Thu, 9 Oct 2014 21:53:52 -0600
From: Warner Losh <imp@bsdimp.com>
To: Adrian Chadd <adrian@FreeBSD.org>
Cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: [rfc] enumerating device / bus domain information
Message-ID: <838B58B2-22D6-4AA4-97D5-62E87101F234@bsdimp.com>
In-Reply-To: <CAJ-VmonbGW1JbEiKXJ0sQCFr0%2BCRphVrSuBhFnh1gq6-X1CFdQ@mail.gmail.com>
References: <CAJ-VmokF7Ey0fxaQ7EMBJpCbgFnyOteiL2497Z4AFovc%2BQRkTA@mail.gmail.com> <2975E3D3-0335-4739-9242-5733CCEE726C@bsdimp.com> <CAJ-VmonbGW1JbEiKXJ0sQCFr0%2BCRphVrSuBhFnh1gq6-X1CFdQ@mail.gmail.com>
On Oct 8, 2014, at 5:12 PM, Adrian Chadd <adrian@FreeBSD.org> wrote:

> On 8 October 2014 12:07, Warner Losh <imp@bsdimp.com> wrote:
>>
>> On Oct 7, 2014, at 7:37 PM, Adrian Chadd <adrian@FreeBSD.org> wrote:
>>
>>> Hi,
>>>
>>> Right now we're not enumerating any NUMA domain information about devices.
>>>
>>> The more recent intel NUMA stuff has some extra affinity information
>>> for devices that (eventually) will allow us to bind kernel/user
>>> threads and/or memory allocation to devices to keep access local.
>>> There's a penalty for DMAing in/out of remote memory, so we'll want to
>>> figure out what counts as "Local" for memory allocation and perhaps
>>> constrain the CPU set that worker threads for a device run on.
>>>
>>> This patch adds a few things:
>>>
>>> * it adds a bus_if.m method for fetching the VM domain ID of a given
>>> device; or ENOENT if it's not in a VM domain;
>>
>> Maybe a default VM domain. All devices are in VM domains :) By default
>> today, we have only one VM domain, and that's the model that most of the
>> code expects...
>
> Right, and that doesn't change until you compile in with
> num domains > 1.

The first part of the statement doesn't change when the number of
domains is more than one. All devices are in a VM domain.

> Then, CPUs and memory have VM domains, but devices may or may not have
> a VM domain. There's no "default" VM domain defined if
> num domains > 1.

Please explain how a device cannot have a VM domain? For the terminology
I'm familiar with, to even get cycles to the device, you have to have a
memory address (or an I/O port). That memory address has to necessarily
map to some domain, even if that domain is equally sucky to get to from
all CPUs (as is the case with I/O ports).
While there may not be a "default" domain, by virtue of its physical
location it has to have one.

> The devices themselves don't know about VM domains right now, so
> there's nothing constraining things like IRQ routing, CPU set, memory
> allocation, etc. The isilon team is working on extending the cpuset
> and allocators to "know" about numa and I'm sure this stuff will fall
> out of whatever they're working on.

Why would the device need to know the domain? Why aren't the IRQs,
for example, steered to the appropriate CPU? Why doesn't the bus handle
allocating memory for it in the appropriate place? How does this "domain"
tie into memory allocation and thread creation?

> So when I go to add sysctl and other tree knowledge for device -> vm
> domain mapping I'm going to make them return -1 for "no domain."

Seems like there are too many things lumped together here. First off, how
can there be no domain? That just hurts my brain. It has to be in some
domain, or it can't be seen. Maybe this domain is one that sucks for
everybody to access, maybe it is one that's fast for some CPU or package
of CPUs to access, but it has to have a domain.

> (Things will get pretty hilarious later on if we have devices that are
> "local" to two or more VM domains ..)

Well, devices aren't local to domains, per se. Devices can communicate
with other components in a system at a given cost. One NUMA model is
"near" vs "far", where a single near domain exists and all the "far"
resources are quite costly. Other NUMA models may have a wider range of
costs, so that some resources are cheap, others are a little less cheap,
while others are downright expensive depending on how far across the
fabric of interconnects the messages need to travel. While one can model
this as a full 1-1 partitioning, that doesn't match all of the extant
implementations, even today.
It is easy, but an imperfect match to the underlying realities in many
cases (though a very good match to x86, which is mostly what we care
about).

Warner
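
To make the cost-based view above concrete, here is a minimal sketch in C. It is not FreeBSD code and not the patch under discussion: NDOMAINS, the numa_cost table and its values, and the functions domain_cost() and cheapest_domain() are all hypothetical names invented for illustration. The only borrowed convention is that of the ACPI SLIT, where 10 means "local" and larger numbers mean relatively more expensive. DOMAIN_NONE stands in for the -1 "no domain" value proposed in the quoted text.

```c
#include <assert.h>
#include <stddef.h>

#define NDOMAINS	4	/* hypothetical number of domains */
#define DOMAIN_NONE	(-1)	/* the "no domain" value proposed in the thread */

/*
 * Hypothetical relative access costs, SLIT-style: 10 is local,
 * larger values are further across the interconnect.
 */
static const int numa_cost[NDOMAINS][NDOMAINS] = {
	{ 10, 16, 16, 22 },
	{ 16, 10, 22, 16 },
	{ 16, 22, 10, 16 },
	{ 22, 16, 16, 10 },
};

/* Relative cost for something in domain 'from' to reach domain 'to'. */
static int
domain_cost(int from, int to)
{
	assert(from >= 0 && from < NDOMAINS);
	assert(to >= 0 && to < NDOMAINS);
	return (numa_cost[from][to]);
}

/*
 * Among the candidate domains, return the one that is cheapest to reach
 * from dev_domain.  Returns DOMAIN_NONE when there are no candidates.
 * Ties go to the earliest candidate in the list.
 */
static int
cheapest_domain(int dev_domain, const int *cand, size_t ncand)
{
	int best = DOMAIN_NONE;

	for (size_t i = 0; i < ncand; i++) {
		if (best == DOMAIN_NONE ||
		    domain_cost(dev_domain, cand[i]) <
		    domain_cost(dev_domain, best))
			best = cand[i];
	}
	return (best);
}
```

In this sketch every device has some row in the table, so "which domain is the device in" is always answerable; what varies is how expensive each other domain is to reach. A strict near/far model is the special case where every off-diagonal entry has the same value.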