Date: Thu, 9 Oct 2014 21:53:52 -0600
From: Warner Losh <imp@bsdimp.com>
To: Adrian Chadd <adrian@FreeBSD.org>
Cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: [rfc] enumerating device / bus domain information
Message-ID: <838B58B2-22D6-4AA4-97D5-62E87101F234@bsdimp.com>
In-Reply-To: <CAJ-VmonbGW1JbEiKXJ0sQCFr0+CRphVrSuBhFnh1gq6-X1CFdQ@mail.gmail.com>
References: <CAJ-VmokF7Ey0fxaQ7EMBJpCbgFnyOteiL2497Z4AFovc+QRkTA@mail.gmail.com>
 <2975E3D3-0335-4739-9242-5733CCEE726C@bsdimp.com>
 <CAJ-VmonbGW1JbEiKXJ0sQCFr0+CRphVrSuBhFnh1gq6-X1CFdQ@mail.gmail.com>
On Oct 8, 2014, at 5:12 PM, Adrian Chadd <adrian@FreeBSD.org> wrote:

> On 8 October 2014 12:07, Warner Losh <imp@bsdimp.com> wrote:
>>
>> On Oct 7, 2014, at 7:37 PM, Adrian Chadd <adrian@FreeBSD.org> wrote:
>>
>>> Hi,
>>>
>>> Right now we're not enumerating any NUMA domain information about devices.
>>>
>>> The more recent intel NUMA stuff has some extra affinity information
>>> for devices that (eventually) will allow us to bind kernel/user
>>> threads and/or memory allocation to devices to keep access local.
>>> There's a penalty for DMAing in/out of remote memory, so we'll want to
>>> figure out what counts as "Local" for memory allocation and perhaps
>>> constrain the CPU set that worker threads for a device run on.
>>>
>>> This patch adds a few things:
>>>
>>> * it adds a bus_if.m method for fetching the VM domain ID of a given
>>> device; or ENOENT if it's not in a VM domain;
>>
>> Maybe a default VM domain. All devices are in VM domains :) By default
>> today, we have only one VM domain, and that's the model that most of the
>> code expects...
>
> Right, and that doesn't change until you compile in with num domains > 1.

The first part of the statement doesn't change when the number of domains is more than one: all devices are in a VM domain.

> Then, CPUs and memory have VM domains, but devices may or may not have
> a VM domain. There's no "default" VM domain defined if num domains > 1.

Please explain how a device cannot have a VM domain. In the terminology I'm familiar with, to even get cycles to the device you have to have a memory address (or an I/O port). That memory address necessarily maps to some domain, even if that domain is equally sucky to get to from all CPUs (as is the case with I/O ports). While there may not be a "default" domain, by virtue of its physical location the device has to have one.

> The devices themselves don't know about VM domains right now, so
> there's nothing constraining things like IRQ routing, CPU set, memory
> allocation, etc. The isilon team is working on extending the cpuset
> and allocators to "know" about numa and I'm sure this stuff will fall
> out of whatever they're working on.

Why would the device need to know the domain? Why aren't the IRQs, for example, steered to the appropriate CPU? Why doesn't the bus handle allocating memory for the device in the appropriate place? How does this "domain" tie into memory allocation and thread creation?

> So when I go to add sysctl and other tree knowledge for device -> vm
> domain mapping I'm going to make them return -1 for "no domain."

Seems like there are too many things lumped together here. First off, how can there be no domain? That just hurts my brain. The device has to be in some domain, or it can't be seen. Maybe this domain is one that sucks for everybody to access, maybe it is one that's fast for some CPU or package of CPUs to access, but it has to have a domain.

> (Things will get pretty hilarious later on if we have devices that are
> "local" to two or more VM domains ..)

Well, devices aren't local to domains, per se. Devices can communicate with other components in a system at a given cost. One NUMA model is "near" vs. "far", where a single near domain exists and all the "far" resources are quite costly. Other NUMA models may have a wider range of costs, so that some resources are cheap, others a little less cheap, and others downright expensive, depending on how far across the fabric of interconnects the messages need to travel.
While one can model this as a full 1-1 partitioning, that doesn't match all of the extant implementations, even today. It is easy, but an imperfect match to the underlying realities in many cases (though a very good match to x86, which is mostly what we care about).

Warner
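[Editor's note: for reference, a minimal kernel-style C sketch of how a driver might consume the per-device domain query Adrian proposes at the top of the thread, including the "no specific domain" (ENOENT / -1) case Warner objects to. The call name bus_get_domain() and the attach routine foo_attach() follow the thread's wording and are hypothetical here, not a committed interface.]

	/*
	 * Sketch of a consumer of the proposed bus_if.m method: ask the
	 * parent bus for the device's VM domain and fall back to the
	 * default (domain-agnostic) policy when the bus cannot say.
	 */
	#include <sys/param.h>
	#include <sys/bus.h>

	static int
	foo_attach(device_t dev)
	{
		int domain, error;

		/* Hypothetical query per the thread: 0 on success, ENOENT otherwise. */
		error = bus_get_domain(dev, &domain);
		if (error != 0)
			domain = -1;		/* no specific domain reported */

		if (domain >= 0) {
			/*
			 * Prefer resources local to the device: e.g. allocate
			 * descriptor rings from this domain and pin interrupt
			 * or taskqueue threads to CPUs in it.
			 */
			device_printf(dev, "local to VM domain %d\n", domain);
		} else {
			device_printf(dev, "no domain affinity reported\n");
		}
		return (0);
	}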
