From: Warner Losh
To: Adrian Chadd
Cc: "freebsd-arch@freebsd.org"
Subject: Re: [rfc] enumerating device / bus domain information
Date: Thu, 9 Oct 2014 21:53:52 -0600
Message-Id: <838B58B2-22D6-4AA4-97D5-62E87101F234@bsdimp.com>
References: <2975E3D3-0335-4739-9242-5733CCEE726C@bsdimp.com>
List-Id: Discussion related to FreeBSD architecture

On Oct 8, 2014, at 5:12 PM, Adrian Chadd wrote:

> On 8 October 2014 12:07, Warner Losh wrote:
>>
>> On Oct 7, 2014, at 7:37 PM, Adrian Chadd wrote:
>>
>>> Hi,
>>>
>>> Right now we're not enumerating any NUMA domain information about devices.
>>>
>>> The more recent intel NUMA stuff has some extra affinity information
>>> for devices that (eventually) will allow us to bind kernel/user
>>> threads and/or memory allocation to devices to keep access local.
>>> There's a penalty for DMAing in/out of remote memory, so we'll want to
>>> figure out what counts as "Local" for memory allocation and perhaps
>>> constrain the CPU set that worker threads for a device run on.
>>>
>>> This patch adds a few things:
>>>
>>> * it adds a bus_if.m method for fetching the VM domain ID of a given
>>> device; or ENOENT if it's not in a VM domain;
>>
>> Maybe a default VM domain. All devices are in VM domains :) By default
>> today, we have only one VM domain, and that's the model that most of the
>> code expects...
>
> Right, and that doesn't change until you compile in with num
> domains > 1.

The first part of the statement doesn't change when the number of
domains is more than one. All devices are in a VM domain.

> Then, CPUs and memory have VM domains, but devices may or may not have
> a VM domain. There's no "default" VM domain defined if num
> domains > 1.

Please explain how a device can have no VM domain. In the
terminology I'm familiar with, to even get cycles to the device, you
have to have a memory address (or an I/O port). That memory address
necessarily has to map to some domain, even if that domain is equally
sucky to get to from all CPUs (as is the case with I/O ports). While
there may not be a "default" domain, by virtue of its physical
location the device has to have one.

> The devices themselves don't know about VM domains right now, so
> there's nothing constraining things like IRQ routing, CPU set, memory
> allocation, etc. The isilon team is working on extending the cpuset
> and allocators to "know" about numa and I'm sure this stuff will fall
> out of whatever they're working on.

Why would the device need to know the domain? Why aren't the IRQs,
for example, steered to the appropriate CPU? Why doesn't the bus handle
allocating memory for it in the appropriate place? How does this
"domain" tie into memory allocation and thread creation?

> So when I go to add sysctl and other tree knowledge for device -> vm
> domain mapping I'm going to make them return -1 for "no domain."

Seems like there are too many things lumped together here. First off,
how can there be no domain?
That just hurts my brain. It has to be in some domain, or it can't
be seen. Maybe this domain is one that sucks for everybody to access,
maybe it is one that's fast for some CPU or package of CPUs to access,
but it has to have a domain.

> (Things will get pretty hilarious later on if we have devices that are
> "local" to two or more VM domains ..)

Well, devices aren't local to domains, per se. Devices can communicate
with other components in a system at a given cost. One NUMA model is
"near" vs "far", where a single near domain exists and all the "far"
resources are quite costly. Other NUMA models may have a wider range of
costs, so that some resources are cheap, others are a little less cheap,
while others are downright expensive, depending on how far across the
fabric of interconnects the messages need to travel. While one can model
this as a full 1-1 partitioning, that doesn't match all of the extant
implementations, even today. It is easy, but an imperfect match to the
underlying realities in many cases (though a very good match to x86,
which is mostly what we care about).
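
To make the cost-versus-partition distinction concrete, here is a minimal
sketch in C. It is not taken from the patch under discussion: struct
dev_costs, dev_nearest_domain(), NDOMAINS and the cost numbers are all
hypothetical names and values made up for illustration. The idea is that a
per-device cost vector keeps the full gradation, while a single domain ID
is only a projection of it, and that projection is ambiguous whenever two
domains are equally cheap.

```c
#include <assert.h>
#include <stddef.h>

#define NDOMAINS   4      /* hypothetical number of memory domains */
#define NO_DOMAIN  (-1)   /* mirrors the "-1 for no domain" idea quoted above */

/*
 * Hypothetical per-device access costs, one entry per memory domain.
 * Lower is cheaper. Nothing here corresponds to a real kernel structure.
 */
struct dev_costs {
	const char *name;
	int cost[NDOMAINS];
};

/*
 * Collapse the cost vector to a single domain ID.
 * Returns the index of the strictly cheapest domain, or NO_DOMAIN when
 * two or more domains tie for cheapest, so no single ID describes the
 * device well.
 */
static int
dev_nearest_domain(const struct dev_costs *dc)
{
	int best = 0;
	int ties = 0;

	assert(dc != NULL);
	for (int i = 1; i < NDOMAINS; i++) {
		if (dc->cost[i] < dc->cost[best]) {
			best = i;	/* new strict minimum, forget old ties */
			ties = 0;
		} else if (dc->cost[i] == dc->cost[best]) {
			ties++;		/* another domain is just as cheap */
		}
	}
	return (ties == 0 ? best : NO_DOMAIN);
}
```

Under these assumptions, a NIC with costs { 10, 21, 21, 31 } collapses
cleanly to domain 0, while a device that is equally expensive from
everywhere, such as { 16, 16, 16, 16 }, has no single best answer even
though it still has a well-defined cost to every domain.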

Warner