From: Adrian Chadd <adrian.chadd@gmail.com>
Date: Sat, 26 Jul 2014 13:11:28 -0700
To: Jeff Roberson
Cc: freebsd-hackers@freebsd.org, Andrew Bates
Subject: Re: Working on NUMA support

Hi all!

Has there been any further progress on this?

I've been working on making the receive-side scaling (RSS) support usable
by mere mortals, and I've reached a point where I'm going to need this NUMA
awareness in the 10GE/40GE drivers for the hardware I have access to.

Right now I'm more interested in the kernel driver/allocator side of
things, so (a rough sketch of the attach-time side follows the list):

* when bringing up a NIC, figure out which CPUs are "most local" to run on;
* for each NIC queue, figure out what the "most local" bus resources are
  for NIC resources like descriptors and packet memory (e.g. mbufs);
* for each NIC queue, figure out what the "most local" resources are for
  local driver structures that the NIC doesn't touch (e.g. per-queue
  state);
* for each RSS bucket, figure out what the "most local" resources are for
  things like packet memory (mbufs), tcp/udp/inp control structures, etc.
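To make that concrete, here's a rough sketch of what the attach-time half
could look like. Caveat: bus_get_domain(9) and malloc_domainset(9) /
DOMAINSET_PREF() don't exist yet (they're exactly the kind of KPI this work
would have to grow), and foo_softc / foo_queue are made-up driver
structures, so treat this as an illustration of the policy rather than a
proposed interface:

#include <sys/param.h>
#include <sys/bus.h>
#include <sys/domainset.h>
#include <sys/malloc.h>

/* Made-up per-queue state; the point is only where it gets allocated. */
struct foo_queue {
	char	desc_ring[4096];	/* stand-in for a descriptor ring */
};

struct foo_softc {
	int			num_queues;	/* assumed <= 8 here */
	struct foo_queue	*queues[8];
};

static int
foo_attach_queues(device_t dev, struct foo_softc *sc)
{
	int domain, i;

	/* Which memory domain is "most local" to this device? */
	if (bus_get_domain(dev, &domain) != 0)
		domain = 0;		/* topology unknown; fall back */

	for (i = 0; i < sc->num_queues; i++) {
		/*
		 * Memory the NIC does descriptor/data DMA into stays
		 * local to the device, per the discussion below.
		 */
		sc->queues[i] = malloc_domainset(sizeof(struct foo_queue),
		    M_DEVBUF, DOMAINSET_PREF(domain), M_WAITOK | M_ZERO);
	}
	return (0);
}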
I had a chat with jhb yesterday and he reminded me that y'all at Isilon
have been looking into this. He described a few interesting cases from the
kernel side to me:

* On architectures with external IO controllers, the path cost from an IO
  device to multiple CPUs may be (almost) equivalent, so there's not a huge
  penalty for allocating things on the wrong CPU. I think it'll still be
  nice to get CPU-local affinity where possible so we can fully parallelise
  DRAM access, but we can play with this and see.
* On architectures with CPU-integrated IO controllers, there's a large
  penalty for doing inter-CPU IO ...
* ... but there's not such a huge penalty for doing inter-CPU memory
  access.

Given that, we may find that we should always put the IO resources local to
the CPU the device is attached to, even if we decide to run some or all of
the IO for the device on another CPU. I.e., any RAM that the IO device is
doing data or descriptor DMA into should be local to that device. John said
that in his experience the penalty for a non-local CPU touching memory was
much less than that of device DMA crossing QPI.

So the tricky bit is figuring that out and expressing it all in a way that
allows us to do memory allocation and CPU binding in a more topology-aware
way. The other half of this tricky thing is to allow it to be easily
overridden by a curious developer or system administrator who wants to
experiment with different policies (there's a sketch of such a knob in the
P.S. below).

Now, I'm very specifically only addressing the low-level kernel IO / memory
allocation requirements here. There are other things to worry about up in
userland; I think you're trying to address that in your KPI descriptions.

Thoughts?


-a
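P.S. Here's a rough sketch of the "local by default, but overridable" knob
idea. bus_bind_intr(9) is real and exists today; bus_get_domain(), the
pc_domain per-CPU field, the hw.foo_queue_cpu tunable and the
foo_pick_local_cpu() helper are all made-up names for what this work would
have to provide:

#include <sys/param.h>
#include <sys/bus.h>
#include <sys/kernel.h>
#include <sys/pcpu.h>
#include <sys/smp.h>
#include <sys/sysctl.h>

/* -1 = follow device locality; >= 0 = administrator override. */
static int foo_queue_cpu = -1;
SYSCTL_INT(_hw, OID_AUTO, foo_queue_cpu, CTLFLAG_RDTUN,
    &foo_queue_cpu, 0, "Force queue IRQs onto this CPU (-1 = auto)");

/* Pick the qidx'th CPU in the device's home domain. */
static int
foo_pick_local_cpu(device_t dev, int qidx)
{
	int cpu, domain, n, want;

	if (bus_get_domain(dev, &domain) != 0)
		return (qidx % mp_ncpus);	/* topology unknown */

	/* Count the CPUs in the device's domain ... */
	n = 0;
	CPU_FOREACH(cpu)
		if (pcpu_find(cpu)->pc_domain == domain)
			n++;
	if (n == 0)
		return (qidx % mp_ncpus);

	/* ... then round-robin the queues across them. */
	want = qidx % n;
	CPU_FOREACH(cpu)
		if (pcpu_find(cpu)->pc_domain == domain && want-- == 0)
			return (cpu);
	return (qidx % mp_ncpus);		/* not reached */
}

static int
foo_bind_queue_intr(device_t dev, struct resource *irq_res, int qidx)
{
	int cpu;

	cpu = (foo_queue_cpu >= 0) ? foo_queue_cpu :
	    foo_pick_local_cpu(dev, qidx);

	/* Run the interrupt (and hence most of the RX work) there. */
	return (bus_bind_intr(dev, irq_res, cpu));
}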