From: Adrian Chadd <adrian.chadd@gmail.com>
Date: Sat, 26 Jul 2014 13:11:28 -0700
To: Jeff Roberson
Cc: freebsd-hackers@freebsd.org, Andrew Bates
Subject: Re: Working on NUMA support

Hi all!

Has there been any further progress on this?

I've been working on making the receive-side scaling (RSS) support usable
by mere mortals, and I've reached a point where I'm going to need this NUMA
awareness in the 10GE/40GE drivers for the hardware I have access to.

Right now I'm more interested in the kernel driver/allocator side of
things, so (a rough sketch of the attach-time side follows the list):

* when bringing up a NIC, figure out which CPUs are "most local" to run on;
* for each NIC queue, figure out what the "most local" bus resources are
  for NIC resources like descriptors and packet memory (e.g. mbufs);
* for each NIC queue, figure out what the "most local" resources are for
  local driver structures that the NIC doesn't touch (e.g. per-queue
  state);
* for each RSS bucket, figure out what the "most local" resources are for
  things like packet memory (mbufs), tcp/udp/inp control structures, etc.
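To make that concrete, here's a rough sketch of what the attach-time half
could look like. Caveat: bus_get_domain(9) and malloc_domainset(9) /
DOMAINSET_PREF() don't exist yet (they're exactly the kind of KPI this work
would have to grow), and foo_softc / foo_queue are made-up driver
structures, so treat this as an illustration of the policy rather than a
proposed interface:

#include <sys/param.h>
#include <sys/bus.h>
#include <sys/domainset.h>
#include <sys/malloc.h>

/* Made-up per-queue state; the point is only where it gets allocated. */
struct foo_queue {
	char	desc_ring[4096];	/* stand-in for a descriptor ring */
};

struct foo_softc {
	int			num_queues;	/* assumed <= 8 here */
	struct foo_queue	*queues[8];
};

static int
foo_attach_queues(device_t dev, struct foo_softc *sc)
{
	int domain, i;

	/* Which memory domain is "most local" to this device? */
	if (bus_get_domain(dev, &domain) != 0)
		domain = 0;		/* topology unknown; fall back */

	for (i = 0; i < sc->num_queues; i++) {
		/*
		 * Memory the NIC does descriptor/data DMA into stays
		 * local to the device, per the discussion below.
		 */
		sc->queues[i] = malloc_domainset(sizeof(struct foo_queue),
		    M_DEVBUF, DOMAINSET_PREF(domain), M_WAITOK | M_ZERO);
	}
	return (0);
}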
I had a chat with jhb yesterday and he reminded me that y'all at Isilon
have been looking into this. He described a few interesting cases from the
kernel side to me:

* On architectures with external IO controllers, the path cost from an IO
  device to multiple CPUs may be (almost) equivalent, so there's not a huge
  penalty for allocating things on the wrong CPU. I think it'll still be
  nice to get CPU-local affinity where possible so we can fully parallelise
  DRAM access, but we can play with this and see.
* On architectures with CPU-integrated IO controllers, there's a large
  penalty for doing inter-CPU IO ...
* ... but there's not such a huge penalty for doing inter-CPU memory
  access.

Given that, we may find that we should always put the IO resources local to
the CPU the device is attached to, even if we decide to run some or all of
the IO for the device on another CPU. I.e., any RAM that the IO device is
doing data or descriptor DMA into should be local to that device. John said
that in his experience the penalty for a non-local CPU touching memory was
much less than that of device DMA crossing QPI.

So the tricky bit is figuring that out and expressing it all in a way that
allows us to do memory allocation and CPU binding in a more topology-aware
way. The other half of this tricky thing is to allow it to be easily
overridden by a curious developer or system administrator who wants to
experiment with different policies (there's a sketch of such a knob in the
P.S. below).

Now, I'm very specifically only addressing the low-level kernel IO / memory
allocation requirements here. There are other things to worry about up in
userland; I think you're trying to address that in your KPI descriptions.

Thoughts?


-a
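P.S. Here's a rough sketch of the "local by default, but overridable" knob
idea. bus_bind_intr(9) is real and exists today; bus_get_domain(), the
pc_domain per-CPU field, the hw.foo_queue_cpu tunable and the
foo_pick_local_cpu() helper are all made-up names for what this work would
have to provide:

#include <sys/param.h>
#include <sys/bus.h>
#include <sys/kernel.h>
#include <sys/pcpu.h>
#include <sys/smp.h>
#include <sys/sysctl.h>

/* -1 = follow device locality; >= 0 = administrator override. */
static int foo_queue_cpu = -1;
SYSCTL_INT(_hw, OID_AUTO, foo_queue_cpu, CTLFLAG_RDTUN,
    &foo_queue_cpu, 0, "Force queue IRQs onto this CPU (-1 = auto)");

/* Pick the qidx'th CPU in the device's home domain. */
static int
foo_pick_local_cpu(device_t dev, int qidx)
{
	int cpu, domain, n, want;

	if (bus_get_domain(dev, &domain) != 0)
		return (qidx % mp_ncpus);	/* topology unknown */

	/* Count the CPUs in the device's domain ... */
	n = 0;
	CPU_FOREACH(cpu)
		if (pcpu_find(cpu)->pc_domain == domain)
			n++;
	if (n == 0)
		return (qidx % mp_ncpus);

	/* ... then round-robin the queues across them. */
	want = qidx % n;
	CPU_FOREACH(cpu)
		if (pcpu_find(cpu)->pc_domain == domain && want-- == 0)
			return (cpu);
	return (qidx % mp_ncpus);		/* not reached */
}

static int
foo_bind_queue_intr(device_t dev, struct resource *irq_res, int qidx)
{
	int cpu;

	cpu = (foo_queue_cpu >= 0) ? foo_queue_cpu :
	    foo_pick_local_cpu(dev, qidx);

	/* Run the interrupt (and hence most of the RX work) there. */
	return (bus_bind_intr(dev, irq_res, cpu));
}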