From: "K. Macy"
To: John Baldwin
Cc: Alan Cox, John-Mark Gurney, Konstantin Belousov, "freebsd-arch@freebsd.org"
Subject: Re: getting NUMA into the tree (userland most interesting for me)
Date: Thu, 19 Feb 2015 14:49:17 -0800
In-Reply-To: <83795148.GHHzUeRKp6@ralph.baldwin.cx>
References: <20150219041012.GJ1953@funkthat.com> <83795148.GHHzUeRKp6@ralph.baldwin.cx>

>> I personally don't think the infrastructure is far enough along that
>> this is near to be an interesting value proposition. However, that
>> said, I do believe that maintaining linux compatibility is important.
>> Thus I would be for adding it to the linux compatibility layer and
>> export it on the FreeBSD API side purely as an SPI until consensus is
>> reached.
>
> Yes, I think we have a fair bit to do in the kernel before we are in a
> position to export anything truly useful to userland unfortunately. The last
> time I talked with Jeff about projects/numa (after the first draft of the wiki
> page) I came away with the impression that there might be some things we can
> pull out of that branch, but that it isn't suitable for merging upstream
> directly. Jeff noted that he and Alan had gone through several iterations of
> this already (I believe at least 3 completely different policy designs) all of
> which had their own issues.
>
> Outside of the VM I think that we can keep the APIs somewhat stable by having
> this opaque policy cookie to pass around that we can redefine the guts of
> later. However, various parts of the VM all have to handle whatever the
> policy defines, and while the vm_phys bits and contigmalloc() might be kind of
> obvious to implement, higher level VM layers like kmem() and malloc() are more
> complicated. One thing that is in projects/numa is changes for UMA that we
> can hopefully reuse much of, but I don't recall how much (if any) of
> kmem/malloc is in there.
> Also, while vm_phys is one of the first things to
> do, I know that Alan and Jeff have pending patches to remove the cache queue
> (since it is far less useful than it seems) which simplify vm_phys making it
> easier to implement NUMA policies there, so I'm hoping we can get that in
> sooner before having to start tearing up the VM too much. This is why the
> stuff I currently have is targeted non-VM bits like interrupts as getting that
> correct is lower-hanging fruit that might provide some gains regardless. Even
> once vm_phys is done I think the first thing to tackle next is contigmalloc to
> facilitate static bus_dma allocations (descriptor rings and such) being local
> to a device.
>

Contigmalloc improvements and cache queue removal are in the Phabricator
queue now. They are also prerequisites for per-CPU free page caches,
which are a huge scalability improvement for some workloads such as
Netflix's. There is still a fair amount of scalability work (including
Jeffr's per-domain pagedaemon work) that really needs to happen before
we can seriously think about a general user-level NUMA interface.

-K