From: Adrian Chadd
Date: Thu, 19 Feb 2015 17:15:32 -0800
Subject: Re: getting NUMA into the tree (userland most interesting for me)
To: "K. Macy"
Cc: John Baldwin, Alan Cox, John-Mark Gurney, Konstantin Belousov, "freebsd-arch@freebsd.org"
References: <20150219041012.GJ1953@funkthat.com> <83795148.GHHzUeRKp6@ralph.baldwin.cx>
List-Id: Discussion related to FreeBSD architecture

On 19 February 2015 at 14:49, K. Macy wrote:
>>> I personally don't think the infrastructure is far enough along that
>>> this is near to be an interesting value proposition. However, that
>>> said, I do believe that maintaining linux compatibility is important.
>>> Thus I would be for adding it to the linux compatibility layer and
>>> export it on the FreeBSD API side purely as an SPI until consensus is
>>> reached.
>>
>> Yes, I think we have a fair bit to do in the kernel before we are in a
>> position to export anything truly useful to userland unfortunately. The last
>> time I talked with Jeff about projects/numa (after the first draft of the wiki
>> page) I came away with the impression that there might be some things we can
>> pull out of that branch, but that it isn't suitable for merging upstream
>> directly. Jeff noted that he and Alan had gone through several iterations of
>> this already (I believe at least 3 completely different policy designs) all of
>> which had their own issues.
>>
>> Outside of the VM I think that we can keep the APIs somewhat stable by having
>> this opaque policy cookie to pass around that we can redefine the guts of
>> later. However, various parts of the VM all have to handle whatever the
>> policy defines, and while the vm_phys bits and contigmalloc() might be kind of
>> obvious to implement, higher level VM layers like kmem() and malloc() are more
>> complicated. One thing that is in projects/numa is changes for UMA that we
>> can hopefully reuse much of, but I don't recall how much (if any) of
>> kmem/malloc is in there.
>> Also, while vm_phys is one of the first things to
>> do, I know that Alan and Jeff have pending patches to remove the cache queue
>> (since it is far less useful than it seems) which simplify vm_phys, making it
>> easier to implement NUMA policies there, so I'm hoping we can get that in
>> sooner before having to start tearing up the VM too much. This is why the
>> stuff I currently have is targeted at non-VM bits like interrupts, as getting that
>> correct is lower-hanging fruit that might provide some gains regardless. Even
>> once vm_phys is done I think the first thing to tackle next is contigmalloc to
>> facilitate static bus_dma allocations (descriptor rings and such) being local
>> to a device.
>>
>
> Contigmalloc improvements and cache queue removal are in the
> phabricator queue now. They are also prerequisites for per-cpu free
> page caches which are a huge scalability improvement for some
> workloads such as Netflix's.
>
> There is still a fair amount of scalability work (including Jeffr's
> per-domain pagedaemon work) that really needs to happen before we can
> seriously think about a general user-level NUMA interface.

Is there anything wrong with maybe bringing over the basic low-level
allocator changes from projects/numa so the basics are there?

-adrian
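A minimal C sketch of the "opaque policy cookie" idea from the thread, under stated assumptions: every name here (struct numa_policy, numa_policy_create(), numa_policy_next_domain(), numa_policy_destroy(), the NUMA_POLICY_* kinds) is hypothetical and is not an existing FreeBSD kernel or userland API. Callers only ever hold a pointer to an incomplete struct type and go through accessor functions, so the contents of the policy can be redefined later without changing caller source code, which is the API-stability property described for code outside the VM.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical policy kinds; these are not FreeBSD constants. */
enum numa_policy_kind {
	NUMA_POLICY_PREFER,	/* always pick one preferred domain */
	NUMA_POLICY_ROUNDROBIN	/* cycle through all domains in order */
};

/*
 * In a real header only this forward declaration and the prototypes
 * would be visible to callers; the struct definition would live in a
 * single .c file so its layout can change without breaking callers.
 */
struct numa_policy;

struct numa_policy *numa_policy_create(enum numa_policy_kind kind,
    int ndomains, int preferred);
int numa_policy_next_domain(struct numa_policy *p);
void numa_policy_destroy(struct numa_policy *p);

/* Private definition: the "guts" that may be redefined later. */
struct numa_policy {
	enum numa_policy_kind kind;
	int ndomains;		/* number of memory domains */
	int preferred;		/* used by NUMA_POLICY_PREFER */
	int cursor;		/* next domain for NUMA_POLICY_ROUNDROBIN */
};

/* Returns NULL on invalid arguments or allocation failure. */
struct numa_policy *
numa_policy_create(enum numa_policy_kind kind, int ndomains, int preferred)
{
	struct numa_policy *p;

	if (ndomains <= 0 || preferred < 0 || preferred >= ndomains)
		return (NULL);
	p = malloc(sizeof(*p));
	if (p == NULL)
		return (NULL);
	p->kind = kind;
	p->ndomains = ndomains;
	p->preferred = preferred;
	p->cursor = 0;
	return (p);
}

/* Return the domain an allocator should try next under this policy. */
int
numa_policy_next_domain(struct numa_policy *p)
{
	int d;

	assert(p != NULL && p->ndomains > 0);
	switch (p->kind) {
	case NUMA_POLICY_ROUNDROBIN:
		d = p->cursor;
		p->cursor = (p->cursor + 1) % p->ndomains;
		return (d);
	case NUMA_POLICY_PREFER:
	default:
		return (p->preferred);
	}
}

void
numa_policy_destroy(struct numa_policy *p)
{
	free(p);
}
```

A caller would create a policy once (for example, round-robin across the machine's domains) and pass the pointer to allocation routines. If the policy representation is redesigned, as the thread says has already happened several times in projects/numa, only the file that defines struct numa_policy has to change.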