Date:      Thu, 19 Feb 2015 14:49:17 -0800
From:      "K. Macy" <kmacy@freebsd.org>
To:        John Baldwin <john@baldwin.cx>
Cc:        Alan Cox <alc@freebsd.org>, John-Mark Gurney <jmg@funkthat.com>, Konstantin Belousov <kib@freebsd.org>, "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject:   Re: getting NUMA into the tree (userland most interesting for me)
Message-ID:  <CAHM0Q_NdiGUD35Fx3++=mtZjHdj9qDTSRCXwgUV4vSCb6z4ATA@mail.gmail.com>
In-Reply-To: <83795148.GHHzUeRKp6@ralph.baldwin.cx>
References:  <20150219041012.GJ1953@funkthat.com> <CAHM0Q_NXfN-1jrBEOkQPw67fqL8yp9XBq8PUzJAB6nt89=GvrA@mail.gmail.com> <83795148.GHHzUeRKp6@ralph.baldwin.cx>

>> I personally don't think the infrastructure is far enough along for
>> this to be an interesting value proposition yet. That said, I do
>> believe that maintaining Linux compatibility is important. Thus I
>> would be for adding it to the Linux compatibility layer and exporting
>> it on the FreeBSD API side purely as an SPI until consensus is
>> reached.
>
> Yes, unfortunately I think we have a fair bit to do in the kernel before we
> are in a position to export anything truly useful to userland.  The last
> time I talked with Jeff about projects/numa (after the first draft of the wiki
> page) I came away with the impression that there might be some things we can
> pull out of that branch, but that it isn't suitable for merging upstream
> directly.  Jeff noted that he and Alan had gone through several iterations of
> this already (I believe at least 3 completely different policy designs), all of
> which had their own issues.
>
> Outside of the VM, I think we can keep the APIs somewhat stable by passing
> around an opaque policy cookie whose internals we can redefine later.
> However, various parts of the VM all have to handle whatever the policy
> defines, and while the vm_phys bits and contigmalloc() might be fairly
> obvious to implement, higher-level VM layers like kmem() and malloc() are
> more complicated.  The projects/numa branch does contain changes for UMA,
> much of which we can hopefully reuse, but I don't recall how much (if any)
> of kmem/malloc is in there.  Also, while vm_phys is one of the first things
> to do, I know that Alan and Jeff have pending patches to remove the cache
> queue (since it is far less useful than it seems).  Those patches simplify
> vm_phys and make it easier to implement NUMA policies there, so I'm hoping
> we can get them in before we have to start tearing up the VM too much.
> This is why the stuff I currently have targets non-VM bits like interrupts:
> getting those right is lower-hanging fruit that might provide some gains
> regardless.  Even once vm_phys is done, I think the next thing to tackle is
> contigmalloc, so that static bus_dma allocations (descriptor rings and such)
> can be local to a device.
>
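A minimal C sketch of the "opaque policy cookie" idea quoted above, assuming a userland stand-in for illustration only. Every name here (struct numa_policy, numa_policy_create(), numa_policy_pick_domain(), numa_policy_destroy()) is hypothetical and is not a FreeBSD kernel interface. Callers only ever see an incomplete struct type, so its contents (here a simple round-robin cursor) could be redefined later without changing any caller.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- "public header": callers see only an incomplete (opaque) type ---- */
struct numa_policy;	/* hypothetical policy cookie */
struct numa_policy *numa_policy_create(int ndomains);
int numa_policy_pick_domain(struct numa_policy *p);
void numa_policy_destroy(struct numa_policy *p);

/* ---- "private implementation": the guts can be redefined later ---- */
struct numa_policy {
	int ndomains;	/* number of memory domains to spread across */
	int next;	/* round-robin cursor (illustrative policy only) */
};

/* Returns NULL for an invalid domain count or on allocation failure. */
struct numa_policy *
numa_policy_create(int ndomains)
{
	struct numa_policy *p;

	if (ndomains <= 0)
		return (NULL);
	p = malloc(sizeof(*p));
	if (p == NULL)
		return (NULL);
	p->ndomains = ndomains;
	p->next = 0;
	return (p);
}

/* Picks the next domain in round-robin order; callers never see how. */
int
numa_policy_pick_domain(struct numa_policy *p)
{
	int d;

	assert(p != NULL && p->ndomains > 0);
	d = p->next;
	p->next = (p->next + 1) % p->ndomains;
	return (d);
}

void
numa_policy_destroy(struct numa_policy *p)
{
	free(p);
}
```

Because callers only pass the struct numa_policy pointer around, the round-robin internals could later be swapped for a different placement policy (first-touch, fixed domain, and so on) without touching any caller, which is the stability property described in the quoted paragraph.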

Contigmalloc improvements and cache queue removal are in the
Phabricator queue now. They are also prerequisites for per-CPU free
page caches, which are a huge scalability improvement for some
workloads, such as Netflix's.

There is still a fair amount of scalability work (including Jeff's
per-domain pagedaemon work) that really needs to happen before we can
seriously think about a general user-level NUMA interface.

-K