Date:      Fri, 20 Feb 2015 00:17:09 -0800
From:      "K. Macy" <kmacy@freebsd.org>
To:        Adrian Chadd <adrian@freebsd.org>
Cc:        John Baldwin <john@baldwin.cx>, Alan Cox <alc@freebsd.org>, John-Mark Gurney <jmg@funkthat.com>, Konstantin Belousov <kib@freebsd.org>, "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject:   Re: getting NUMA into the tree (userland most interesting for me)
Message-ID:  <CAHM0Q_Po7zkXhsS6N75sbLY1b5GmHmKbBE7T4z6dQg3CGWAuYw@mail.gmail.com>
In-Reply-To: <CAJ-Vmok4peyq95o7%2BT7EkEEVb2ZqU3Y0pd_9kTMyBrxuhvX05w@mail.gmail.com>
References:  <20150219041012.GJ1953@funkthat.com> <CAHM0Q_NXfN-1jrBEOkQPw67fqL8yp9XBq8PUzJAB6nt89=GvrA@mail.gmail.com> <83795148.GHHzUeRKp6@ralph.baldwin.cx> <CAHM0Q_NdiGUD35Fx3%2B%2B=mtZjHdj9qDTSRCXwgUV4vSCb6z4ATA@mail.gmail.com> <CAJ-Vmok4peyq95o7%2BT7EkEEVb2ZqU3Y0pd_9kTMyBrxuhvX05w@mail.gmail.com>

>>> Yes, I think we have a fair bit to do in the kernel before we are in a
>>> position to export anything truly useful to userland unfortunately.  The last
>>> time I talked with Jeff about projects/numa (after the first draft of the wiki
>>> page) I came away with the impression that there might be some things we can
>>> pull out of that branch, but that it isn't suitable for merging upstream
>>> directly.  Jeff noted that he and Alan had gone through several iterations of
>>> this already (I believe at least 3 completely different policy designs) all of
>>> which had their own issues.
>>>
>>> Outside of the VM I think that we can keep the APIs somewhat stable by having
>>> this opaque policy cookie to pass around that we can redefine the guts of
>>> later.  However, various parts of the VM all have to handle whatever the
>>> policy defines, and while the vm_phys bits and contigmalloc() might be kind of
>>> obvious to implement, higher level VM layers like kmem() and malloc() are more
>>> complicated.  One thing that is in projects/numa is changes for UMA that we
>>> can hopefully reuse much of, but I don't recall how much (if any) of
>>> kmem/malloc is in there.  Also, while vm_phys is one of the first things to
>>> do, I know that Alan and Jeff have pending patches to remove the cache queue
>>> (since it is far less useful than it seems) which simplify vm_phys making it
>>> easier to implement NUMA policies there, so I'm hoping we can get that in
>>> sooner before having to start tearing up the VM too much.  This is why the
>>> stuff I currently have is targeted at non-VM bits like interrupts as getting that
>>> correct is lower-hanging fruit that might provide some gains regardless.  Even
>>> once vm_phys is done I think the first thing to tackle next is contigmalloc to
>>> facilitate static bus_dma allocations (descriptor rings and such) being local
>>> to a device.
>>>
>>
>> Contigmalloc improvements and cache queue removal are in the
>> phabricator queue now. They are also prerequisites for per-cpu free
>> page caches which are a huge scalability improvement for some
>> workloads such as Netflix's.
>>
>> There is still a fair amount of scalability work (including Jeffr's
>> per-domain pagedaemon work) that really needs to happen before we can
>> seriously think about a general user-level NUMA interface.
>
> Is there anything wrong with maybe bringing over the basic low level
> allocator changes from projects/numa so the basics are there?


I think they're probably predicated on the work that is being
shepherded in now. Even if not, it would require someone to shepherd
it in and the corresponding spare cycles from alc to review / revise /
repeat - which seem to be in short supply.

-K


