Date: Thu, 19 Feb 2015 14:49:17 -0800
From: "K. Macy" <kmacy@freebsd.org>
To: John Baldwin <john@baldwin.cx>
Cc: Alan Cox <alc@freebsd.org>, John-Mark Gurney <jmg@funkthat.com>, Konstantin Belousov <kib@freebsd.org>, "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: getting NUMA into the tree (userland most interesting for me)
Message-ID: <CAHM0Q_NdiGUD35Fx3%2B%2B=mtZjHdj9qDTSRCXwgUV4vSCb6z4ATA@mail.gmail.com>
In-Reply-To: <83795148.GHHzUeRKp6@ralph.baldwin.cx>
References: <20150219041012.GJ1953@funkthat.com> <CAHM0Q_NXfN-1jrBEOkQPw67fqL8yp9XBq8PUzJAB6nt89=GvrA@mail.gmail.com> <83795148.GHHzUeRKp6@ralph.baldwin.cx>
>> I personally don't think the infrastructure is far enough along that
>> this is near to be an interesting value proposition. However, that
>> said, I do believe that maintaining linux compatibility is important.
>> Thus I would be for adding it to the linux compatibility layer and
>> export it on the FreeBSD API side purely as an SPI until consensus is
>> reached.
>
> Yes, I think we have a fair bit to do in the kernel before we are in a
> position to export anything truly useful to userland unfortunately. The last
> time I talked with Jeff about projects/numa (after the first draft of the wiki
> page) I came away with the impression that there might be some things we can
> pull out of that branch, but that it isn't suitable for merging upstream
> directly. Jeff noted that he and Alan had gone through several iterations of
> this already (I believe at least 3 completely different policy designs) all of
> which had their own issues.
>
> Outside of the VM I think that we can keep the APIs somewhat stable by having
> this opaque policy cookie to pass around that we can redefine the guts of
> later. However, various parts of the VM all have to handle whatever the
> policy defines, and while the vm_phys bits and contigmalloc() might be kind of
> obvious to implement, higher level VM layers like kmem() and malloc() are more
> complicated. One thing that is in projects/numa is changes for UMA that we
> can hopefully reuse much of, but I don't recall how much (if any) of
> kmem/malloc is in there. Also, while vm_phys is one of the first things to
> do, I know that Alan and Jeff have pending patches to remove the cache queue
> (since it is far less useful than it seems) which simplify vm_phys making it
> easier to implement NUMA policies there, so I'm hoping we can get that in
> sooner before having to start tearing up the VM too much.
> This is why the
> stuff I currently have is targeted non-VM bits like interrupts as getting that
> correct is lower-hanging fruit that might provide some gains regardless. Even
> once vm_phys is done I think the first thing to tackle next is contigmalloc to
> facilitate static bus_dma allocations (descriptor rings and such) being local
> to a device.
>

Contigmalloc improvements and cache queue removal are in the Phabricator
queue now. They are also prerequisites for per-cpu free page caches, which
are a huge scalability improvement for some workloads such as Netflix's.
There is still a fair amount of scalability work (including Jeffr's
per-domain pagedaemon work) that really needs to happen before we can
seriously think about a general user-level NUMA interface.

-K