Date: Thu, 25 Jul 2024 16:11:22 -0500
From: Jake Freeland <jake@technologyfriends.net>
To: Mark Johnston <markj@freebsd.org>
Cc: Konstantin Belousov <kostikbel@gmail.com>, freebsd-hackers@freebsd.org
Subject: Re: FreeBSD hugepages
Message-ID: <4d4398e5-81ba-4fbd-9806-649ec70abdb4@technologyfriends.net>
In-Reply-To: <ZqKzCK4pHg1mrSOa@nuc>
References: <1ced4290-4a31-4218-8611-63a44c307e87@technologyfriends.net> <ZqKhP0aR0fb_f6XE@kib.kiev.ua> <35da66f9-b913-45ea-90f4-16a2fa072848@technologyfriends.net> <ZqKzCK4pHg1mrSOa@nuc>

On 7/25/24 15:18, Mark Johnston wrote:
> On Thu, Jul 25, 2024 at 02:47:16PM -0500, Jake Freeland wrote:
>> On 7/25/24 14:02, Konstantin Belousov wrote:
>>> On Thu, Jul 25, 2024 at 01:46:17PM -0500, Jake Freeland wrote:
>>>> Hi there,
>>>>
>>>> I have been steadily working on bringing Data Plane Development Kit (DPDK)
>>>> on FreeBSD up to date with the Linux version. The most significant hurdle so
>>>> far has been supporting concurrent DPDK processes, each with their own
>>>> contiguous memory regions.
>>>>
>>>> These contiguous regions are used by DPDK as a heap for allocating DMA
>>>> buffers and other miscellaneous resources. Retrieving the underlying memory
>>>> and mapping these regions is currently different on Linux and FreeBSD:
>>>>
>>>> On Linux, hugepages are fetched from the kernel's pre-allocated hugepage
>>>> pool and are mapped into virtual address space on DPDK initialization. Since
>>>> the hugepages exist in a pool, multiple processes can reserve their own
>>>> hugepages and operate concurrently.
>>>>
>>>> On FreeBSD, DPDK uses an in-house contigmem kernel module that reserves a
>>>> large contiguous region of memory on load. During DPDK initialization, the
>>>> entire region is mapped into virtual address space. This leaves no memory
>>>> for another independent DPDK process, so only one process can operate at a
>>>> time.
>>>>
>>>> I could modify the DPDK contigmem module to mimic Linux's hugepages, but I
>>>> thought it would be better to integrate and upstream a hugepage-like
>>>> interface directly in the FreeBSD kernel source. I am writing this email to
>>>> see if anyone has any advice on the matter. I did not see any previous
>>>> attempts at this in Phabricator or the commit log, but it is possible that I
>>>> missed it. I have read about transparent superpage promotion, but that seems
>>>> like a different mechanism altogether.
>>>>
>>>> At a quick glance, the implementation seems straightforward: read some
>>>> loader tunables, allocate persistent hugepages at boot time, and create a
>>>> pseudo filesystem that supports creating and mapping hugepages. I could be
>>>> underestimating the magnitude of this task, but that is why I'm asking for
>>>> thoughts and advice :)
>>>>
>>>> For reference, here is Linux's documentation on hugepages:
>>>> https://docs.kernel.org/admin-guide/mm/hugetlbpage.html
>>> Are posix shm largepages objects enough (they were developed to support
>>> DPDK)? Look for shm_create_largepage(3).
>> Yes, shm_create_largepage(2) looks promising, but I would like the ability
>> to allocate these largepages at boot time when memory fragmentation is at a
>> minimum. Perhaps a couple sysctl tunables could be added onto the
>> vm.largepages node to specify a pagesize and allocate some number of pages
>> at boot?
> We could add an rc script which creates named largepage objects. This
> can be done using the posixshmcontrol utility. That might not be early
> enough during boot for some purposes. In that case, we could have a
> module which creates such objects from within the kernel. This is
> pretty straightforward to do; I wrote a dumb version of this for a
> mips-specific project a few years ago, feel free to take code or
> inspiration from it: https://people.freebsd.org/~markj/tlbdemo.c

Looks simple enough. Thanks for the example code.

>> It seems Linux had an interface similar to shm_create_largepage(2) back in
>> v2.5, but they removed it in favor of their hugetlbfs filesystem. It would
>> be nice to stay close to the file-backed Linux interface to maximize code
>> sharing in userspace. It looks like the foundation for hugepages is there,
>> but the interface for allocation and access needs to be extended.
> POSIX shm objects have most of the properties one would want, I'd
> expect, save the ability to access them via standard syscalls. What
> else is missing besides the ability to reserve memory at boot time?

Most notably, I would like the ability to allocate pages in a specific
NUMA domain.

Otherwise, in a perfect world, I'd like a unified interface for both
Linux and FreeBSD. Linux hugepages are managed using standard system
calls; files are mmap(2)'d into virtual address space from hugetlbfs and
ftruncate(2)'d. A matching interface would not add an extra kernel
entrypoint and, even more importantly, it would ease the Linux-to-FreeBSD
porting process for programs that use hugepages.

Thanks,
Jake Freeland
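
For illustration, the file-backed pattern described above (create a file,
ftruncate(2) it, mmap(2) it) can be sketched roughly as follows. This is a
minimal sketch under stated assumptions, not DPDK or kernel code: it uses an
ordinary temporary file from mkstemp(3) so that it runs anywhere, whereas on
Linux the file would live under a hugetlbfs mount point and the length would
need to be a multiple of the hugepage size. The helper name
map_file_backed_region is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L

#include <sys/types.h>
#include <sys/mman.h>

#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a file from a mkstemp(3) template, size it
 * with ftruncate(2), and map it shared with mmap(2). Returns the mapping,
 * or NULL on failure. An ordinary temporary file stands in for a hugetlbfs
 * file here, so no hugepages are actually involved.
 */
static void *
map_file_backed_region(char *path_template, size_t len)
{
	void *addr;
	int fd;

	assert(len > 0);

	fd = mkstemp(path_template);
	if (fd == -1)
		return (NULL);
	/* The name is no longer needed once the file is open. */
	unlink(path_template);

	/* Set the size of the backing object before mapping it. */
	if (ftruncate(fd, (off_t)len) == -1) {
		close(fd);
		return (NULL);
	}

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	/* The mapping stays valid after the descriptor is closed. */
	close(fd);

	return (addr == MAP_FAILED ? NULL : addr);
}
```

A caller would pass a writable template such as
char tmpl[] = "/tmp/regionXXXXXX" and release the region with munmap(2) when
done. Because only standard system calls are involved, the same userspace
code could in principle target either system if the backing filesystem
differed, which is the porting argument made above.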