From: Jake Freeland
To: Konstantin Belousov
Cc: freebsd-hackers@freebsd.org
Date: Thu, 25 Jul 2024 14:47:16 -0500
Subject: Re: FreeBSD hugepages
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

On 7/25/24 14:02, Konstantin Belousov wrote:
> On Thu, Jul 25, 2024 at 01:46:17PM -0500, Jake Freeland wrote:
>> Hi there,
>>
>> I have been steadily working on bringing Data Plane Development Kit (DPDK) on FreeBSD up to date with the Linux version. The most significant hurdle so far has been supporting concurrent DPDK processes, each with their own contiguous memory regions.
>>
>> These contiguous regions are used by DPDK as a heap for allocating DMA buffers and other miscellaneous resources. Retrieving the underlying memory and mapping these regions is currently different on Linux and FreeBSD:
>>
>> On Linux, hugepages are fetched from the kernel's pre-allocated hugepage pool and are mapped into virtual address space on DPDK initialization. Since the hugepages exist in a pool, multiple processes can reserve their own hugepages and operate concurrently.
>>
>> On FreeBSD, DPDK uses an in-house contigmem kernel module that reserves a large contiguous region of memory on load. During DPDK initialization, the entire region is mapped into virtual address space. This leaves no memory for another independent DPDK process, so only one process can operate at a time.
>>
>> I could modify the DPDK contigmem module to mimic Linux's hugepages, but I thought it would be better to integrate and upstream a hugepage-like interface directly in the FreeBSD kernel source. I am writing this email to see if anyone has any advice on the matter. I did not see any previous attempts at this in Phabricator or the commit log, but it is possible that I missed it. I have read about transparent superpage promotion, but that seems like a different mechanism altogether.
>>
>> At a quick glance, the implementation seems straightforward: read some loader tunables, allocate persistent hugepages at boot time, and create a pseudo filesystem that supports creating and mapping hugepages. I could be underestimating the magnitude of this task, but that is why I'm asking for thoughts and advice :)
>>
>> For reference, here is Linux's documentation on hugepages:
>> https://docs.kernel.org/admin-guide/mm/hugetlbpage.html
> Are POSIX shm largepage objects enough? (They were developed to support
> DPDK.) Look for shm_create_largepage(3).

Yes, shm_create_largepage(2) looks promising, but I would like the ability to allocate these largepages at boot time, when memory fragmentation is at a minimum. Perhaps a couple of sysctl tunables could be added under the vm.largepages node to specify a page size and allocate some number of pages at boot?

It seems Linux had an interface similar to shm_create_largepage(2) back in v2.5, but they removed it in favor of their hugetlbfs filesystem. It would be nice to stay close to the file-backed Linux interface to maximize code sharing in userspace.
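For comparison, here is a minimal sketch of how a userspace program might use the POSIX shm largepage interface discussed above. It is written against my reading of the shm_create_largepage() and getpagesizes(3) descriptions in the FreeBSD manual pages and assumes FreeBSD 13 or later. The helpers map_largepage() and round_up_to_page() are hypothetical names for illustration, as are the choices of an anonymous (SHM_ANON) object, SHM_LARGEPAGE_ALLOC_DEFAULT and mode 0600. The FreeBSD-specific part sits behind #ifdef __FreeBSD__ so the size-rounding helper still compiles elsewhere.

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: round len up to a whole number of pages of size
 * pgsz. Largepage objects are sized in multiples of their page size.
 */
static size_t
round_up_to_page(size_t len, size_t pgsz)
{
	/* Page sizes reported by getpagesizes(3) are powers of two. */
	assert(pgsz != 0 && (pgsz & (pgsz - 1)) == 0);
	return ((len + pgsz - 1) & ~(pgsz - 1));
}

#ifdef __FreeBSD__
/*
 * Hypothetical helper (sketch, not an existing API): create an anonymous
 * POSIX shm largepage object using the page size at index psind of
 * getpagesizes(3), size it to hold at least len bytes, and map it
 * read/write. Returns MAP_FAILED with errno set on failure; on success
 * stores the mapped length in *mapped_len.
 */
static void *
map_largepage(int psind, size_t len, size_t *mapped_len)
{
	size_t sizes[8];	/* getpagesizes(3) fills at most nelem entries */
	void *p;
	int fd, n, saved_errno;

	n = getpagesizes(sizes, (int)(sizeof(sizes) / sizeof(sizes[0])));
	if (n == -1)
		return (MAP_FAILED);
	/* Index 0 is the base page size; large pages start at index 1. */
	if (psind < 1 || psind >= n) {
		errno = EINVAL;
		return (MAP_FAILED);
	}
	len = round_up_to_page(len, sizes[psind]);

	fd = shm_create_largepage(SHM_ANON, O_CREAT | O_RDWR, psind,
	    SHM_LARGEPAGE_ALLOC_DEFAULT, 0600);
	if (fd == -1)
		return (MAP_FAILED);

	/* Assumed: the large pages are allocated when the object is sized. */
	if (ftruncate(fd, (off_t)len) == -1) {
		saved_errno = errno;
		close(fd);
		errno = saved_errno;
		return (MAP_FAILED);
	}

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	saved_errno = errno;
	close(fd);	/* the mapping keeps the object alive */
	errno = saved_errno;
	if (p != MAP_FAILED && mapped_len != NULL)
		*mapped_len = len;
	return (p);
}
#endif /* __FreeBSD__ */
```

On amd64, psind 1 usually corresponds to 2 MB pages. A caller would release the region with munmap(p, mapped_len). Whether sizing the object succeeds still depends on how fragmented physical memory is when the process starts, which is the motivation for reserving pages at boot.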

It looks like the foundation for hugepages is there, but the interface for allocation and access needs to be extended.

Jake Freeland