Date: Mon, 20 Feb 2023 16:24:20 GMT
Message-Id: <202302201624.31KGOK2j064634@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-branches@FreeBSD.org
From: Mark Johnston
Subject: git: 74631b842197 - stable/13 - shm: Document shm_create_largepage()
List-Id: Commit messages for all branches of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all
Sender: owner-dev-commits-src-all@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Git-Committer: markj
X-Git-Repository: src
X-Git-Refname: refs/heads/stable/13
X-Git-Reftype: branch
X-Git-Commit: 74631b842197d520b5889b3f24863f5037bbc5d8
Auto-Submitted: auto-generated

The branch stable/13 has been updated by markj:

URL: https://cgit.FreeBSD.org/src/commit/?id=74631b842197d520b5889b3f24863f5037bbc5d8

commit 74631b842197d520b5889b3f24863f5037bbc5d8
Author:     Mark Johnston
AuthorDate: 2023-02-03 15:55:30 +0000
Commit:     Mark Johnston
CommitDate: 2023-02-20 16:24:08 +0000

    shm: Document shm_create_largepage()

    While here, move notes about FreeBSD-specific functionality to the
    COMPATIBILITY section, and document the ECAPMODE error for shm_open().

    Reviewed by:    pauamma, kib
    MFC after:      2 weeks
    Sponsored by:   Klara, Inc.
    Sponsored by:   Juniper Networks, Inc.
    Differential Revision:  https://reviews.freebsd.org/D38282

    (cherry picked from commit 5f03f96fbefbb5c68a5d7d06728ff5b4a05f87b0)
---
 lib/libc/sys/Makefile.inc |   1 +
 lib/libc/sys/shm_open.2   | 173 +++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 163 insertions(+), 11 deletions(-)

diff --git a/lib/libc/sys/Makefile.inc b/lib/libc/sys/Makefile.inc
index 5c30f7d6b796..6f663158d840 100644
--- a/lib/libc/sys/Makefile.inc
+++ b/lib/libc/sys/Makefile.inc
@@ -484,6 +484,7 @@ MLINKS+=setuid.2 setegid.2 \
 	setuid.2 setgid.2
 MLINKS+=shmat.2 shmdt.2
 MLINKS+=shm_open.2 memfd_create.3 \
+	shm_open.2 shm_create_largepage.3 \
 	shm_open.2 shm_unlink.2 \
 	shm_open.2 shm_rename.2
 MLINKS+=sigwaitinfo.2 sigtimedwait.2
diff --git a/lib/libc/sys/shm_open.2 b/lib/libc/sys/shm_open.2
index ec12f9f2c0b7..061f0b126c53 100644
--- a/lib/libc/sys/shm_open.2
+++ b/lib/libc/sys/shm_open.2
@@ -28,11 +28,11 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd September 26, 2019
+.Dd January 30, 2023
 .Dt SHM_OPEN 2
 .Os
 .Sh NAME
-.Nm memfd_create , shm_open , shm_rename, shm_unlink
+.Nm memfd_create , shm_create_largepage , shm_open , shm_rename, shm_unlink
 .Nd "shared memory object operations"
 .Sh LIBRARY
 .Lb libc
@@ -43,6 +43,14 @@
 .Ft int
 .Fn memfd_create "const char *name" "unsigned int flags"
 .Ft int
+.Fo shm_create_largepage
+.Fa "const char *path"
+.Fa "int flags"
+.Fa "int psind"
+.Fa "int alloc_policy"
+.Fa "mode_t mode"
+.Fc
+.Ft int
 .Fn shm_open "const char *path" "int flags" "mode_t mode"
 .Ft int
 .Fn shm_rename "const char *path_from" "const char *path_to" "int flags"
@@ -51,8 +59,8 @@
 .Sh DESCRIPTION
 The
 .Fn shm_open
-system call opens (or optionally creates) a
-.Tn POSIX
+function opens (or optionally creates) a
+POSIX
 shared memory object named
 .Fa path .
 The
@@ -114,9 +122,7 @@ see
 and
 .Xr fcntl 2 .
 .Pp
-As a
-.Fx
-extension, the constant
+The constant
 .Dv SHM_ANON
 may be used for the
 .Fa path
@@ -143,6 +149,131 @@ will fail with
 All other flags are ignored.
 .Pp
 The
+.Fn shm_create_largepage
+function behaves similarly to
+.Fn shm_open ,
+except that the
+.Dv O_CREAT
+flag is implicitly specified, and the returned
+.Dq largepage
+object is always backed by aligned, physically contiguous chunks of memory.
+This ensures that the object can be mapped using so-called
+.Dq superpages ,
+which can improve application performance in some workloads by reducing the
+number of translation lookaside buffer (TLB) entries required to access a
+mapping of the object,
+and by reducing the number of page faults performed when accessing a mapping.
+This happens automatically for all largepage objects.
+.Pp
+An existing largepage object can be opened using the
+.Fn shm_open
+function.
+Largepage shared memory objects behave slightly differently from non-largepage
+objects:
+.Bl -bullet -offset indent
+.It
+Memory for a largepage object is allocated when the object is
+extended using the
+.Xr ftruncate 2
+system call, whereas memory for regular shared memory objects is allocated
+lazily and may be paged out to a swap device when not in use.
+.It
+The size of a mapping of a largepage object must be a multiple of the
+underlying large page size.
+Most attributes of such a mapping can only be modified at the granularity
+of the large page size.
+For example, when using
+.Xr munmap 2
+to unmap a portion of a largepage object mapping, or when using
+.Xr mprotect 2
+to adjust protections of a mapping of a largepage object, the starting address
+must be large page size-aligned, and the length of the operation must be a
+multiple of the large page size.
+If not, the corresponding system call will fail and set
+.Va errno
+to
+.Er EINVAL .
+.El
+.Pp
+The
+.Fa psind
+argument to
+.Fn shm_create_largepage
+specifies the size of large pages used to back the object.
+This argument is an index into the page sizes array returned by
+.Xr getpagesizes 3 .
+In particular, all large pages backing a largepage object must be of the
+same size.
+For example, on a system with large page sizes of 2MB and 1GB, a 2GB largepage
+object will consist of either 1024 2MB pages, or 2 1GB pages, depending on
+the value specified for the
+.Fa psind
+argument.
+The
+.Fa alloc_policy
+parameter specifies what happens when an attempt to use
+.Xr ftruncate 2
+to allocate memory for the object fails.
+The following values are accepted:
+.Bl -tag -offset indent -width SHM_
+.It Dv SHM_LARGEPAGE_ALLOC_DEFAULT
+If the (non-blocking) memory allocation fails because there is insufficient free
+contiguous memory, the kernel will attempt to defragment physical memory and
+try another allocation.
+The subsequent allocation may or may not succeed.
+If this subsequent allocation also fails,
+.Xr ftruncate 2
+will fail and set
+.Va errno
+to
+.Er ENOMEM .
+.It Dv SHM_LARGEPAGE_ALLOC_NOWAIT
+If the memory allocation fails,
+.Xr ftruncate 2
+will fail and set
+.Va errno
+to
+.Er ENOMEM .
+.It Dv SHM_LARGEPAGE_ALLOC_HARD
+The kernel will attempt defragmentation until the allocation succeeds,
+or an unblocked signal is delivered to the thread.
+However, it is possible for physical memory to be fragmented such that the
+allocation will never succeed.
+.El
+.Pp
+The
+.Dv FIOSSHMLPGCNF
+and
+.Dv FIOGSHMLPGCNF
+.Xr ioctl 2
+commands can be used with a largepage shared memory object to get and set
+largepage object parameters.
+Both commands operate on the following structure:
+.Bd -literal
+struct shm_largepage_conf {
+	int psind;
+	int alloc_policy;
+};
+
+.Ed
+The
+.Dv FIOGSHMLPGCNF
+command populates this structure with the current values of these parameters,
+while the
+.Dv FIOSSHMLPGCNF
+command modifies the largepage object.
+Currently only the
+.Va alloc_policy
+parameter may be modified.
+Internally,
+.Fn shm_create_largepage
+works by creating a regular shared memory object using
+.Fn shm_open ,
+and then converting it into a largepage object using the
+.Dv FIOSSHMLPGCNF
+ioctl command.
+.Pp
+The
 .Fn shm_rename
 system call atomically removes a shared memory object named
 .Fa path_from
@@ -162,10 +293,6 @@ Return an error if an shm exists at
 .Fa path_to ,
 rather than unlinking it.
 .El
-.Fn shm_rename
-is also a
-.Fx
-extension.
 .Pp
 The
 .Fn shm_unlink
@@ -235,6 +362,17 @@ All functions return -1 on failure, and set
 to indicate the error.
 .Sh COMPATIBILITY
 The
+.Fn shm_create_largepage
+and
+.Fn shm_rename
+functions are
+.Fx
+extensions, as is support for the
+.Dv SHM_ANON
+value in
+.Fn shm_open .
+.Pp
+The
 .Fa path ,
 .Fa path_from ,
 and
@@ -377,6 +515,18 @@ and
 are specified and the named shared memory object does exist.
 .It Bq Er EACCES
 The required permissions (for reading or reading and writing) are denied.
+.It Bq Er ECAPMODE
+The process is running in capability mode (see
+.Xr capsicum 4 )
+and attempted to create a named shared memory object.
+.El
+.Pp
+.Fn shm_create_largepage
+can fail for the reasons listed above.
+It also fails with these error codes for the following conditions:
+.Bl -tag -width Er
+.It Bq Er ENOTTY
+The kernel does not support large pages on the current platform.
 .El
 .Pp
 The following errors are defined for
@@ -424,6 +574,7 @@ requires write permission to the shared memory object.
 .Xr close 2 ,
 .Xr fstat 2 ,
 .Xr ftruncate 2 ,
+.Xr ioctl 2 ,
 .Xr mmap 2 ,
 .Xr munmap 2 ,
 .Xr sendfile 2
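
Illustration (not part of the commit): the size and alignment rules documented
above for largepage objects can be sketched in C. The helper names
largepage_count(), largepage_range_ok() and map_largepages() are hypothetical
and were chosen for this example only. The first two are plain arithmetic and
build anywhere. The last one is compiled only on FreeBSD and assumes that
shm_create_largepage() accepts SHM_ANON as its path, that getpagesizes(3) and
MAXPAGESIZES are available through <sys/mman.h> and <sys/param.h>, and that
0600 is a suitable mode; treat it as a rough sketch under those assumptions
rather than a reference implementation.

```c
#include <stdbool.h>
#include <stdint.h>

#ifdef __FreeBSD__
#include <sys/param.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#endif

/*
 * Number of large pages backing an object of object_size bytes, or 0 if
 * the size is not a whole multiple of page_size.
 */
static uint64_t
largepage_count(uint64_t object_size, uint64_t page_size)
{
	if (page_size == 0 || object_size % page_size != 0)
		return (0);
	return (object_size / page_size);
}

/*
 * True if the range starting at addr with length len meets the alignment
 * rule described above for munmap(2)/mprotect(2) on a largepage mapping.
 */
static bool
largepage_range_ok(uint64_t addr, uint64_t len, uint64_t page_size)
{
	return (page_size != 0 && addr % page_size == 0 &&
	    len % page_size == 0);
}

#ifdef __FreeBSD__
/*
 * Hypothetical helper: create an anonymous largepage object made of
 * npages pages of size index psind, and map it read/write.
 * Returns MAP_FAILED on any error.
 */
static void *
map_largepages(int psind, size_t npages, size_t *lenp)
{
	size_t ps[MAXPAGESIZES];
	void *p;
	int fd, n;

	/* psind is an index into the array filled in by getpagesizes(3). */
	n = getpagesizes(ps, MAXPAGESIZES);
	if (n == -1 || psind < 0 || psind >= n)
		return (MAP_FAILED);

	/* O_CREAT is implied; SHM_ANON here is an assumption of this sketch. */
	fd = shm_create_largepage(SHM_ANON, O_RDWR, psind,
	    SHM_LARGEPAGE_ALLOC_DEFAULT, 0600);
	if (fd == -1)
		return (MAP_FAILED);

	/* Memory is allocated when the object is extended with ftruncate(2). */
	*lenp = npages * ps[psind];
	if (ftruncate(fd, (off_t)*lenp) == -1) {
		close(fd);
		return (MAP_FAILED);
	}

	p = mmap(NULL, *lenp, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* an established mapping does not need the descriptor */
	return (p);
}
#endif
```

With 2MB and 1GB large pages, largepage_count() reproduces the figures in the
manual page text: a 2GB object needs 1024 pages of 2MB or 2 pages of 1GB.
largepage_range_ok() can be used to check an address and length before calling
munmap(2) or mprotect(2), which the text says fail with EINVAL otherwise.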