Date:      Fri, 08 Nov 2024 09:04:51 +0000
From:      bugzilla-noreply@freebsd.org
To:        x11@FreeBSD.org
Subject:   [Bug 277476] graphics/drm-515-kmod: amdgpu periodic hangs due to phys contig allocations
Message-ID:  <bug-277476-7141-Mk1qAiy5E3@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-277476-7141@https.bugs.freebsd.org/bugzilla/>
References:  <bug-277476-7141@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277476

--- Comment #5 from sigsys@gmail.com ---
Yeah so this problem was super annoying. But thanks to the information already
posted here, it seems like it wasn't too hard to fix.

IIUC the drm code (ttm_pool_alloc()) asking for contiguous pages doesn't
actually need contiguous pages. It's just an opportunistic optimization. When
allocation fails, it falls back to asking for fewer and fewer contiguous pages
(eventually only asking for one page at a time). When ttm_pool_alloc_page()
asks for more than one page, it passes alloc_pages() some extra flags
(__GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM).
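
To make the fallback concrete, here's a rough userspace sketch of that kind
of loop. It is not the actual ttm_pool_alloc() code: fake_allocator,
fake_alloc_pages() and alloc_with_fallback() are made-up names, and the
"allocator" is just a stub that fails any request above a given order to
simulate fragmented memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the page allocator: a request succeeds only
 * when its order is at most max_contig_order (simulated fragmentation).
 */
struct fake_allocator {
	unsigned int max_contig_order;	/* largest order that can succeed */
	unsigned int attempts;		/* allocation attempts made so far */
};

static bool
fake_alloc_pages(struct fake_allocator *a, unsigned int order)
{
	a->attempts++;
	return (order <= a->max_contig_order);
}

/*
 * Allocate num_pages pages, opportunistically asking for large contiguous
 * chunks and dropping to smaller orders when a request fails.
 * Returns the number of chunks used, or -1 if even one page failed.
 */
static int
alloc_with_fallback(struct fake_allocator *a, size_t num_pages,
    unsigned int start_order)
{
	unsigned int order = start_order;
	int chunks = 0;

	assert(start_order < 8 * sizeof(size_t));

	while (num_pages > 0) {
		/* Never ask for more pages than are still needed. */
		while (order > 0 && ((size_t)1 << order) > num_pages)
			order--;
		if (fake_alloc_pages(a, order)) {
			num_pages -= (size_t)1 << order;
			chunks++;
		} else if (order > 0) {
			order--;	/* retry with a smaller request */
		} else {
			return (-1);	/* a single page failed too */
		}
	}
	return (chunks);
}
```

The point is that a failed high-order request costs the caller almost
nothing as long as the allocator gives up quickly; the caller just ends up
with more, smaller chunks.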

What's expensive is the vm_page_reclaim_contig() call in linux_alloc_pages().
The function tries too hard to find contiguous memory (that the drm code doesn't
even require), and as physical memory gets more fragmented it becomes very slow.

So, very simple fix: make linux_alloc_pages() react to one of the flags passed
by the drm code (__GFP_NORETRY):

diff --git a/sys/compat/linuxkpi/common/include/linux/gfp.h b/sys/compat/linuxkpi/common/include/linux/gfp.h
index 2fcc0dc05f29..58a021086c98 100644
--- a/sys/compat/linuxkpi/common/include/linux/gfp.h
+++ b/sys/compat/linuxkpi/common/include/linux/gfp.h
@@ -44,7 +44,6 @@
 #define        __GFP_NOWARN    0
 #define        __GFP_HIGHMEM   0
 #define        __GFP_ZERO      M_ZERO
-#define        __GFP_NORETRY   0
 #define        __GFP_NOMEMALLOC 0
 #define        __GFP_RECLAIM   0
 #define        __GFP_RECLAIMABLE   0
@@ -58,7 +57,8 @@
 #define        __GFP_KSWAPD_RECLAIM    0
 #define        __GFP_WAIT      M_WAITOK
 #define        __GFP_DMA32     (1U << 24) /* LinuxKPI only */
-#define        __GFP_BITS_SHIFT 25
+#define        __GFP_NORETRY   (1U << 25) /* LinuxKPI only */
+#define        __GFP_BITS_SHIFT 26
 #define        __GFP_BITS_MASK ((1 << __GFP_BITS_SHIFT) - 1)
 #define        __GFP_NOFAIL    M_WAITOK

diff --git a/sys/compat/linuxkpi/common/src/linux_page.c b/sys/compat/linuxkpi/common/src/linux_page.c
index 18b90b5e3d73..71a6890a3795 100644
--- a/sys/compat/linuxkpi/common/src/linux_page.c
+++ b/sys/compat/linuxkpi/common/src/linux_page.c
@@ -118,7 +118,7 @@ linux_alloc_pages(gfp_t flags, unsigned int order)
                        page = vm_page_alloc_noobj_contig(req, npages, 0, pmax,
                            PAGE_SIZE, 0, VM_MEMATTR_DEFAULT);
                        if (page == NULL) {
-                               if (flags & M_WAITOK) {
+                               if ((flags & (M_WAITOK | __GFP_NORETRY)) == M_WAITOK) {
                                        int err = vm_page_reclaim_contig(req,
                                            npages, 0, pmax, PAGE_SIZE, 0);
                                        if (err == ENOMEM)

Been working fine here with amdgpu for about 3 weeks.

(The drm modules need to be recompiled with the modified kernel header.)
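
For anyone wondering about the mask test in the patch, here's a small
standalone sketch of the condition. SKETCH_M_WAITOK and SKETCH_GFP_NORETRY
are illustrative stand-ins (the real M_WAITOK value comes from sys/malloc.h;
the NORETRY bit mirrors the one added in the patch), and
should_reclaim_contig() is a hypothetical helper, not a LinuxKPI function.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values only; assumptions for this sketch. */
#define	SKETCH_M_WAITOK		(1U << 1)
#define	SKETCH_GFP_NORETRY	(1U << 25)

/* The test only works if the two flags occupy distinct bits. */
static_assert((SKETCH_M_WAITOK & SKETCH_GFP_NORETRY) == 0,
    "flags must not overlap");

/*
 * Reclaim only when the caller may sleep AND did not ask for NORETRY:
 * masking both bits and comparing against M_WAITOK alone checks
 * "WAITOK set, NORETRY clear" in one expression.
 */
static bool
should_reclaim_contig(unsigned int flags)
{
	return ((flags & (SKETCH_M_WAITOK | SKETCH_GFP_NORETRY)) ==
	    SKETCH_M_WAITOK);
}
```

So a sleepable request without __GFP_NORETRY still gets the reclaim pass,
while the multi-page requests from the drm code skip it and can fall back to
a smaller order instead.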

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.


