Date: Fri, 08 Nov 2024 09:04:51 +0000
From: bugzilla-noreply@freebsd.org
To: x11@FreeBSD.org
Subject: [Bug 277476] graphics/drm-515-kmod: amdgpu periodic hangs due to phys contig allocations
Message-ID: <bug-277476-7141-Mk1qAiy5E3@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-277476-7141@https.bugs.freebsd.org/bugzilla/>
References: <bug-277476-7141@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277476

--- Comment #5 from sigsys@gmail.com ---
Yeah so this problem was super annoying. But thanks to the information already
posted here, seems like it wasn't too hard to fix.

IIUC the drm code (ttm_pool_alloc()) asking for contiguous pages doesn't
actually need contiguous pages. It's just an opportunistic optimization. When
allocation fails, it falls back to asking for fewer and fewer contiguous pages
(eventually only asking for one page at a time).

When ttm_pool_alloc_page() asks for more than one page, it passes
alloc_pages() some extra flags (__GFP_NOMEMALLOC | __GFP_NORETRY |
__GFP_NOWARN | __GFP_KSWAPD_RECLAIM).

What's expensive is the vm_page_reclaim_contig() in linux_alloc_pages(). The
function tries too hard to find contiguous memory (that the drm code doesn't
even require) and as physical memory gets too fragmented it becomes very slow.

So, very simple fix: make linux_alloc_pages() react to one of the flags passed
by the drm code:

diff --git a/sys/compat/linuxkpi/common/include/linux/gfp.h b/sys/compat/linuxkpi/common/include/linux/gfp.h
index 2fcc0dc05f29..58a021086c98 100644
--- a/sys/compat/linuxkpi/common/include/linux/gfp.h
+++ b/sys/compat/linuxkpi/common/include/linux/gfp.h
@@ -44,7 +44,6 @@
 #define __GFP_NOWARN 0
 #define __GFP_HIGHMEM 0
 #define __GFP_ZERO M_ZERO
-#define __GFP_NORETRY 0
 #define __GFP_NOMEMALLOC 0
 #define __GFP_RECLAIM 0
 #define __GFP_RECLAIMABLE 0
@@ -58,7 +57,8 @@
 #define __GFP_KSWAPD_RECLAIM 0
 #define __GFP_WAIT M_WAITOK
 #define __GFP_DMA32 (1U << 24) /* LinuxKPI only */
-#define __GFP_BITS_SHIFT 25
+#define __GFP_NORETRY (1U << 25) /* LinuxKPI only */
+#define __GFP_BITS_SHIFT 26
 #define __GFP_BITS_MASK ((1 << __GFP_BITS_SHIFT) - 1)
 #define __GFP_NOFAIL M_WAITOK

diff --git a/sys/compat/linuxkpi/common/src/linux_page.c b/sys/compat/linuxkpi/common/src/linux_page.c
index 18b90b5e3d73..71a6890a3795 100644
--- a/sys/compat/linuxkpi/common/src/linux_page.c
+++ b/sys/compat/linuxkpi/common/src/linux_page.c
@@ -118,7 +118,7 @@ linux_alloc_pages(gfp_t flags, unsigned int order)
 			page = vm_page_alloc_noobj_contig(req, npages, 0, pmax,
 			    PAGE_SIZE, 0, VM_MEMATTR_DEFAULT);
 			if (page == NULL) {
-				if (flags & M_WAITOK) {
+				if ((flags & (M_WAITOK | __GFP_NORETRY)) == M_WAITOK) {
 					int err = vm_page_reclaim_contig(req,
 					    npages, 0, pmax, PAGE_SIZE, 0);
 					if (err == ENOMEM)

Been working fine here with amdgpu for about 3 weeks. (The drm modules need to
be recompiled with the modified kernel header, since __GFP_NORETRY used to
expand to 0 and now expands to a real bit.)
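
For anyone skimming the one-line change in linux_page.c, here is a small userland sketch of just the flag test, not the kernel code itself. The DEMO_* constants and the function names are made up for illustration and their values are stand-ins; the real M_WAITOK and __GFP_NORETRY definitions live in the kernel headers. The point is the mask-and-compare idiom: both bits are masked out together, and the expensive reclaim path is taken only when the "may sleep" bit is set and the "no retry" bit is clear.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in bit values for illustration only (assumptions, not the kernel's). */
#define DEMO_M_WAITOK     (1U << 1)
#define DEMO_GFP_NORETRY  (1U << 25)

/*
 * Mirrors the shape of the patched condition:
 * true only if WAITOK is set AND NORETRY is not set.
 */
static bool
demo_should_reclaim_contig(unsigned int flags)
{
	return ((flags & (DEMO_M_WAITOK | DEMO_GFP_NORETRY)) == DEMO_M_WAITOK);
}

/* Walks the four flag combinations and checks each outcome. */
static void
demo_check_all_cases(void)
{
	/* Caller cannot sleep: never reclaim. */
	assert(!demo_should_reclaim_contig(0));
	/* Caller can sleep and did not opt out: reclaim as before. */
	assert(demo_should_reclaim_contig(DEMO_M_WAITOK));
	/* Caller can sleep but asked for no retry: skip reclaim. */
	assert(!demo_should_reclaim_contig(DEMO_M_WAITOK | DEMO_GFP_NORETRY));
	/* No-retry without WAITOK: still no reclaim. */
	assert(!demo_should_reclaim_contig(DEMO_GFP_NORETRY));
}
```

With the old test (flags & M_WAITOK), the third case would have gone into the reclaim path, which is the slow case described above when the drm code is only opportunistically asking for contiguous pages.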