From: bugzilla-noreply@freebsd.org
To: ports-bugs@FreeBSD.org
Subject: [Bug 277476] amdgpu/drm-kmod periodic hangs due to phys contig allocations
Date: Mon, 04 Mar 2024 14:12:20 +0000
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
List-Id: Ports bug reports
List-Archive: https://lists.freebsd.org/archives/freebsd-ports-bugs
Sender: owner-freebsd-ports-bugs@freebsd.org

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277476

            Bug ID: 277476
           Summary: amdgpu/drm-kmod periodic hangs due to phys contig
                    allocations
           Product: Ports & Packages
           Version: Latest
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: Individual Port(s)
          Assignee: ports-bugs@FreeBSD.org
          Reporter: jeffpc@josefsipek.net

Two weeks ago I replaced an ancient nvidia graphics card with an AMD RX580
card to run open source drivers.  Everything works fine most of the time, but
occasionally the system hangs for a few seconds (5-10, usually).  The longer
the system has been up, the worse it gets.

Digging into it a bit, it is because userspace (it always looks like X) does
an ioctl into drm which then tries to allocate a large-ish piece of physically
contiguous memory.  This explains why it gets worse as uptime increases (free
physical memory fragmentation) and when running firefox (the most memory
hungry application I use).

I know nothing about graphics cards, the software stack supporting them, or
the linux kernel API compatibility layer, but clearly it'd be beneficial if
amdgpu/drm/whatever could make use of *virtually* contiguous pages or some
kind of allocation caching/reuse to avoid repeatedly asking the vm code for
physically contiguous ranges.

To reach the above conclusion, I did a handful of dtrace-based experiments.

While one of the "temporary hangs" was happening, the following was the most
common (non-idle) profiler stack:

# dtrace -n 'profile-97{@[stack()]=count()}'
...
              kernel`vm_phys_alloc_contig+0x11d
              kernel`linux_alloc_pages+0x8f
              ttm.ko`ttm_pool_alloc+0x2cb
              ttm.ko`ttm_tt_populate+0xc5
              ttm.ko`ttm_bo_handle_move_mem+0xc3
              ttm.ko`ttm_bo_validate+0xb4
              ttm.ko`ttm_bo_init_reserved+0x199
              amdgpu.ko`amdgpu_bo_create+0x1eb
              amdgpu.ko`amdgpu_bo_create_user+0x21
              amdgpu.ko`amdgpu_gem_create_ioctl+0x1e2
              drm.ko`drm_ioctl_kernel+0xc6
              drm.ko`drm_ioctl+0x2b5
              kernel`linux_file_ioctl+0x312
              kernel`kern_ioctl+0x255
              kernel`sys_ioctl+0x123
              kernel`amd64_syscall+0x109
              kernel`0xffffffff80fe43eb

The latency of vm_phys_alloc_contig (entry to return) is bimodal - with
latencies in the single digit *milli*seconds during the "temporary hangs":

# dtrace -n 'fbt::vm_phys_alloc_contig:entry{self->ts=timestamp}' \
    -n 'fbt::vm_phys_alloc_contig:return/self->ts/{this->delta=timestamp-self->ts; @=quantize(this->delta);}' \
    -n 'tick-1sec{printa(@)}'
...
           value  ------------- Distribution ------------- count
             256 |                                         0
             512 |@                                        2606
            1024 |@@@@@@@@@                                18207
            2048 |@                                        2534
            4096 |                                         894
            8192 |                                         34
           16384 |                                         78
           32768 |                                         58
           65536 |                                         219
          131072 |                                         306
          262144 |                                         310
          524288 |                                         735
         1048576 |                                         174
         2097152 |@@                                       4364
         4194304 |@@@@@@@@@@@@@@@@@@@@@@@@                 47475
         8388608 |@                                        1546
        16777216 |                                         2
        33554432 |                                         0

The number of pages being allocated:

# dtrace -n 'fbt::vm_phys_alloc_contig:entry/arg1>1/{@=quantize(arg1)}' \
    -n 'tick-1sec{printa(@)}'
...
           value  ------------- Distribution ------------- count
               1 |                                         0
               2 |@@@                                      15
               4 |@                                        7
               8 |@@@                                      16
              16 |@@                                       10
              32 |@@                                       10
              64 |@                                        7
             128 |@@@                                      12
             256 |@@@                                      12
             512 |@@@@@@@@@@@@@@                           68
            1024 |@@@@@@@                                  32
            2048 |                                         0

I did a few more dtrace experiments, but they all point to the same thing - a
drm/amdgpu related ioctl wants 4MB of physically contiguous memory often
enough to become a headache.  4MB isn't too much given that the system has
32GB of RAM, but a physically contiguous range sometimes takes a while to
find.

The card:

vgapci0@pci0:1:0:0: class=0x030000 rev=0xe7 hdr=0x00 vendor=0x1002 device=0x67df subvendor=0x1da2 subdevice=0xe353
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Ellesmere [Radeon RX 470/480/570/570X/580/580X/590]'
    class      = display
    subclass   = VGA

$ pkg info|grep -i amd
gpu-firmware-amd-kmod-aldebaran-20230625 Firmware modules for aldebaran AMD GPUs
gpu-firmware-amd-kmod-arcturus-20230625 Firmware modules for arcturus AMD GPUs
gpu-firmware-amd-kmod-banks-20230625 Firmware modules for banks AMD GPUs
gpu-firmware-amd-kmod-beige-goby-20230625 Firmware modules for beige_goby AMD GPUs
gpu-firmware-amd-kmod-bonaire-20230625 Firmware modules for bonaire AMD GPUs
gpu-firmware-amd-kmod-carrizo-20230625 Firmware modules for carrizo AMD GPUs
gpu-firmware-amd-kmod-cyan-skillfish2-20230625 Firmware modules for cyan_skillfish2 AMD GPUs
gpu-firmware-amd-kmod-dimgrey-cavefish-20230625 Firmware modules for dimgrey_cavefish AMD GPUs
gpu-firmware-amd-kmod-fiji-20230625 Firmware modules for fiji AMD GPUs
gpu-firmware-amd-kmod-green-sardine-20230625 Firmware modules for green_sardine AMD GPUs
gpu-firmware-amd-kmod-hainan-20230625 Firmware modules for hainan AMD GPUs
gpu-firmware-amd-kmod-hawaii-20230625 Firmware modules for hawaii AMD GPUs
gpu-firmware-amd-kmod-kabini-20230625 Firmware modules for kabini AMD GPUs
gpu-firmware-amd-kmod-kaveri-20230625 Firmware modules for kaveri AMD GPUs
gpu-firmware-amd-kmod-mullins-20230625 Firmware modules for mullins AMD GPUs
gpu-firmware-amd-kmod-navi10-20230625 Firmware modules for navi10 AMD GPUs
gpu-firmware-amd-kmod-navi12-20230625 Firmware modules for navi12 AMD GPUs
gpu-firmware-amd-kmod-navi14-20230625 Firmware modules for navi14 AMD GPUs
gpu-firmware-amd-kmod-navy-flounder-20230625 Firmware modules for navy_flounder AMD GPUs
gpu-firmware-amd-kmod-oland-20230625 Firmware modules for oland AMD GPUs
gpu-firmware-amd-kmod-picasso-20230625 Firmware modules for picasso AMD GPUs
gpu-firmware-amd-kmod-pitcairn-20230625 Firmware modules for pitcairn AMD GPUs
gpu-firmware-amd-kmod-polaris10-20230625 Firmware modules for polaris10 AMD GPUs
gpu-firmware-amd-kmod-polaris11-20230625 Firmware modules for polaris11 AMD GPUs
gpu-firmware-amd-kmod-polaris12-20230625 Firmware modules for polaris12 AMD GPUs
gpu-firmware-amd-kmod-raven-20230625 Firmware modules for raven AMD GPUs
gpu-firmware-amd-kmod-raven2-20230625 Firmware modules for raven2 AMD GPUs
gpu-firmware-amd-kmod-renoir-20230625 Firmware modules for renoir AMD GPUs
gpu-firmware-amd-kmod-si58-20230625 Firmware modules for si58 AMD GPUs
gpu-firmware-amd-kmod-sienna-cichlid-20230625 Firmware modules for sienna_cichlid AMD GPUs
gpu-firmware-amd-kmod-stoney-20230625 Firmware modules for stoney AMD GPUs
gpu-firmware-amd-kmod-tahiti-20230625 Firmware modules for tahiti AMD GPUs
gpu-firmware-amd-kmod-tonga-20230625 Firmware modules for tonga AMD GPUs
gpu-firmware-amd-kmod-topaz-20230625 Firmware modules for topaz AMD GPUs
gpu-firmware-amd-kmod-vangogh-20230625 Firmware modules for vangogh AMD GPUs
gpu-firmware-amd-kmod-vega10-20230625 Firmware modules for vega10 AMD GPUs
gpu-firmware-amd-kmod-vega12-20230625 Firmware modules for vega12 AMD GPUs
gpu-firmware-amd-kmod-vega20-20230625 Firmware modules for vega20 AMD GPUs
gpu-firmware-amd-kmod-vegam-20230625 Firmware modules for vegam AMD GPUs
gpu-firmware-amd-kmod-verde-20230625 Firmware modules for verde AMD GPUs
gpu-firmware-amd-kmod-yellow-carp-20230625 Firmware modules for yellow_carp AMD GPUs
suitesparse-amd-3.3.0          Symmetric approximate minimum degree
suitesparse-camd-3.3.0         Symmetric approximate minimum degree
suitesparse-ccolamd-3.3.0      Constrained column approximate minimum degree ordering
suitesparse-colamd-3.3.0       Column approximate minimum degree ordering algorithm
webcamd-5.17.1.2_1             Port of Linux USB webcam and DVB drivers into userspace
xf86-video-amdgpu-22.0.0_1     X.Org amdgpu display driver

$ pkg info|grep -i drm
drm-515-kmod-5.15.118_3        DRM drivers modules
drm-kmod-20220907_1            Metaport of DRM modules for the linuxkpi-based KMS components
gpu-firmware-kmod-20230210_1,1 Firmware modules for the drm-kmod drivers
libdrm-2.4.120_1,1             Direct Rendering Manager library and headers

-- 
You are receiving this mail because:
You are the assignee for the bug.