Date: Wed, 24 Sep 2025 13:31:48 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 289813] Vulkan: running and inferencing with "koboldcpp" or "llama.cpp" using the Vulkan backend locks up the GPU...
Message-ID: <bug-289813-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=289813

    Bug ID: 289813
   Summary: Vulkan: running and inferencing with "koboldcpp" or "llama.cpp" using the Vulkan backend locks up the GPU...
   Product: Base System
   Version: 15.0-STABLE
  Hardware: amd64
        OS: Any
    Status: New
  Severity: Affects Only Me
  Priority: ---
 Component: kern
  Assignee: bugs@FreeBSD.org
  Reporter: nbe@renzel.net

Hi,

running inference with "koboldcpp" or "llama.cpp" using the Vulkan backend eventually locks up my Radeon 780M iGPU. The same happened with my old Radeon RX460.

Running the integrated benchmarks of the two mentioned LLM engines a few times in a row triggers the lock-up faster.

------------------------------- SNIP -------------------------------
Sep 24 15:16:35 asbach kernel: [drm ERROR :amdgpu_job_timedout] ring gfx_0.0.0 timeout, signaled seq=31038, emitted seq=31040
Sep 24 15:16:35 asbach kernel: [drm ERROR :amdgpu_job_timedout] Process information: process pid 101072 thread pid 101072
Sep 24 15:16:35 asbach kernel: drmn0: GPU reset begin!
Sep 24 15:16:36 asbach kernel: [drm ERROR :mes_v11_0_submit_pkt_and_poll_completion] MES failed to response msg=3
Sep 24 15:16:36 asbach kernel: [drm ERROR :amdgpu_mes_unmap_legacy_queue] failed to unmap legacy queue
Sep 24 15:16:36 asbach kernel: [drm ERROR :mes_v11_0_submit_pkt_and_poll_completion] MES failed to response msg=3
Sep 24 15:16:36 asbach kernel: [drm ERROR :amdgpu_mes_unmap_legacy_queue] failed to unmap legacy queue
Sep 24 15:16:36 asbach kernel: [drm ERROR :mes_v11_0_submit_pkt_and_poll_completion] MES failed to response msg=3
Sep 24 15:16:36 asbach kernel: [drm ERROR :amdgpu_mes_unmap_legacy_queue] failed to unmap legacy queue
Sep 24 15:16:36 asbach kernel: [drm ERROR :mes_v11_0_submit_pkt_and_poll_completion] MES failed to response msg=3
Sep 24 15:16:36 asbach kernel: [drm ERROR :amdgpu_mes_unmap_legacy_queue] failed to unmap legacy queue
Sep 24 15:16:36 asbach kernel: [drm ERROR :mes_v11_0_submit_pkt_and_poll_completion] MES failed to response msg=3
Sep 24 15:16:36 asbach kernel: [drm ERROR :amdgpu_mes_unmap_legacy_queue] failed to unmap legacy queue
Sep 24 15:16:36 asbach kernel: [drm ERROR :mes_v11_0_submit_pkt_and_poll_completion] MES failed to response msg=3
Sep 24 15:16:36 asbach kernel: [drm ERROR :amdgpu_mes_unmap_legacy_queue] failed to unmap legacy queue
Sep 24 15:16:36 asbach kernel: [drm ERROR :mes_v11_0_submit_pkt_and_poll_completion] MES failed to response msg=3
Sep 24 15:16:36 asbach kernel: [drm ERROR :amdgpu_mes_unmap_legacy_queue] failed to unmap legacy queue
Sep 24 15:16:36 asbach kernel: [drm ERROR :mes_v11_0_submit_pkt_and_poll_completion] MES failed to response msg=3
Sep 24 15:16:36 asbach kernel: [drm ERROR :amdgpu_mes_unmap_legacy_queue] failed to unmap legacy queue
Sep 24 15:16:37 asbach kernel: [drm ERROR :mes_v11_0_submit_pkt_and_poll_completion] MES failed to response msg=3
Sep 24 15:16:37 asbach kernel: [drm ERROR :amdgpu_mes_unmap_legacy_queue] failed to unmap legacy queue
Sep 24 15:16:37 asbach kernel: drmn0: MODE2 reset
Sep 24 15:16:37 asbach kernel: drmn0: GPU reset succeeded, trying to resume
Sep 24 15:16:37 asbach kernel: [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
Sep 24 15:16:37 asbach kernel: drmn0: SMU is resuming...
Sep 24 15:16:37 asbach kernel: drmn0: SMU is resumed successfully!
Sep 24 15:16:37 asbach kernel: [drm] DMUB hardware initialized: version=0x08001B00
Sep 24 15:16:37 asbach kernel: WARNING !(0) failed at /usr/ports/graphics/drm-66-kmod/work/drm-kmod-drm_v6.6.25_6/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_capability.c:1530
Sep 24 15:16:37 asbach kernel: [drm] kiq ring mec 3 pipe 1 q 0
Sep 24 15:16:37 asbach kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
Sep 24 15:16:37 asbach kernel: drmn0: [drm] jpeg_v4_0_hw_initdrmn0: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Sep 24 15:16:37 asbach kernel: drmn0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Sep 24 15:16:37 asbach kernel: drmn0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Sep 24 15:16:37 asbach kernel: drmn0: ring comp_1.2.0 uses VM inv eng 6 on hub 0
Sep 24 15:16:37 asbach kernel: drmn0: ring comp_1.3.0 uses VM inv eng 7 on hub 0
Sep 24 15:16:37 asbach kernel: drmn0: ring comp_1.0.1 uses VM inv eng 8 on hub 0
Sep 24 15:16:37 asbach kernel: drmn0: ring comp_1.1.1 uses VM inv eng 9 on hub 0
Sep 24 15:16:37 asbach kernel: drmn0: ring comp_1.2.1 uses VM inv eng 10 on hub 0
Sep 24 15:16:37 asbach kernel: drmn0: ring comp_1.3.1 uses VM inv eng 11 on hub 0
Sep 24 15:16:37 asbach kernel: drmn0: ring sdma0 uses VM inv eng 12 on hub 0
Sep 24 15:16:37 asbach kernel: drmn0: ring vcn_unified_0 uses VM inv eng 0 on hub 8
Sep 24 15:16:37 asbach kernel: drmn0: ring jpeg_dec uses VM inv eng 1 on hub 8
Sep 24 15:16:37 asbach kernel: drmn0: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
Sep 24 15:16:37 asbach kernel: drmn0: recover vram bo from shadow start
Sep 24 15:16:37 asbach kernel: drmn0: recover vram bo from shadow done
Sep 24 15:16:37 asbach kernel: [drm ERROR :amdgpu_cs_ioctl] Failed to initialize parser -85!
------------------------------- SNIP -------------------------------

The mouse cursor can still be moved, but everything else on the screens is frozen. VT switches are no longer possible. Only a hardware reset recovers the GPU.

If you need more information or want me to run specific debugging steps, let me know. Would a "truss" output help?

Thanks in advance and regards,
Nils

-- 
You are receiving this mail because:
You are the assignee for the bug.
