Date:      Wed, 27 Aug 2025 07:18:44 +0200
From:      Stephan Althaus <Stephan.Althaus@Duedinghausen.eu>
To:        virtualization@freebsd.org
Subject:   Re: GPU Passthrough on FreeBSD 14.3(AMD Radeon RX 6700 XT and Debian Linux 12.11)
Message-ID:  <1117706a-6680-4f00-8728-16ae195f02ca@Duedinghausen.eu>
In-Reply-To: <43c96438-6068-487d-b1ea-583dddf0f6e8@ambient-md.com>
References:  <43c96438-6068-487d-b1ea-583dddf0f6e8@ambient-md.com>

On 8/27/25 05:51, Petru Garstea wrote:
>
> Greetings,
>
> I’m running a *Debian Linux 12.11 VM on FreeBSD 14.3* using *bhyve*.
> Inside the VM, I’ve deployed the *Docker engine* with *Ollama 
> configured for ROCm support*.
>
> However, when executing an LLM, the *GPU fails to initialize 
> correctly*, causing the process to fail.
> Please note that this setup works fine on bare metal.
>
> The full log of this behavior is included below.
>
> ---
>
>> kernel: [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
>> kernel: [drm] PSP is resuming...
>> kernel: [drm] reserve 0xa00000 from 0x82fd000000 for PSP TMR
>> kernel: amdgpu 0000:00:01.0: amdgpu: RAS: optional ras ta ucode is not available
>> kernel: amdgpu 0000:00:01.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
>> kernel: amdgpu 0000:00:01.0: amdgpu: SMU is resuming...
>> kernel: amdgpu 0000:00:01.0: amdgpu: smu driver if version = 0x0000000e, smu fw if version = 0x00000012, smu fw program = 0, version = 0x00413900 (65.57.0)
>> kernel: amdgpu 0000:00:01.0: amdgpu: SMU driver if version not matched
>> kernel: amdgpu 0000:00:01.0: amdgpu: use vbios provided pptable
>> kernel: amdgpu 0000:00:01.0: amdgpu: SMU is resumed successfully!
>> kernel: [drm] DMUB hardware initialized: version=0x02020017
>> kernel: [drm] kiq ring mec 2 pipe 1 q 0
>> kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
>> kernel: [drm] JPEG decode initialized successfully.
>> kernel: amdgpu 0000:00:01.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
>> kernel: amdgpu 0000:00:01.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
>> kernel: amdgpu 0000:00:01.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
>> kernel: amdgpu 0000:00:01.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
>> kernel: amdgpu 0000:00:01.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
>> kernel: amdgpu 0000:00:01.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
>> kernel: amdgpu 0000:00:01.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
>> kernel: amdgpu 0000:00:01.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
>> kernel: amdgpu 0000:00:01.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
>> kernel: amdgpu 0000:00:01.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
>> kernel: amdgpu 0000:00:01.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
>> kernel: amdgpu 0000:00:01.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
>> kernel: amdgpu 0000:00:01.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
>> kernel: amdgpu 0000:00:01.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
>> kernel: amdgpu 0000:00:01.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
>> kernel: amdgpu 0000:00:01.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
>> kernel: amdgpu 0000:00:01.0: [drm] Cannot find any crtc or sizes
>> kernel: amdgpu: qcm fence wait loop timeout expired
>> kernel: amdgpu: The cp might be in an unrecoverable state due to an unsuccessful queues preemption
>> kernel: amdgpu: Pasid 0x8002 DQM create queue type 0 failed. ret -62
>> kernel: amdgpu 0000:00:01.0: amdgpu: GPU reset begin!
>> kernel: amdgpu: Failed to suspend process 0x8002
>> kernel: amdgpu: Failed to suspend process 0x8001
>> kernel: amdgpu 0000:00:01.0: amdgpu: free PSP TMR buffer
>> kernel: amdgpu 0000:00:01.0: amdgpu: MODE1 reset
>> kernel: amdgpu 0000:00:01.0: amdgpu: GPU mode1 reset
>> kernel: amdgpu 0000:00:01.0: amdgpu: GPU smu mode1 reset
>> kernel: amdgpu 0000:00:01.0: amdgpu: GPU reset succeeded, trying to resume
>> kernel: clocksource: Long readout interval, skipping watchdog check: cs_nsec: 12622536057 wd_nsec: 12613480925
>> kernel: [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
>> kernel: [drm] VRAM is lost due to GPU reset!
>> kernel: [drm] PSP is resuming...
>> kernel: [drm:psp_hw_start [amdgpu]] *ERROR* PSP create ring failed!
>> kernel: [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
>> kernel: [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -62
>> kernel: amdgpu 0000:00:01.0: amdgpu: GPU reset(1) failed
>> kernel: amdgpu: qcm fence wait loop timeout expired
>> kernel: amdgpu: The cp might be in an unrecoverable state due to an unsuccessful queues preemption
>> kernel: amdgpu 0000:00:01.0: amdgpu: GPU reset end with ret = -62
>> kernel: amdgpu 0000:00:01.0: amdgpu: GPU reset begin!
>> kernel: amdgpu 0000:00:01.0: amdgpu: Failed to disallow df cstate
>
> Regards,
> Petru
>
Hello!

Before you start Docker, are you able to verify that the GPU is actually 
working in the VM?

How did you verify it? (I don't know the tooling for AMD.)
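
One rough way to run such a check before starting Docker is to scan the 
guest's kernel log for the kind of amdgpu failure lines that appear in the 
log quoted above. The following is only a minimal sketch: the function name 
find_gpu_failures and the marker list are assumptions made for illustration, 
the markers are copied from the quoted log rather than from any amdgpu 
documentation, and a clean result does not prove that compute queues work.

```python
# Substrings taken from the failure lines in the quoted log. This list is
# illustrative only, not an exhaustive catalogue of amdgpu errors.
FAILURE_MARKERS = (
    "*ERROR*",
    "timeout expired",
    "failed",
    "Failed to",
    "VRAM is lost",
)


def find_gpu_failures(log_lines):
    """Return the amdgpu/drm kernel log lines that contain a failure marker."""
    hits = []
    for line in log_lines:
        # Ignore kernel messages that do not come from the GPU driver.
        if "amdgpu" not in line and "[drm" not in line:
            continue
        if any(marker in line for marker in FAILURE_MARKERS):
            hits.append(line.strip())
    return hits
```

Feeding it the lines printed by dmesg inside the guest, right after boot and 
again after a failed run, would show whether the driver already reports 
problems before Docker and Ollama are involved.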

Regards,
Stephan



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1117706a-6680-4f00-8728-16ae195f02ca>