Date:      Wed, 03 Apr 2019 23:36:36 +0000
From:      Robert Crowston <crowston@protonmail.com>
To:        Robert Crowston <crowston@protonmail.com>
Cc:        "freebsd-virtualization@freebsd.org" <freebsd-virtualization@freebsd.org>
Subject:   Re: GPU passthrough: mixed success on Linux, not yet on Windows
Message-ID:  <_jA2UXvV5kqhYMOMU9WYEVi0YyChL0Z7YLfziC4KWkNfijzZxCbeuJEE7T7zV9aEIwH_HpEc-a_fjokPENr--i6JW6NpcDOCvsKYjpi5NXU=@protonmail.com>
In-Reply-To: <H0Gbov17YtZC1-Ao1YkjZ-nuOqPv4LPggc_mni3cS8WWOjlSLBAfOGGPf4aZEpOBiC5PAUGg6fkgeutcLrdbmXNO5QfaxFtK_ANn-Nrklws=@protonmail.com>
References:  <H0Gbov17YtZC1-Ao1YkjZ-nuOqPv4LPggc_mni3cS8WWOjlSLBAfOGGPf4aZEpOBiC5PAUGg6fkgeutcLrdbmXNO5QfaxFtK_ANn-Nrklws=@protonmail.com>

> The hack I had to make: I found that many instructions to access memory-mapped PCI BARs are not being executed on the CPU in guest mode but are being passed back for emulation in the hypervisor.

Update on this: I found that by mapping the BARs within the lower 4 GB of the guest's address space, I am able to start X under Linux without any other weird hacks. The BAR that was causing difficulty is 128 MB in size. The vm_map_pptdev_mmio() call apparently succeeds in mapping the memory range, but the processor then faults on any guest instruction that accesses it.

You can change the memory region either by bumping the size threshold of a "small request" (initially 32 MB) in bhyve/pci_emul.c around line 638, or by tweaking the value of the macro constant PCI_EMUL_MEMBASE64 in the same file. I also tried setting PCI_EMUL_MEMBASE64 to other low values (like 32 GB), but it seems it has to be below 4 GB for Linux to be happy.

------- Original Message -------
On Sunday, 17 March 2019 16:22, Robert Crowston via freebsd-virtualization <freebsd-virtualization@freebsd.org> wrote:

> Hi folks, this is my first post to the group. Apologies for length.
>
> I've been experimenting with GPU passthrough on bhyve. For background, the host system is FreeBSD 12.0-RELEASE on an AMD Ryzen 1700 CPU @ 3.8 GHz, 32 GB of ECC RAM, with two nVidia GPUs. I'm working with a Linux Debian 9 guest and a Windows Server 2019 (desktop experience installed) guest. I also have a USB controller passed through for Bluetooth and a keyboard.
>
> With some unpleasant hacks I have succeeded in starting X on the Linux guest, passing through an nVidia GT 710 under the nouveau driver. I can run the "mate" desktop and glxgears, both of which are smooth at 4K. The Unigine Heaven benchmark runs at an embarrassing 0.1 fps, and 2160p x264 video in VLC runs at about 5 fps. Neither appears to be CPU-bound in the host or the guest.
>
> The hack I had to make: I found that many instructions to access memory-mapped PCI BARs are not being executed on the CPU in guest mode but are being passed back for emulation in the hypervisor. This causes an assertion to fail inside passthru_write() in pci_passthru.c ["pi->pi_bar[baridx].type == PCIBAR_IO"] because it does not expect to perform memory-mapped IO for the guest. Examining the to-be-emulated instructions in vmexit_inst_emul() {e.g., movl (%rdi), %eax}, they look benign to me, and I have no explanation for why the CPU refused to execute them in guest mode.
>
> As an amateur work-around, I removed the assertion and instead I obtain the desired offset into the guest's BAR, calculate what that guest address translates to in the host's address space, open(2) /dev/mem, mmap(2) over to that address, and perform the write directly. I do a similar trick in passthru_read(). Ugly, slow, but functional.
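>
> In case it helps anyone reading along, the shape of that fallback is roughly the sketch below. It is illustrative only: the hostpa argument stands for the host-physical address I compute from the guest BAR offset, the function name is my own, and error handling is omitted.
>
>     /* Sketch of the /dev/mem fallback used in place of the failed assertion. */
>     /* Needs <fcntl.h>, <stdint.h>, <sys/mman.h>, <unistd.h>.                 */
>     static void
>     mem_backdoor_write(uint64_t hostpa, int size, uint64_t value)
>     {
>             int fd = open("/dev/mem", O_RDWR);
>             long pgsz = sysconf(_SC_PAGESIZE);
>             off_t pgoff = hostpa & ~((off_t)pgsz - 1);
>             /* Map the host page backing this BAR offset and poke it directly. */
>             volatile uint8_t *page = mmap(NULL, pgsz, PROT_READ | PROT_WRITE,
>                 MAP_SHARED, fd, pgoff);
>             volatile void *dst = page + (hostpa - pgoff);
>             switch (size) {
>             case 1: *(volatile uint8_t *)dst  = value; break;
>             case 2: *(volatile uint16_t *)dst = value; break;
>             case 4: *(volatile uint32_t *)dst = value; break;
>             case 8: *(volatile uint64_t *)dst = value; break;
>             }
>             munmap((void *)page, pgsz);
>             close(fd);
>     }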
>
> This code path is hit continuously whether or not X is running, with an increase in activity when running anything GPU-heavy. The accesses always go to BAR 1, and mostly around the same offsets. I added some logging of this event; it produces about 100 lines per second while playing video. An excerpt:
> ...
> Unexpected out-of-vm passthrough write #492036 to bar 1 at offset 41100.
> Unexpected out-of-vm passthrough write #492037 to bar 1 at offset 41100.
> Unexpected out-of-vm passthrough read #276162 to bar 1 at offset 561280.
> Unexpected out-of-vm passthrough write #492038 to bar 1 at offset 38028.
> Unexpected out-of-vm passthrough write #492039 to bar 1 at offset 38028.
> Unexpected out-of-vm passthrough read #276163 to bar 1 at offset 561184.
> Unexpected out-of-vm passthrough read #276164 to bar 1 at offset 561184.
> Unexpected out-of-vm passthrough read #276165 to bar 1 at offset 561184.
> Unexpected out-of-vm passthrough read #276166 to bar 1 at offset 561184.
> ...
>
> So my question here is:
>
> 1.  How do I diagnose why the instructions are not being executed in guest mode?
>
>     Some other problems:
>
> 2.  Once the virtual machine is shut down, the passed-through GPU doesn't get turned off. Whatever message was on the screen in the final throes of Linux's shutdown stays there. Maybe there is a specific detach command which bhyve or nouveau hasn't yet implemented? Alternatively, maybe I could exploit some power management feature to reset the card when bhyve exits.
> 3.  It is not possible to reboot the guest and then start X again without an intervening host reboot. The text console works fine. Xorg.0.log has a message like
>     (EE) [drm] Failed to open DRM device for pci:0000:00:06.0: -19
>     (EE) open /dev/dri/card0: No such file or directory
>     dmesg is not very helpful either.[0] I suspect that this is related to problem (2).
>
> 4.  There is a known bug in the version of the Xorg server that ships with Debian 9, where the switch from an animated mouse cursor back to a static cursor causes the X server to sit in a busy loop of gradually increasing stack depth, if the GPU takes too long to communicate with the driver.[1] For me, this consistently happens after I type my password into the Debian login dialog box and eventually (~120 minutes) locks up the host by eating all the swap. A work-around is to replace the guest's animated cursors with static cursors. The bug is fixed in newer versions of X, but I haven't tested whether their fix works for me yet.
> 5.  The GPU doesn't come to life until the nouveau driver kicks in. What is special about the driver? Why doesn't the UEFI initialize the GPU and send output to it before the OS boots? Any idea whether the problem is on the UEFI side or the hypervisor side?
> 6.  On Windows, the way Windows probes multi-BAR devices seems to be inconsistent with bhyve's model for storing io memory mappings. Specifically, I believe Windows assigns the 0xffffffff sentinel to all BARs on a device in one shot, then reads them back and assigns the true addresses afterwards. However, bhyve sees the multiple 0xffffffff assignments to different BARs as a clash and errors out on the second BAR probe. I removed most of the mmio_rb_tree error handling in mem.c, and this is sufficient for Windows to boot, and to detect and correctly identify the GPU. (A better solution might be to handle the initial 0xffffffff write as a special case; a rough sketch of that idea follows below.) I can then install the official nVidia drivers without problem over Remote Desktop. However, the GPU never springs to life: I am stuck with a "Windows has stopped this device because it has reported problems. (Code 43)" error in the device manager, a blank screen, and not much else to go on.
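>
> For reference, the "special case" I have in mind is roughly the check below. This is a sketch only: it is not written against a particular bhyve function, the field names are from memory, and update_bar_decode() is a hypothetical stand-in for whatever re-registers the mmio range.
>
>     /* Sketch: treat an all-ones BAR write as a sizing probe, not a new decode. */
>     static void
>     bar_cfg_write(struct pci_devinst *pi, int baridx, uint32_t val)
>     {
>             if (val == 0xffffffff) {
>                     /*
>                      * The guest is sizing the BAR.  Latch the size mask so the
>                      * read-back works, but do not try to register 0xffffffff as
>                      * a real mmio mapping -- that is what currently collides in
>                      * the mmio_rb_tree when Windows probes every BAR in one go.
>                      */
>                     pi->pi_bar[baridx].addr = ~(pi->pi_bar[baridx].size - 1);
>                     return;
>             }
>             /* A real base address: (re)register the mmio range as usual. */
>             update_bar_decode(pi, baridx, val);     /* hypothetical helper */
>     }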
>
>     Is it worth me continuing to hack away at these problems (of course I'm happy to share anything I come up with), or is there an official solution for GPU support in the pipeline about to make my efforts redundant? :)
>
>     Thanks,
>     Robert Crowston.
>
>
> Footnotes
>
> [0] Diff'ing dmesg after successful GPU initialization (+) and after failure (-), and cutting out some lines that aren't relevant:
> nouveau 0000:00:06.0: bios: version 80.28.a6.00.10
> +nouveau 0000:00:06.0: priv: HUB0: 085014 ffffffff (1f70820b)
> nouveau 0000:00:06.0: fb: 1024 MiB DDR3
> @@ -466,24 +467,17 @@
> nouveau 0000:00:06.0: DRM: DCB conn 00: 00001031
> nouveau 0000:00:06.0: DRM: DCB conn 01: 00002161
> nouveau 0000:00:06.0: DRM: DCB conn 02: 00000200
> -nouveau 0000:00:06.0: disp: chid 0 mthd 0000 data 00000400 00001000 00000002
> -nouveau 0000:00:06.0: timeout at /build/linux-UEAD6s/linux-4.9.144/drivers/gpu/drm/nouveau/nvkm/engine/disp/dmacgf119.c:88/gf119_disp_dmac_init()!
> -nouveau 0000:00:06.0: disp: ch 1 init: c207009b
> -nouveau: DRM:00000000:0000927c: init failed with -16
> -nouveau 0000:00:06.0: timeout at /build/linux-UEAD6s/linux-4.9.144/drivers/gpu/drm/nouveau/nvkm/engine/disp/dmacgf119.c:54/gf119_disp_dmac_fini()!
> -nouveau 0000:00:06.0: disp: ch 1 fini: c2071088
> -nouveau 0000:00:06.0: timeout at /build/linux-UEAD6s/linux-4.9.144/drivers/gpu/drm/nouveau/nvkm/engine/disp/dmacgf119.c:54/gf119_disp_dmac_fini()!
> -nouveau 0000:00:06.0: disp: ch 1 fini: c2071088
> +[drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> +[drm] Driver supports precise vblank timestamp query.
> +nouveau 0000:00:06.0: DRM: MM: using COPY for buffer copies
> +nouveau 0000:00:06.0: DRM: allocated 1920x1080 fb: 0x60000, bo ffff96fdb39a1800
> +fbcon: nouveaufb (fb0) is primary device
> -nouveau 0000:00:06.0: timeout at /build/linux-UEAD6s/linux-4.9.144/drivers/gpu/drm/nouveau/nvkm/engine/disp/coregf119.c:187/gf119_disp_core_fini()
> -nouveau 0000:00:06.0: disp: core fini: 8d0f0088
> -[TTM] Finalizing pool allocator
> -[TTM] Finalizing DMA pool allocator
> -[TTM] Zone kernel: Used memory at exit: 0 kiB
> -[TTM] Zone dma32: Used memory at exit: 0 kiB
> -nouveau: probe of 0000:00:06.0 failed with error -16
> +Console: switching to colour frame buffer device 240x67
> +nouveau 0000:00:06.0: fb0: nouveaufb frame buffer device
> +[drm] Initialized nouveau 1.3.1 20120801 for 0000:00:06.0 on minor 0
>
> [1] https://devtalk.nvidia.com/default/topic/1028172/linux/titan-v-ubuntu-16-04lts-and-387-34-driver-crashes-badly/post/5230898/#5230898
>



