Date: Sun, 17 Mar 2019 16:22:29 +0000
To: "freebsd-virtualization@freebsd.org" <freebsd-virtualization@freebsd.org>
From: Robert Crowston <crowston@protonmail.com>
Reply-To: Robert Crowston <crowston@protonmail.com>
Subject: GPU passthrough: mixed success on Linux, not yet on Windows

Hi folks, this is my first post to the group. Apologies for length.

I've been experimenting with GPU passthrough on bhyve. For background,
the host system is FreeBSD 12.0-RELEASE on an AMD Ryzen 1700 CPU @ 3.8
GHz with 32 GB of ECC RAM and two nVidia GPUs. I'm working with a Debian
9 Linux guest and a Windows Server 2019 guest (Desktop Experience
installed). I also have a USB controller passed through for Bluetooth
and a keyboard.

With some unpleasant hacks I have succeeded in starting X on the Linux
guest, passing through an nVidia GT 710 under the nouveau driver. I can
run the "mate" desktop and glxgears, both of which are smooth at 4K. The
Unigine Heaven benchmark runs at an embarrassing 0.1 fps, and 2160p x264
video in VLC runs at about 5 fps. Neither appears to be CPU-bound in the
host or the guest.

The hack I had to make: I found that many instructions that access
memory-mapped PCI BARs are not being executed on the CPU in guest mode
but are being passed back for emulation in the hypervisor. This causes
an assertion to fail inside passthru_write() in pci_passthru.c
["pi->pi_bar[baridx].type == PCIBAR_IO"] because it does not expect to
perform memory-mapped IO for the guest. Examining the to-be-emulated
instructions in vmexit_inst_emul() {e.g., movl (%rdi), %eax}, they look
benign to me, and I have no explanation for why the CPU refused to
execute them in guest mode.

As an amateur workaround, I removed the assertion. Instead, I take the
desired offset into the guest's BAR, calculate what that guest address
translates to in the host's address space, open(2) /dev/mem, mmap(2)
the region at that address, and perform the write directly. I do a
similar trick in passthru_read(). Ugly, slow, but functional.
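
A minimal sketch of the open(2)/mmap(2) technique described above, for
readers who want to see its shape. This is not the actual patch: the
names mem_write32() and mem_read32() are hypothetical, the translation
from guest BAR offset to host physical address is assumed to have been
done already, and only naturally aligned 32-bit accesses are handled.
With fd opened on /dev/mem, addr would be the host physical address of
the BAR plus the offset the guest asked for.

```c
/* Declare the POSIX calls below even under strict -std= modes. */
#define _POSIX_C_SOURCE 200809L

#include <sys/types.h>
#include <sys/mman.h>

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Map the single page of `fd` that contains byte offset `addr` and
 * return a pointer to that word. *mapp and *lenp receive what munmap(2)
 * needs. Hypothetical helper; nothing here is bhyve code.
 */
static volatile uint32_t *
map_word(int fd, off_t addr, int prot, void **mapp, size_t *lenp)
{
	long pagesz = sysconf(_SC_PAGESIZE);

	if (pagesz <= 0)
		return (NULL);
	if (addr < 0 || (addr & 3) != 0) {	/* require 4-byte alignment */
		errno = EINVAL;
		return (NULL);
	}
	off_t base = addr & ~((off_t)pagesz - 1);	/* page-aligned start */
	size_t delta = (size_t)(addr - base);		/* offset inside page */

	/* An aligned 32-bit word never straddles a page boundary. */
	assert(delta + sizeof(uint32_t) <= (size_t)pagesz);

	void *map = mmap(NULL, (size_t)pagesz, prot, MAP_SHARED, fd, base);
	if (map == MAP_FAILED)
		return (NULL);
	*mapp = map;
	*lenp = (size_t)pagesz;
	return ((volatile uint32_t *)((uint8_t *)map + delta));
}

/* Store `value` at `addr`; returns 0 on success, -1 with errno set. */
int
mem_write32(int fd, off_t addr, uint32_t value)
{
	void *map;
	size_t len;
	volatile uint32_t *p;

	p = map_word(fd, addr, PROT_READ | PROT_WRITE, &map, &len);
	if (p == NULL)
		return (-1);
	*p = value;
	return (munmap(map, len));
}

/* Load the word at `addr` into *valuep; returns 0 on success, -1 on error. */
int
mem_read32(int fd, off_t addr, uint32_t *valuep)
{
	void *map;
	size_t len;
	volatile uint32_t *p;

	p = map_word(fd, addr, PROT_READ, &map, &len);
	if (p == NULL)
		return (-1);
	*valuep = *p;
	return (munmap(map, len));
}
```

Mapping and unmapping a page on every access is what makes this
approach slow. Mapping the whole BAR once and keeping the pointer would
avoid most of that cost, at the price of more bookkeeping.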

This code path is hit continuously whether or not X is running, with an
increase in activity when running anything GPU-heavy. The accesses are
always to BAR 1, and mostly around the same offsets. I added some
logging of this event; it produces about 100 lines per second while
playing video. An excerpt is:
...
Unexpected out-of-vm passthrough write #492036 to bar 1 at offset 41100.
Unexpected out-of-vm passthrough write #492037 to bar 1 at offset 41100.
Unexpected out-of-vm passthrough read #276162 to bar 1 at offset 561280.
Unexpected out-of-vm passthrough write #492038 to bar 1 at offset 38028.
Unexpected out-of-vm passthrough write #492039 to bar 1 at offset 38028.
Unexpected out-of-vm passthrough read #276163 to bar 1 at offset 561184.
Unexpected out-of-vm passthrough read #276164 to bar 1 at offset 561184.
Unexpected out-of-vm passthrough read #276165 to bar 1 at offset 561184.
Unexpected out-of-vm passthrough read #276166 to bar 1 at offset 561184.
...
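
Since the accesses cluster around a few offsets, a per-offset tally of
a log like the one above may help narrow down which registers are
involved. The following is only a sketch: the struct and function names
are hypothetical, the line format is taken from the excerpt, and the
offsets are assumed to be printed in decimal, which the excerpt does
not confirm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_SLOTS 64	/* arbitrary cap on distinct (bar, offset) pairs */

struct pt_access {
	int is_write;		/* 1 for "write", 0 for "read" */
	unsigned long seq;	/* the #NNN counter from the log line */
	int bar;
	unsigned long offset;	/* assumed decimal */
};

struct pt_tally {
	size_t n;
	struct {
		int bar;
		unsigned long offset;
		unsigned long reads, writes;
	} slot[MAX_SLOTS];
};

/* Parse one log line; returns 1 if it matched the format, 0 otherwise. */
int
parse_pt_line(const char *line, struct pt_access *out)
{
	char kind[8];

	assert(line != NULL && out != NULL);
	if (sscanf(line,
	    "Unexpected out-of-vm passthrough %7s #%lu to bar %d at offset %lu",
	    kind, &out->seq, &out->bar, &out->offset) != 4)
		return (0);
	if (strcmp(kind, "write") == 0)
		out->is_write = 1;
	else if (strcmp(kind, "read") == 0)
		out->is_write = 0;
	else
		return (0);
	return (1);
}

/* Count one access; returns 0, or -1 if the table is full. */
int
tally_add(struct pt_tally *t, const struct pt_access *a)
{
	size_t i;

	for (i = 0; i < t->n; i++)
		if (t->slot[i].bar == a->bar && t->slot[i].offset == a->offset)
			break;
	if (i == t->n) {
		if (t->n == MAX_SLOTS)
			return (-1);
		t->slot[i].bar = a->bar;
		t->slot[i].offset = a->offset;
		t->slot[i].reads = t->slot[i].writes = 0;
		t->n++;
	}
	if (a->is_write)
		t->slot[i].writes++;
	else
		t->slot[i].reads++;
	return (0);
}

/* Print one summary line per distinct (bar, offset) pair. */
void
tally_print(const struct pt_tally *t, FILE *fp)
{
	for (size_t i = 0; i < t->n; i++)
		fprintf(fp, "bar %d offset %lu: %lu reads, %lu writes\n",
		    t->slot[i].bar, t->slot[i].offset,
		    t->slot[i].reads, t->slot[i].writes);
}
```

Feeding each line of the log through parse_pt_line() and tally_add(),
then calling tally_print(), gives a per-offset read/write count that
could be compared against the register ranges nouveau documents for the
card.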

So my question here is:
1. How do I diagnose why the instructions are not being executed in
guest mode?

Some other problems:

2. Once the virtual machine is shut down, the passed-through GPU doesn't
get turned off. Whatever message was on the screen in the final throes
of Linux's shutdown stays there. Maybe there is a specific detach
command which bhyve or nouveau hasn't yet implemented? Alternatively,
maybe I could exploit some power management feature to reset the card
when bhyve exits.

3. It is not possible to reboot the guest and then start X again without
an intervening host reboot. The text console works fine. Xorg.0.log has
a message like
    (EE) [drm] Failed to open DRM device for pci:0000:00:06.0: -19
    (EE) open /dev/dri/card0: No such file or directory
dmesg is not very helpful either.[0] I suspect that this is related to
problem (2).

4. There is a known bug in the version of the Xorg server that ships
with Debian 9, where the switch from an animated mouse cursor back to a
static cursor causes the X server to sit in a busy loop of gradually
increasing stack depth if the GPU takes too long to communicate with the
driver.[1] For me, this consistently happens after I type my password
into the Debian login dialog box and eventually (~ 120 minutes) locks up
the host by eating all the swap. A workaround is to replace the guest's
animated cursors with static cursors. The bug is fixed in newer versions
of X, but I haven't yet tested whether that fix works for me.

5. The GPU doesn't come to life until the nouveau driver kicks in. What
is special about the driver? Why doesn't the UEFI firmware initialize
the GPU and send it output before the boot? Any idea if the problem is
on the UEFI side or the hypervisor side?

6. On Windows, the way Windows probes multi-BAR devices seems to be
inconsistent with bhyve's model for storing I/O memory mappings.
Specifically, I believe Windows writes the 0xffffffff sentinel to all
BARs on a device in one shot, then reads them back and assigns the true
addresses afterwards. However, bhyve sees the multiple 0xffffffff
assignments to different BARs as a clash and errors out on the second
BAR probe. I removed most of the mmio_rb_tree error handling in mem.c,
and this is sufficient for Windows to boot and to detect and correctly
identify the GPU. (A better solution might be to handle the initial
0xffffffff write as a special case.) I can then install the official
nVidia drivers without problem over Remote Desktop. However, the GPU
never springs into life: I am stuck with a "Windows has stopped this
device because it has reported problems. (Code 43)" error in the Device
Manager, a blank screen, and not much else to go on.
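
To illustrate the special case suggested in item 6, here is a toy model
of a 32-bit memory BAR. It is a sketch under stated assumptions, not
bhyve's code: struct toy_bar and all function names are hypothetical,
the `mapped` flag merely stands in for whatever registration mem.c
performs, and 64-bit BARs (which span two registers) and I/O BARs are
not modelled. The idea shown is that an all-ones write is a sizing
probe, so it latches the size mask for the read-back but is never
treated as a request to map the BAR at 0xffffffff.

```c
#include <assert.h>
#include <stdint.h>

#define BAR_SIZING_PROBE	0xffffffffu	/* all-ones sizing write */
#define BAR_MEM_FLAG_MASK	0x0000000fu	/* low 4 bits: type flags */

struct toy_bar {
	uint32_t size;		/* power of two, at least 16 bytes */
	uint32_t flags;		/* read-only low bits (type, prefetch) */
	uint32_t addr;		/* currently latched base address bits */
	int mapped;		/* nonzero once a real address is set */
};

void
toy_bar_init(struct toy_bar *b, uint32_t size, uint32_t flags)
{
	/* Memory BAR sizes are powers of two no smaller than 16 bytes. */
	assert(size >= 16 && (size & (size - 1)) == 0);
	b->size = size;
	b->flags = flags & BAR_MEM_FLAG_MASK;
	b->addr = 0;
	b->mapped = 0;
}

/* Guest write to the BAR register. */
void
toy_bar_write(struct toy_bar *b, uint32_t val)
{
	/* Address bits below the BAR size are hard-wired to zero. */
	b->addr = val & ~(b->size - 1);
	/*
	 * Special case: a sizing probe is not a mapping request, so
	 * several BARs can be probed back to back without "clashing".
	 */
	b->mapped = (val != BAR_SIZING_PROBE);
}

/* Guest read of the BAR register. */
uint32_t
toy_bar_read(const struct toy_bar *b)
{
	return (b->addr | b->flags);
}

/* Decode the BAR size from the value read back after a sizing probe. */
uint32_t
bar_size_from_readback(uint32_t readback)
{
	return (~(readback & ~BAR_MEM_FLAG_MASK) + 1u);
}
```

For example, probing a 16 MiB BAR and a 128 MiB BAR one after the other
leaves both unmapped, and the read-back values decode to 0x01000000 and
0x08000000 respectively. These sizes are arbitrary illustrations, not
the GT 710's actual BAR layout.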

Is it worth continuing to hack away at these problems (of course I'm
happy to share anything I come up with), or is there an official
solution to GPU support in the pipeline about to make my efforts
redundant :)?

Thanks,
Robert Crowston.

---
Footnotes

[0] Diff'ing dmesg after successful GPU initialization (+) and after
failure (-), and cutting out some lines that aren't relevant:
 nouveau 0000:00:06.0: bios: version 80.28.a6.00.10
+nouveau 0000:00:06.0: priv: HUB0: 085014 ffffffff (1f70820b)
 nouveau 0000:00:06.0: fb: 1024 MiB DDR3
@@ -466,24 +467,17 @@
 nouveau 0000:00:06.0: DRM: DCB conn 00: 00001031
 nouveau 0000:00:06.0: DRM: DCB conn 01: 00002161
 nouveau 0000:00:06.0: DRM: DCB conn 02: 00000200
-nouveau 0000:00:06.0: disp: chid 0 mthd 0000 data 00000400 00001000 00000002
-nouveau 0000:00:06.0: timeout at /build/linux-UEAD6s/linux-4.9.144/drivers/gpu/drm/nouveau/nvkm/engine/disp/dmacgf119.c:88/gf119_disp_dmac_init()!
-nouveau 0000:00:06.0: disp: ch 1 init: c207009b
-nouveau: DRM:00000000:0000927c: init failed with -16
-nouveau 0000:00:06.0: timeout at /build/linux-UEAD6s/linux-4.9.144/drivers/gpu/drm/nouveau/nvkm/engine/disp/dmacgf119.c:54/gf119_disp_dmac_fini()!
-nouveau 0000:00:06.0: disp: ch 1 fini: c2071088
-nouveau 0000:00:06.0: timeout at /build/linux-UEAD6s/linux-4.9.144/drivers/gpu/drm/nouveau/nvkm/engine/disp/dmacgf119.c:54/gf119_disp_dmac_fini()!
-nouveau 0000:00:06.0: disp: ch 1 fini: c2071088
+[drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
+[drm] Driver supports precise vblank timestamp query.
+nouveau 0000:00:06.0: DRM: MM: using COPY for buffer copies
+nouveau 0000:00:06.0: DRM: allocated 1920x1080 fb: 0x60000, bo ffff96fdb39a1800
+fbcon: nouveaufb (fb0) is primary device
-nouveau 0000:00:06.0: timeout at /build/linux-UEAD6s/linux-4.9.144/drivers/gpu/drm/nouveau/nvkm/engine/disp/coregf119.c:187/gf119_disp_core_fini()
-nouveau 0000:00:06.0: disp: core fini: 8d0f0088
-[TTM] Finalizing pool allocator
-[TTM] Finalizing DMA pool allocator
-[TTM] Zone  kernel: Used memory at exit: 0 kiB
-[TTM] Zone   dma32: Used memory at exit: 0 kiB
-nouveau: probe of 0000:00:06.0 failed with error -16
+Console: switching to colour frame buffer device 240x67
+nouveau 0000:00:06.0: fb0: nouveaufb frame buffer device
+[drm] Initialized nouveau 1.3.1 20120801 for 0000:00:06.0 on minor 0

[1] https://devtalk.nvidia.com/default/topic/1028172/linux/titan-v-ubuntu-16-04lts-and-387-34-driver-crashes-badly/post/5230898/#5230898