Date:      Thu, 12 Jan 2017 18:42:44 -0800
From:      Peter Grehan <grehan@freebsd.org>
To:        soralx@cydem.org
Cc:        freebsd-virtualization@freebsd.org
Subject:   Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
Message-ID:  <7cfaedbd-df48-53a8-2510-5f180ce1f2f6@freebsd.org>
In-Reply-To: <20170111195402.785f27c6@mscad14>
References:  <20170110003332.7cf8ba15@mscad14> <0de7e0fe-5680-b1be-bd57-6bf446c2fd38@talk2dom.com> <0c927784-3e3f-7946-fba9-c25001f4156c@talk2dom.com> <20170110180117.7f246b5a@mscad14> <20170111014544.70670784@mscad14> <93196ea2-5439-49ff-54fd-7b7273bdec85@freebsd.org> <20170111195402.785f27c6@mscad14>

Hi,

>  First, `nvidia-smi -q` output diff [0] is interesting. It suggests that
>  the card may be in some incompletely initialized state: notice the
>  "Unknown Error" instead of real UUID, and the P8 power state. Could it
>  be that the driver doesn't put the card's BIOS in the right state?

  That is extremely likely. bhyve itself doesn't have a BIOS, though 
bhyve/UEFI could be modified to handle option ROMs (see 
http://awilliam.github.io/presentations/KVM-Forum-2014/#/)

>  The command was run in both host and guest without Xorg loaded.

  Thanks for the diff; this is very useful.

> -    GPU UUID                        : GPU-f6c71b8e-f6c8-5a42-260d-1164720bf4f2
> +    GPU UUID                        : Unknown Error

  That implies some type of h/w access isn't working, either MMIO 
registers or response from a DMA command.

> -    Board ID                        : 0x100
> +    Board ID                        : 0x4

  Possibly the same cause?

>              PCIe Generation
>                  Max                 : 2
> -                Current             : 2
> +                Current             : 1

  bhyve's emulated PCI hostbridge only advertises gen1; that could 
easily be changed to gen2, which might make a difference for some of 
the clock issues below.

  (source is pci_emul.c:pci_emul_add_pciecap())

>              Link Width
>                  Max                 : 16x
>                  Current             : 16x

  That's a bit unexpected since the hostbridge only advertises 1x, but 
the driver is probably exporting the host value here.

> -    Performance State               : P0
> +    Performance State               : P8

  Not sure what's happening here.

>      Clocks
> -        Graphics                    : 625 MHz
> -        SM                          : 1251 MHz
> -        Memory                      : 1304 MHz
> -        Video                       : 540 MHz
> +        Graphics                    : 405 MHz
> +        SM                          : 810 MHz
> +        Memory                      : 324 MHz
> +        Video                       : 405 MHz

  This may be related to the gen1 vs gen2 issue above.

> When rebooting, I get this:
> nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000857d:0:0:0x00000040

  This may be DMA not working.

  A general issue with PCI passthrough is that MMIO from the guest often 
works, since that is just VT-x remapping, but DMA doesn't, due to 
issues with IOMMU programming (or incorrect mappings being used). This 
gives a device that partially works: registers can be read, but 
data transfer doesn't work.

> Jan 11 11:34:49 fbsd12tst kernel: nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed
> Jan 11 11:34:49 fbsd12tst kernel: nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer

  Not sure what's happening with those.

  Would it be possible to try the nouveau driver? At least the source 
is available, so it may be easier to determine what is broken.

later,

Peter.


