Date:      Wed, 11 Jan 2017 01:45:44 -0800
From:      <soralx@cydem.org>
To:        <freebsd-virtualization@freebsd.org>
Subject:   Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
Message-ID:  <20170111014544.70670784@mscad14>
In-Reply-To: <20170110180117.7f246b5a@mscad14>
References:  <20170110003332.7cf8ba15@mscad14> <0de7e0fe-5680-b1be-bd57-6bf446c2fd38@talk2dom.com> <0c927784-3e3f-7946-fba9-c25001f4156c@talk2dom.com> <20170110180117.7f246b5a@mscad14>

> The problem appears to be in the area of assigning memory-mapped
> I/O ranges by bhyve for the VGA card to a region outside of the
> CPU's addressable space; i.e., bhyve does not check CPUID's
> 0x80000008 AL value (0x27 for my CPU, which is 39 bits -- while
> bhyve assigns 0xd000000000 & above for the large Prefetch Memory
> chunks, which requires 40 address bits). At least this is my
> understanding of why VGA passthrough does not work.

To test this, I tried writing to the PCI BARs in a FreeBSD guest using
`pciconf -w`. Not much use that was: I could read back the values
written to the registers (e.g., `pciconf -r pci0:0:4:0 0x14:48`),
but `pciconf -lvb` still showed the same huge base addresses --
they did not want to change.

OK, I had enough of that. So I went to dig in the source, and
changed the "#define PCI_EMUL_MEMBASE64" from '0xD000000000UL'
to '0x3400000000UL' in src/usr.sbin/bhyve/pci_emul.c. Recompiled
bhyve, booted up FreeBSD, and:
  # pciconf -lvb
  [...]
  vgapci0@pci0:0:4:0:     class=0x030000 card=0x084a10de chip=0x0dd810de rev=0xa1 hdr=0x00
      vendor     = 'NVIDIA Corporation'
      device     = 'GF106GL [Quadro 2000]'
      class      = display
      subclass   = VGA
      bar   [10] = type Memory, range 32, base 0xc2000000, size 33554432, enabled
      bar   [14] = type Prefetchable Memory, range 64, base 0x3400000000, size 134217728, enabled
      bar   [1c] = type Prefetchable Memory, range 64, base 0x3408000000, size 67108864, enabled
      bar   [24] = type I/O Port, range 32, base 0x2080, size 128, enabled
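
The arithmetic behind that change can be sketched in a few lines of
Python. This is only an illustration of the address-width reasoning, not
anything bhyve does: the helper names are hypothetical, the 39-bit limit
is the CPUID 0x80000008 value quoted above, and the "old" layout assumes
that the two prefetchable BARs keep the same offsets from the stock
0xD000000000 base.

```python
# Sketch: does a 64-bit BAR window fit under the CPU's physical address width?
# PHYS_ADDR_BITS is the value quoted in this thread (CPUID 0x80000008, AL = 0x27);
# the helper names are hypothetical and are not part of bhyve.

PHYS_ADDR_BITS = 0x27  # 39 bits on the host CPU discussed here

# (base, size) of the two 64-bit prefetchable BARs from `pciconf -lvb`
BARS_NEW = [(0x3400000000, 134217728), (0x3408000000, 67108864)]

# Assumption: with the stock PCI_EMUL_MEMBASE64 only the base moves,
# so the same offsets are reused on top of 0xD000000000.
OLD_BASE, NEW_BASE = 0xD000000000, 0x3400000000
BARS_OLD = [(base - NEW_BASE + OLD_BASE, size) for base, size in BARS_NEW]


def bits_needed(base, size):
    """Address bits required to reach the last byte of the BAR."""
    return (base + size - 1).bit_length()


def fits(base, size, phys_bits=PHYS_ADDR_BITS):
    """True if the whole BAR lies below 2**phys_bits."""
    return base + size <= 1 << phys_bits


def report(bars):
    """Return (hex base, bits needed, fits?) for each BAR."""
    return [(hex(base), bits_needed(base, size), fits(base, size))
            for base, size in bars]


if __name__ == "__main__":
    for label, bars in (("old", BARS_OLD), ("new", BARS_NEW)):
        for base, bits, ok in report(bars):
            print(f"{label}: base {base} needs {bits} bits, "
                  f"fits in {PHYS_ADDR_BITS}: {ok}")
```

Under these assumptions the old windows need 40 address bits and do not
fit, while the relocated ones need 38 bits and do.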

...a-a-and:
  # kldload nvidia-modeset
  Linux ELF exec handler installed
  nvidia0: <Quadro 2000> on vgapci0
  vgapci0: child nvidia0 requested pci_enable_io
  vgapci0: attempting to allocate 1 MSI vectors (1 supported)
  msi: routing MSI IRQ 269 to local APIC 3 vector 51
  vgapci0: using IRQ 269 for MSI
  vgapci0: child nvidia0 requested pci_enable_io
  random: harvesting attach, 8 bytes (4 bits) from nvidia0
  # nvidia-smi
  acquiring duplicate lock of same type: "os.lock_sx"
   1st os.lock_sx @ nvidia_os.c:599
   2nd os.lock_sx @ nvidia_os.c:599
  stack backtrace:
  #0 0xffffffff80aa6780 at witness_debugger+0x70
  #1 0xffffffff80aa6683 at witness_checkorder+0xde3
  #2 0xffffffff80a4fac2 at _sx_xlock+0x72
  #3 0xffffffff82a515c2 at os_acquire_mutex+0x32
  #4 0xffffffff82a21068 at _nv016673rm+0x18
  Tue Jan 10 17:06:48 2017       
  +-----------------------------------------------------------------------------+
  | NVIDIA-SMI 367.44                 Driver Version: 367.44                    |
  |-------------------------------+----------------------+----------------------+
  | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
  | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
  |===============================+======================+======================|
  |   0  Quadro 2000         Off  | 0000:00:04.0     Off |                  N/A |
  | 30%   35C    P8    N/A /  N/A |      0MiB /   963MiB |      0%      Default |
  +-------------------------------+----------------------+----------------------+
                                                                               
  +-----------------------------------------------------------------------------+
  | Processes:                                                       GPU Memory |
  |  GPU       PID  Type  Process name                               Usage      |
  |=============================================================================|
  |  No running processes found                                                 |
  +-----------------------------------------------------------------------------+

Beauty! nvidia-smi is very slow to execute, though. And Xorg is not in
a hurry to start working:
  [   204.724] (--) PCI:*(0:0:4:0) 10de:0dd8:10de:084a rev 161, Mem @ 0xc2000000/33554432, 0x3400000000/134217728, 0x3408000000/67108864, I/O @ 0x00002080/128, BIOS @ 0x????????/65536
  [...]
  [   204.736] (**) NVIDIA(0): Depth 24, (--) framebuffer bpp 32
  [   204.736] (==) NVIDIA(0): RGB weight 888
  [   204.736] (==) NVIDIA(0): Default visual is TrueColor
  [   204.736] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
  [   204.738] (**) NVIDIA(0): Enabling 2D acceleration
  [   213.674] (--) NVIDIA(0): Valid display device(s) on GPU-0 at PCI:0:4:0
  [   213.674] (--) NVIDIA(0):     CRT-0
  [   213.674] (--) NVIDIA(0):     DFP-0 (boot)
  [   213.674] (--) NVIDIA(0):     DFP-1
  [   213.674] (--) NVIDIA(0):     DFP-2
  [   213.674] (--) NVIDIA(0):     DFP-3
  [   213.675] (--) NVIDIA(0):     DFP-4
  [   213.698] (--) NVIDIA(0): CRT-0: disconnected
  [   213.698] (--) NVIDIA(0): CRT-0: 400.0 MHz maximum pixel clock
  [   213.698] (--) NVIDIA(0): 
  [   213.743] (--) NVIDIA(0): DELL 2007FP (DFP-0): connected
  [   213.743] (--) NVIDIA(0): DELL 2007FP (DFP-0): Internal TMDS
  [   213.743] (--) NVIDIA(0): DELL 2007FP (DFP-0): 330.0 MHz maximum pixel clock
  [...]
  [   213.747] (II) NVIDIA(0): NVIDIA GPU Quadro 2000 (GF106GL) at PCI:0:4:0 (GPU-0)
  [   213.747] (--) NVIDIA(0): Memory: 1048576 kBytes
  [   213.747] (--) NVIDIA(0): VideoBIOS: 70.06.0d.00.02
  [   213.747] (II) NVIDIA(0): Detected PCI Express Link width: 16X
  [   213.748] (**) NVIDIA(0): Using HorizSync/VertRefresh ranges from the EDID for display
  [   213.748] (**) NVIDIA(0):     device DELL 2007FP (DFP-0) (Using EDID frequencies has
  [   213.748] (**) NVIDIA(0):     been enabled on all display devices.)
  [...]
  [   213.751] (II) NVIDIA(0): Virtual screen size determined to be 1600 x 1200
  [   213.761] (--) NVIDIA(0): DPI set to (99, 98); computed from "UseEdidDpi" X config
  [   213.761] (--) NVIDIA(0):     option
  [   213.761] (--) Depth 24 pixmap format is 32 bpp
  [   213.767] (II) NVIDIA: Reserving 12288.00 MB of virtual memory for indirect memory
  [   213.767] (II) NVIDIA:     access.
  [   216.789] (EE) NVIDIA(GPU-0): Failed to initialize DMA.
  [   216.789] (EE)  *** Aborting ***
  [   216.791] (EE) NVIDIA(0): Failed to allocate push buffer
  [   216.839] (EE) 
  Fatal server error:
  [   216.839] (EE) AddScreen/ScreenInit failed for driver 0

Linux still doesn't work. (Curse Ubuntu, what a mess! It tried to start
Xorg at boot, so I disabled that, but no matter what I did, I couldn't
stop it from trying to run 'nvidia-smi' at boot. And trust me, I tried
a lot: I removed all the scripts related to nvidia, and /etc/udev/ is
basically empty [/etc just looks like a pile-up of crap, wow!], yet
/usr/bin/nvidia-smi still ran by itself until I moved it away.)

dmesg:
  [    1.390957] nvidia: module verification failed: signature and/or required key missing - tainting kernel
  [    1.394715] nvidia 0000:00:04.0: can't derive routing for PCI INT A
  [    1.395185] nvidia 0000:00:04.0: PCI INT A: no GSI
  [    1.414173] vgaarb: device changed decodes: PCI:0000:00:04.0,olddecodes=io+mem,decodes=none:owns=io+mem
  [    1.417062] nvidia-nvlink: Nvlink Core is being initialized, major device number 247
  [    1.417609] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  375.26  Thu Dec  8 18:36:43 PST 2016 (using threaded interrupts)
  [    1.419820] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  375.26  Thu Dec  8 18:04:14 PST 2016
  [    1.422067] [drm] [nvidia-drm] [GPU ID 0x00000004] Loading driver
  [...]
  [    3.904893] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 246
  # lspci -vvn
  00:04.0 0300: 10de:0dd8 (rev a1) (prog-if 00 [VGA controller])
          Subsystem: 10de:084a
          Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
          Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
          Latency: 0
          Interrupt: pin A routed to IRQ 16
          Region 0: Memory at c2000000 (32-bit, non-prefetchable) [size=32M]
          Region 1: Memory at 3400000000 (64-bit, prefetchable) [size=128M]
          Region 3: Memory at 3408000000 (64-bit, prefetchable) [size=64M]
          Region 5: I/O ports at 2080 [size=128]
          [virtual] Expansion ROM at c0080000 [disabled] [size=512K]
  [...]
But:
  # ./nvidia-smi 
  No devices were found
dmesg:
  [  173.498953] NVRM: RmInitAdapter failed! (0x53:0x3:1856)
  [  173.499115] NVRM: rm_init_adapter failed for device bearing minor number 0

Not sure what's happening, but I'll try with an AMD/ATI card.

-- 
[SorAlx]  ridin' VN2000 Classic LT


