Date: Tue, 10 Jan 2017 00:33:32 -0800
From: <soralx@cydem.org>
To: <freebsd-virtualization@freebsd.org>, <misc-freebsd@talk2dom.com>
Subject: Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD
 11-RC2)


Howdy, virtualization zealots!

 This is in reply to mailing list thread [0].

 It so happens that I have to get GPU-accelerated OpenCL working on
 my machine, so I had a play with bhyve & PCI-e passthrough for VGA.
 I was using nVidia Quadro 600 (GF108) for testing (planning to use
 AMD/ATI for OpenCL, of course).

 I tried a Linux guest with the proprietary nVidia driver, and the
 result was that the driver couldn't init the VGA during boot:
  [    1.394726] nvidia: module license 'NVIDIA' taints kernel.
  [    1.395140] Disabling lock debugging due to kernel taint
  [    1.412132] nvidia: module verification failed: signature and/or required key missing - tainting kernel
  [    1.419359] nvidia 0000:00:04.0: can't derive routing for PCI INT A
  [    1.419807] nvidia 0000:00:04.0: PCI INT A: no GSI
  [    1.420157] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
  [    1.420157] NVRM: BAR1 is 0M @ 0x0 (PCI:0000:00:04.0)
  [    1.421023] NVRM: The system BIOS may have misconfigured your GPU.
  [    1.421476] nvidia: probe of 0000:00:04.0 failed with error -1
  [    1.437301] nvidia-nvlink: Nvlink Core is being initialized, major device number 247
  [    1.440094] NVRM: The NVIDIA probe routine failed for 1 device(s).
  [    1.440530] NVRM: None of the NVIDIA graphics adapters were initialized!

 After adding the "pci=nocrs" Linux boot option (which, from what I
 understand, magically helps to [partially] work around bhyve assigning
 addresses beyond the host CPU's physically addressable space for PCIe
 memory-mapped registers), the guest couldn't finish booting, because
 bhyve would crash.
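
 To make the address-space remark concrete, here is a minimal sketch of
 the check it implies: does a BAR's range end at or below 2^N, where N
 is the number of physical address bits? The function name `bar_fits`
 and the sample base, size, and bit widths are made up for illustration;
 they are not values read from this machine.

```sh
#!/bin/sh
# Hypothetical helper: succeed if [base, base+size) ends at or below 2^bits.
# base and size may be given in hex (0x...) or decimal; bits is decimal.
bar_fits() {
    base=$(( $1 ))
    size=$(( $2 ))
    limit=$(( 1 << $3 ))
    [ $(( base + size <= limit )) -eq 1 ]
}

# Illustrative values only: a 128 MiB BAR placed at 0xc000000000 (768 GiB),
# checked against 36 and 40 physical address bits.
for bits in 36 40; do
    if bar_fits 0xc000000000 0x8000000 "$bits"; then
        echo "fits in $bits bits"
    else
        echo "beyond $bits bits"
    fi
done
```

 In a Linux guest, the real inputs could be taken from the "address
 sizes" line in /proc/cpuinfo and from the region addresses that
 `lspci -vv` prints for the device.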

 Turns out that which peripherals are used, and their order on the
 command line, are important. Edit: actually, it looks like it's the
 number of CPUs (the '-c' flag's argument) that makes the difference;
 the machine has a CPU with 4 cores, no multithreading.

 This didn't work (bhyve aborted on an assertion):
   `bhyve -A -H -P -s 0:0,hostbridge -s 1:0,lpc -s 2:0,virtio-net,tap0 \
          -s 3:0,virtio-blk,./bhyve_lunix.img \
          -s 4:0,ahci-cd,./ubuntu-16.04.1-server-amd64.iso \
          -s 5:0,passthru,1/0/0 -l com1,stdio -c 4 -m 1024M -S lunixguest`
  [...]
  [  OK  ] Listening on Load/Save RF Kill Switch Status /dev/rfkill Watch.
  [  OK  ] Reached target Swap.
  Assertion failed: (pi->pi_bar[baridx].type == PCIBAR_IO), function passthru_write, file /usr/src/usr.sbin/bhyve/pci_passthru.c, line 850.
            Abort (core dumped)
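
 A side note on terminology: the "Abort (core dumped)" line means bhyve
 was killed by SIGABRT from a failed assert(), which is not the same
 thing as a segmentation fault (SIGSEGV). Below is a small sketch for
 telling the two apart from the exit status the shell sees, assuming the
 usual "128 + signal number" convention and the common signal numbers
 6 (SIGABRT) and 11 (SIGSEGV). The function name is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: translate a shell exit status into a short description.
describe_status() {
    status=$1
    if [ "$status" -le 128 ]; then
        echo "exited with status $status"
        return
    fi
    sig=$(( status - 128 ))
    case "$sig" in
        6)  echo "killed by SIGABRT (e.g. a failed assert)" ;;
        11) echo "killed by SIGSEGV (segmentation fault)" ;;
        *)  echo "killed by signal $sig" ;;
    esac
}

describe_status 134
describe_status 139
```

 Calling `describe_status $?` right after bhyve returns would show which
 of the two actually happened.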

 But this worked, finally:
   `bhyve -c 1 -m 1024M -S -A -H -P -s 0:0,hostbridge -s 1:0,lpc \
          -s 2:0,virtio-net,tap0 -s 3:0,virtio-blk,./bhyve_lunix.img \
          -s 4:0,passthru,1/0/0 -l com1,stdio lunixguest`
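
 Since the vCPU count looks like the deciding factor, one way to narrow
 it down is to try each value of '-c' in turn with an otherwise
 identical command line. The sketch below only prints the commands and
 does not run bhyve; everything except the '-c' value is copied from the
 working invocation above, and the function name `bhyve_cmd` is made up.

```sh
#!/bin/sh
# Hypothetical helper: print (do not run) the bhyve command line for a
# given vCPU count. Slots, tap0, disk image, and passthru device 1/0/0 are
# taken from the working invocation in this message.
bhyve_cmd() {
    ncpu=$1
    printf '%s' "bhyve -c ${ncpu} -m 1024M -S -A -H -P"
    printf '%s' " -s 0:0,hostbridge -s 1:0,lpc"
    printf '%s' " -s 2:0,virtio-net,tap0 -s 3:0,virtio-blk,./bhyve_lunix.img"
    printf '%s\n' " -s 4:0,passthru,1/0/0 -l com1,stdio lunixguest"
}

# Print one command per vCPU count so they can be tried one after another.
for n in 1 2 3 4; do
    bhyve_cmd "$n"
done
```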

 So, the guest booted, and didn't complain about BARs not addressable
 by the CPU anymore. However, the same fate befell me as befell Dom
 in this thread -- the driver loaded:
  [    1.691216] nvidia: module verification failed: signature and/or required key missing - tainting kernel
  [    1.696641] nvidia 0000:00:04.0: can't derive routing for PCI INT A
  [    1.698093] nvidia 0000:00:04.0: PCI INT A: no GSI
  [    1.699277] vgaarb: device changed decodes: PCI:0000:00:04.0,olddecodes=io+mem,decodes=none:owns=io+mem
  [    1.701461] nvidia-nvlink: Nvlink Core is being initialized, major device number 247
  [    1.702649] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  375.26  Thu Dec  8 18:36:43 PST 2016 (using threaded interrupts)
  [    1.705481] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  375.26  Thu Dec  8 18:04:14 PST 2016
  [    1.708941] [drm] [nvidia-drm] [GPU ID 0x00000004] Loading driver
 but couldn't talk to the card:
  [lost the log, but it was the same as Dom's: "NVRM: rm_init_adapter failed"].

 So I decided to test in a FreeBSD 10.3-STABLE guest.

 With an older driver, or when just loading 'nvidia' without
 modesetting, I got guest kernel panics [1]. When I loaded
 'nvidia-modeset', there was more success:
   Linux ELF exec handler installed
   Linux x86-64 ELF exec handler installed
   nvidia0: <Quadro 600> on vgapci0
   vgapci0: child nvidia0 requested pci_enable_io
   vgapci0: attempting to allocate 1 MSI vectors (1 supported)
   msi: routing MSI IRQ 269 to local APIC 2 vector 51
   vgapci0: using IRQ 269 for MSI
   vgapci0: child nvidia0 requested pci_enable_io
   nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  367.44  Wed Aug 17 22:05:09 PDT 2016

 But:
   # nvidia-smi 
   NVRM: Xid (PCI:0000:00:04): 62, !2369(0000)
   NVRM: RmInitAdapter failed! (0x26:0x65:1072)
   nvidia0: NVRM: rm_init_adapter() failed!
   No devices were found
 It also panicked after starting Xorg.

 After stumbling upon some Xen forums, I found the solution: nVidia
 crippled the driver so that it detects a virtualization environment,
 and refuses to attach to anything but high-end pro cards! Those
 bastards [if the speculation is true]! GTX960 didn't work. Quadro
 600 didn't work. So I tried with a Quadro 2000:
  root@fbsd12tst:~ # sync
  root@fbsd12tst:~ # kldload nvidia-modeset
  Linux ELF exec handler installed
  nvidia0: <Quadro 2000> on vgapci0
  vgapci0: child nvidia0 requested pci_enable_io
  vgapci0: attempting to allocate 1 MSI vectors (1 supported)
  msi: routing MSI IRQ 269 to local APIC 3 vector 51
  vgapci0: using IRQ 269 for MSI
  vgapci0: child nvidia0 requested pci_enable_io
  random: harvesting attach, 8 bytes (4 bits) from nvidia0
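
 Whatever the driver really does, a guest can easily tell that it is
 virtualized. As an illustration only (not a claim about the driver's
 actual check), a Linux guest normally shows a 'hypervisor' flag in
 /proc/cpuinfo when the CPUID hypervisor bit is set. A minimal sketch,
 with a hypothetical function name and the file path passed as a
 parameter so it can be pointed at a saved copy:

```sh
#!/bin/sh
# Hypothetical helper: succeed if the given cpuinfo-style file lists the
# "hypervisor" CPU flag on any "flags" line.
has_hypervisor_flag() {
    grep '^flags' "$1" | grep -qw hypervisor
}

# Default to the live /proc/cpuinfo; a saved copy can be passed as $1.
cpuinfo=${1:-/proc/cpuinfo}
if [ -r "$cpuinfo" ] && has_hypervisor_flag "$cpuinfo"; then
    echo "hypervisor flag present"
else
    echo "hypervisor flag not found"
fi
```

 On a FreeBSD guest, `sysctl kern.vm_guest` reports comparable
 information.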

 [A bit more] success! However:
  root@fbsd12tst:~ # nvidia-smi
  acquiring duplicate lock of same type: "os.lock_sx"
   1st os.lock_sx @ nvidia_os.c:599
   2nd os.lock_sx @ nvidia_os.c:599
  stack backtrace:
  #0 0xffffffff80aa6780 at witness_debugger+0x70
  #1 0xffffffff80aa6683 at witness_checkorder+0xde3
  #2 0xffffffff80a4fac2 at _sx_xlock+0x72
  #3 0xffffffff82a515c2 at os_acquire_mutex+0x32
  #4 0xffffffff82a21068 at _nv016673rm+0x18


  Fatal trap 12: page fault while in kernel mode
  cpuid = 1; apic id = 01
  fault virtual address   = 0xfffffe004f601088
  fault code              = supervisor write data, reserved bits in PTE
  instruction pointer     = 0x20:0xffffffff82a512e3
  stack pointer           = 0x28:0xfffffe0000221138
  frame pointer           = 0x28:0xfffffe0001a76ba8
  code segment            = base 0x0, limit 0xfffff, type 0x1b
                          = DPL 0, pres 1, long 1, def32 0, gran 1
  processor eflags        = interrupt enabled, resume, IOPL = 0
  current process         = 634 (nvidia-smi)
  [ thread pid 634 tid 100100 ]
  Stopped at      os_mem_copy+0xf3:       movb    %dl,(%rax)
  db> bt
  Tracing pid 634 tid 100100 td 0xfffff8000b866500
  os_mem_copy() at os_mem_copy+0xf3/frame 0xfffffe0001a76ba8
  ??() at 0xfffff8000b8beb00
  db> 

 (I upgraded to FreeBSD 12.0-CURRENT (GENERIC) #0 r311659, but
  initially did the test with Quadro 2000 on the same 10.3-STABLE
  as before, with the same results).

 Linux succeeds in loading the driver with Quadro 2000, too:
  [    1.374925] nvidia: module license 'NVIDIA' taints kernel.
  [    1.375348] Disabling lock debugging due to kernel taint
  [    1.400506] nvidia: module verification failed: signature and/or required key missing - tainting kernel
  [    1.413539] nvidia 0000:00:04.0: can't derive routing for PCI INT A
  [    1.414003] nvidia 0000:00:04.0: PCI INT A: no GSI
  [    1.414417] vgaarb: device changed decodes: PCI:0000:00:04.0,olddecodes=io+mem,decodes=none:owns=io+mem
  [    1.421807] nvidia-nvlink: Nvlink Core is being initialized, major device number 247
  [    1.422369] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  375.26  Thu Dec  8 18:36:43 PST 2016 (using threaded interrupts)
  [    1.424568] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  375.26  Thu Dec  8 18:04:14 PST 2016
  [    1.426837] [drm] [nvidia-drm] [GPU ID 0x00000004] Loading driver
 But I get the same assertion failure and abort from bhyve if I try
 to run `nvidia-smi` after the OS has finished booting [at least it
 seemed to finish before; now I can't get it to finish booting, it
 just hangs].


 And now to the point: how would one go about fixing bhyve's
 tendency to abort on that assertion (which suggests that
 something is still very wrong?), and get Linux working with
 the GPU? And what to do about the FreeBSD guest's kernel panics?


P.S.: please CC, as I'm not subscribed.

[0] https://lists.freebsd.org/pipermail/freebsd-virtualization/2016-September/004704.html
[1] Fatal trap 12: page fault while in kernel mode
    cpuid = 3; apic id = 03
    fault virtual address   = 0xfffffe003d7c508c
    fault code              = supervisor write data, reserved bits in PTE
    instruction pointer     = 0x20:0xffffffff820bb5d5
    stack pointer           = 0x28:0xfffffe003d69d380
    frame pointer           = 0x28:0xfffffe000154ce68
    code segment            = base 0x0, limit 0xfffff, type 0x1b
                            = DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags        = interrupt enabled, resume, IOPL = 0
    current process         = 5084 (nvidia-smi)
    trap number             = 12
    panic: page fault

-- 
[SorAlx]  ridin' VN2000 Classic LT