Date:      Sat, 14 Mar 2020 13:50:39 +0100
From:      Alex Erley <erleya@gmail.com>
To:        Peter Grehan <grehan@freebsd.org>
Cc:        freebsd-virtualization@freebsd.org
Subject:   Re: [GPU pass-through] no compatible bridge window for claimed BAR
Message-ID:  <b24b894e-3b3c-3d92-4eb0-6426d873703f@gmail.com>
In-Reply-To: <07921dcf-11d5-f440-a42f-d7ec950cab10@freebsd.org>
References:  <CAONCVozTcKP_=8AdOCfFNiRQnQ254+FVn0ZDRK-V8Zo+FFd_qQ@mail.gmail.com> <07921dcf-11d5-f440-a42f-d7ec950cab10@freebsd.org>

Hello,

Some new findings to share.

1) Changing PCI_EMUL_MEMBASE64 from 0xD000000000 to any value
*below 0x0440000000* makes bhyve fail when starting the VM with the message:
   bhyve: failed to initialize BARs for PCI 1/0/0
   device emulation initialization error: Cannot allocate memory

2) With PCI_EMUL_MEMBASE64 set to 0x0440000000 (or above), the guest VM
cannot configure the BARs of the pass-through device properly.
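
For reference, the knob I am changing is a compile-time constant in
bhyve's PCI emulation (usr.sbin/bhyve/pci_emul.c). A paraphrased
excerpt from my source tree; the exact context may differ between
FreeBSD versions:

/*
 * Base of the window from which bhyve hands out 64-bit memory BARs.
 * Default value; experiment (1) lowers it below the guest's top of
 * RAM (0x0440000000 here), experiment (2) keeps it at or above that.
 */
#define PCI_EMUL_MEMBASE64      0xD000000000UL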

== (a) ==
On the bhyve host the ppt devices are:

 > devinfo -rv
...
pci0
  hostb0 at slot=0 function=0 dbsf=pci0:0:0:0
  pcib1 at slot=1 function=0 dbsf=pci0:0:1:0 handle=\_SB_.PCI0.P0P2
   I/O ports: 0xe000-0xefff
   I/O memory addresses:
	0x00c0000000-0x00d30fffff	<-- covers all child mem windows
   pci1
    ppt0 at slot=0 function=0 dbsf=pci0:1:0:0
     pcib1 I/O port window: 0xe000-0xe07f
     pcib1 memory window:
	0x00c0000000-0x00cfffffff	<-- 256M
	0x00d0000000-0x00d1ffffff	<-- 32M
	0x00d2000000-0x00d2ffffff	<-- 16M
    ppt1 at slot=0 function=1 dbsf=pci0:1:0:1
     pcib1 memory window:
	0xd3080000-0xd3083fff		<-- 16K
...
and there is no other device attached to pci1.

== (b) ==
On the guest VM, dmesg shows (timestamps removed):
...
BIOS-provided physical RAM map:
BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
BIOS-e820: [mem 0x0000000000100000-0x00000000bea95fff] usable
BIOS-e820: [mem 0x00000000bea96000-0x00000000bea97fff] reserved
BIOS-e820: [mem 0x00000000bea98000-0x00000000bea99fff] ACPI data
BIOS-e820: [mem 0x00000000bea9a000-0x00000000beaa8fff] reserved
BIOS-e820: [mem 0x00000000beaa9000-0x00000000bfb28fff] usable
BIOS-e820: [mem 0x00000000bfb29000-0x00000000bfb58fff] type 20
BIOS-e820: [mem 0x00000000bfb59000-0x00000000bfb7cfff] reserved
BIOS-e820: [mem 0x00000000bfb7d000-0x00000000bfb81fff] usable
BIOS-e820: [mem 0x00000000bfb82000-0x00000000bfb88fff] ACPI data
BIOS-e820: [mem 0x00000000bfb89000-0x00000000bfb8cfff] ACPI NVS
BIOS-e820: [mem 0x00000000bfb8d000-0x00000000bffcffff] usable
BIOS-e820: [mem 0x00000000bffd0000-0x00000000bffeffff] reserved
BIOS-e820: [mem 0x00000000bfff0000-0x00000000bfffffff] usable
BIOS-e820: [mem 0x0000000100000000-0x000000043fffffff] usable
                                    ^^^ upper limit of addressable memory
...
PM: Registered nosave memory: [mem 0xc0000000-0xffffffff]
[mem 0xc0000000-0xffffffff] available for PCI devices
...
pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7]
pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
                                          ^-- 128K
pci_bus 0000:00: root bus resource [mem 0xc0000000-0xdfffffff window]
                                          ^-- 512M
pci_bus 0000:00: root bus resource [mem 0xf0000000-0xf07fffff window]
                                          ^-- 8M
pci_bus 0000:00: root bus resource [bus 00-ff]

== (c) ==
Up to this point everything runs OK.

Guest Linux then allocates memory regions for devices.
Allocation is done from the lower reg (0x10) to the higher (0x30)
for each device (i.e. from 00.0 to 1f.0) on the PCI bus.
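
(For readers unfamiliar with these offsets: regs 0x10-0x24 are the six
BAR slots in PCI config space and 0x30 is the expansion-ROM base. The
kernel sizes each BAR by writing all-ones and reading back a mask. A
minimal sketch of that probe, using hypothetical config-space accessors
just to illustrate where the sizes in the log below come from:)

#include <stdint.h>

/* Hypothetical accessors for illustration only; a real implementation
 * goes through port 0xCF8/0xCFC or ECAM MMIO. */
uint32_t pci_cfg_read32(int bus, int dev, int fn, int reg);
void pci_cfg_write32(int bus, int dev, int fn, int reg, uint32_t val);

/* Size a 32-bit memory BAR the way the kernel does at boot:
 * save, write all-ones, read back, restore, decode. */
static uint64_t
bar_size(int bus, int dev, int fn, int reg)
{
	uint32_t orig, mask;

	orig = pci_cfg_read32(bus, dev, fn, reg);
	pci_cfg_write32(bus, dev, fn, reg, 0xffffffff);
	mask = pci_cfg_read32(bus, dev, fn, reg);
	pci_cfg_write32(bus, dev, fn, reg, orig);

	mask &= ~0xfU;		/* strip memory-BAR flag bits */
	return (mask ? (uint64_t)(~mask + 1U) : 0);  /* 0xf0000000 -> 256M */
}

(A 64-bit BAR such as reg 0x14 below occupies two consecutive slots and
is probed the same way across both halves.)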

Here I have reordered the dmesg output into groups to show contiguous
memory regions (the pass-through device is marked with *):

pci 0000:00:01.0: reg 0x24: [io  0x2000-0x207f]
pci 0000:00:02.0: reg 0x10: [io  0x2080-0x209f]
pci 0000:00:03.0: reg 0x10: [io  0x20c0-0x20ff]
...
pci 0000:00:00.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
pci 0000:00:02.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
pci 0000:00:03.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
pci 0000:00:1d.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
pci 0000:00:1e.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
pci 0000:00:1f.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
...
pci 0000:00:01.0: reg 0x10:*[mem 0xc0000000-0xc0ffffff]         16M
...                              0xc1000000-0xc1ffffff          16M gap
pci 0000:00:01.0: reg 0x1c:*[mem 0xc2000000-0xc3ffffff 64bit pref]  32M
pci 0000:00:01.1: reg 0x10:*[mem 0xc4000000-0xc4003fff]
pci 0000:00:02.0: reg 0x14: [mem 0xc4004000-0xc4005fff]
pci 0000:00:03.0: reg 0x14: [mem 0xc4006000-0xc4007fff]
pci 0000:00:1d.0: reg 0x10: [mem 0xc4008000-0xc400807f]
...                              0xc4008080-0xc4ffffff          <16M gap
pci 0000:00:1d.0: reg 0x14: [mem 0xc5000000-0xc5ffffff]         16M
pci 0000:00:1e.0: reg 0x10: [mem 0xc6000000-0xc6000fff]
...                              0xc6001000-0xd2ffffff         <208M gap
pci 0000:00:01.0: reg 0x30:*[mem 0xd3000000-0xd307ffff pref]    512K
                                  0xd3080000-0xdfffffff         <208M gap

pci 0000:00:01.0: reg 0x14:*[mem 0x440000000-0x44fffffff 64bit pref] 256M
                                 ^^^ this value is outside the allowed range

== (d) ==
So, there is no window for the 256M BAR, even though there are two big
gaps of almost 208M in the 512M space the PCI bus provides for BAR
allocation: both gaps are smaller than 256M, and a 256M BAR must be
naturally aligned to 256M anyway, as the quick check below shows.
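
Since a PCI memory BAR is naturally aligned to its own size, there are
only two places inside the 512M window where a 256M BAR could sit at
all. A small stand-alone check, with the addresses taken from the root
bus resources above:

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	const uint64_t win_lo = 0xc0000000, win_hi = 0xdfffffff;
	const uint64_t bar = 256ULL << 20;	/* 256M, naturally aligned */
	uint64_t base;

	/* Enumerate every 256M-aligned slot fitting in the window. */
	for (base = win_lo; base + bar - 1 <= win_hi; base += bar)
		printf("candidate 0x%09jx-0x%09jx\n",
		    (uintmax_t)base, (uintmax_t)(base + bar - 1));
	return (0);
}

This prints only 0xc0000000-0xcfffffff and 0xd0000000-0xdfffffff, and
both of those slots are already partly occupied by the allocations
listed above.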

So, BAR reg 0x14 of size 256M for device 01.0 must land inside the
provisioned 512M region 0xc0000000-0xdfffffff.
But referring to (1) above, setting the base address to any value below
0x440000000 breaks bhyve on start.
According to (b), this value is exactly the upper addressable memory
limit of the guest VM (the end of the last usable e820 range).

So I'm blocked here at the moment:
- The guest VM requires a value which bhyve doesn't like.
- The guest VM allocates BARs with huge gaps.

I have little knowledge of PCI bus internals, although I have already
read some articles on the internet.
Could there be some ACPI trick to apply here?
I'd be happy to hear any ideas...

PS
I suspect that with another OS as the guest, or another pass-through
GPU model, the BARs would probably be allocated properly.
But that is not what I want for this config.
There should be a way to allocate the 256M BAR in guest Linux.

Have a nice day,
Alex


