Date: Sat, 14 Mar 2020 13:50:39 +0100
From: Alex Erley <erleya@gmail.com>
To: Peter Grehan <grehan@freebsd.org>
Cc: freebsd-virtualization@freebsd.org
Subject: Re: [GPU pass-through] no compatible bridge window for claimed BAR
Message-ID: <b24b894e-3b3c-3d92-4eb0-6426d873703f@gmail.com>
In-Reply-To: <07921dcf-11d5-f440-a42f-d7ec950cab10@freebsd.org>
References: <CAONCVozTcKP_=8AdOCfFNiRQnQ254+FVn0ZDRK-V8Zo+FFd_qQ@mail.gmail.com>
            <07921dcf-11d5-f440-a42f-d7ec950cab10@freebsd.org>
Hello,

Some new findings to share.

1) Changing PCI_EMUL_MEMBASE64 from 0xD000000000 to any value *below
0x0440000000* makes bhyve fail when starting the VM with the messages:

  bhyve: failed to initialize BARs for PCI 1/0/0
  device emulation initialization error: Cannot allocate memory

2) With PCI_EMUL_MEMBASE64 set to 0x0440000000 (or above), the guest VM
cannot configure the BARs of the pass-through device properly.
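For reference, the constant mentioned above sits in bhyve's
usr.sbin/bhyve/pci_emul.c, and both experiments are a one-line local patch
of it. A sketch of that change (the stock value is the 0xD000000000 quoted
in (1); the exact form of the define may differ between FreeBSD versions):

  /* usr.sbin/bhyve/pci_emul.c -- sketch of the local one-line patch */
  #define PCI_EMUL_MEMBASE64  0x0440000000UL  /* stock value: 0xD000000000UL */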
== (a) ==
On the bhyve host the ppt device is:

> devinfo -rv
...
  pci0
    hostb0 at slot=0 function=0 dbsf=pci0:0:0:0
    pcib1 at slot=1 function=0 dbsf=pci0:0:1:0 handle=\_SB_.PCI0.P0P2
        I/O ports: 0xe000-0xefff
        I/O memory addresses: 0x00c0000000-0x00d30fffff  <-- covers all child mem windows
      pci1
        ppt0 at slot=0 function=0 dbsf=pci0:1:0:0
            pcib1 I/O port window: 0xe000-0xe07f
            pcib1 memory window: 0x00c0000000-0x00cfffffff  <-- 256M
                                 0x00d0000000-0x00d1ffffff  <-- 32M
                                 0x00d2000000-0x00d2ffffff  <-- 16M
        ppt1 at slot=0 function=1 dbsf=pci0:1:0:1
            pcib1 memory window: 0xd3080000-0xd3083fff  <-- 16K
...

and there is no other device attached to pci1.

== (b) ==
On the guest VM dmesg shows (timestamps removed):

...
BIOS-provided physical RAM map:
BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
BIOS-e820: [mem 0x0000000000100000-0x00000000bea95fff] usable
BIOS-e820: [mem 0x00000000bea96000-0x00000000bea97fff] reserved
BIOS-e820: [mem 0x00000000bea98000-0x00000000bea99fff] ACPI data
BIOS-e820: [mem 0x00000000bea9a000-0x00000000beaa8fff] reserved
BIOS-e820: [mem 0x00000000beaa9000-0x00000000bfb28fff] usable
BIOS-e820: [mem 0x00000000bfb29000-0x00000000bfb58fff] type 20
BIOS-e820: [mem 0x00000000bfb59000-0x00000000bfb7cfff] reserved
BIOS-e820: [mem 0x00000000bfb7d000-0x00000000bfb81fff] usable
BIOS-e820: [mem 0x00000000bfb82000-0x00000000bfb88fff] ACPI data
BIOS-e820: [mem 0x00000000bfb89000-0x00000000bfb8cfff] ACPI NVS
BIOS-e820: [mem 0x00000000bfb8d000-0x00000000bffcffff] usable
BIOS-e820: [mem 0x00000000bffd0000-0x00000000bffeffff] reserved
BIOS-e820: [mem 0x00000000bfff0000-0x00000000bfffffff] usable
BIOS-e820: [mem 0x0000000100000000-0x000000043fffffff] usable
                                   ^^^-- upper limit of addressable memory
...
PM: Registered nosave memory: [mem 0xc0000000-0xffffffff]
[mem 0xc0000000-0xffffffff] available for PCI devices
...
pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7]
pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]  <-- 128K
pci_bus 0000:00: root bus resource [mem 0xc0000000-0xdfffffff window]  <-- 512M
pci_bus 0000:00: root bus resource [mem 0xf0000000-0xf07fffff window]  <-- 8M
pci_bus 0000:00: root bus resource [bus 00-ff]

== (c) ==
Up to this point everything runs OK.

Guest Linux then allocates memory regions for the devices. Allocation is
done from the lower reg (0x10) to the higher (0x30) for each device
(i.e. from 00.0 to 1f.0) on the PCI bus. Here I have reordered the dmesg
output into groups to show the contiguous memory regions (the pass-through
device is marked with *):

pci 0000:00:01.0: reg 0x24: [io  0x2000-0x207f]
pci 0000:00:02.0: reg 0x10: [io  0x2080-0x209f]
pci 0000:00:03.0: reg 0x10: [io  0x20c0-0x20ff]
...
pci 0000:00:00.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
pci 0000:00:02.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
pci 0000:00:03.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
pci 0000:00:1d.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
pci 0000:00:1e.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
pci 0000:00:1f.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
...
pci 0000:00:01.0: reg 0x10:*[mem 0xc0000000-0xc0ffffff]                 16M
...                          0xc1000000-0xc1ffffff                      16M gap
pci 0000:00:01.0: reg 0x1c:*[mem 0xc2000000-0xc3ffffff 64bit pref]      32M
pci 0000:00:01.1: reg 0x10:*[mem 0xc4000000-0xc4003fff]
pci 0000:00:02.0: reg 0x14: [mem 0xc4004000-0xc4005fff]
pci 0000:00:03.0: reg 0x14: [mem 0xc4006000-0xc4007fff]
pci 0000:00:1d.0: reg 0x10: [mem 0xc4008000-0xc400807f]
...                          0xc4008080-0xc4ffffff                     <16M gap
pci 0000:00:1d.0: reg 0x14: [mem 0xc5000000-0xc5ffffff]                 16M
pci 0000:00:1e.0: reg 0x10: [mem 0xc6000000-0xc6000fff]
...                          0xc6001000-0xd2ffffff                    <208M gap
pci 0000:00:01.0: reg 0x30:*[mem 0xd3000000-0xd307ffff pref]           512K
...                          0xd3080000-0xdfffffff                    <208M gap
pci 0000:00:01.0: reg 0x14:*[mem 0x440000000-0x44fffffff 64bit pref]   256M
                                  ^^^-- this value is outside the allowed range

== (d) ==
So there is no window for the 256M BAR, even though there are two big gaps
of 208M in the 512M space provided for BAR allocation by the PCI bus (see
the quick check sketched below).

That BAR, reg 0x14 of size 256M on device 01.0, must end up inside the
provisioned 512M region 0xc0000000-0xdfffffff. But, referring to (1) above,
setting the 64-bit base address to any value below 0x440000000 breaks bhyve
on start, and according to (b) this value corresponds to the upper limit of
addressable memory in the guest VM.
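As a quick sanity check of the statement above, here is a minimal sketch
(using the guest BAR addresses from (c) and assuming the usual PCI rule
that a BAR must be naturally aligned to its own size, which is also why a
208M gap can never hold a 256M BAR):

/* bar_check.c -- is there a free, size-aligned 256M slot left in the
 * guest's 32-bit PCI window 0xc0000000-0xdfffffff?  The "used" ranges
 * are the BAR assignments reported by the guest kernel in (c). */
#include <stdio.h>
#include <stdint.h>

struct range { uint64_t start, end; };              /* inclusive bounds */

static const struct range used[] = {
	{ 0xc0000000, 0xc0ffffff },                 /* 01.0 reg 0x10, 16M      */
	{ 0xc2000000, 0xc3ffffff },                 /* 01.0 reg 0x1c, 32M      */
	{ 0xc4000000, 0xc400807f },                 /* 01.1, 02.0, 03.0, 1d.0  */
	{ 0xc5000000, 0xc5ffffff },                 /* 1d.0 reg 0x14, 16M      */
	{ 0xc6000000, 0xc6000fff },                 /* 1e.0 reg 0x10, 4K       */
	{ 0xd3000000, 0xd307ffff },                 /* 01.0 reg 0x30 ROM, 512K */
};

int
main(void)
{
	const uint64_t win_lo = 0xc0000000, win_hi = 0xdfffffff;
	const uint64_t size = 256ULL << 20;         /* the 256M BAR (reg 0x14) */

	/* A BAR of size S can only start on a multiple of S. */
	for (uint64_t base = win_lo; base + size - 1 <= win_hi; base += size) {
		int blocked = 0;
		for (size_t i = 0; i < sizeof(used) / sizeof(used[0]); i++)
			if (base <= used[i].end &&
			    used[i].start <= base + size - 1)
				blocked = 1;        /* overlaps an assigned BAR */
		printf("0x%08jx: %s\n", (uintmax_t)base,
		    blocked ? "blocked" : "free");
	}
	return (0);
}

Both 256M-aligned candidates it prints, 0xc0000000 and 0xd0000000, come out
as blocked by the smaller BARs already placed there.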
So I'm blocked here at the moment:
- Guest VM requires a value which bhyve doesn't like.
- Guest VM allocates BARs with huge gaps.

I have little knowledge of PCI bus internals, although I have already read
some articles on the internet. Could some ACPI trick help here? I'd be
happy to hear any ideas...

PS
I suspect that with another OS as the guest VM, or another pass-through GPU
model, the BARs would probably be allocated properly. But that is not what
I want for this config. There should be a way to allocate the 256M BAR in
guest Linux.

Have a nice day,
Alex