Date: Thu, 19 Mar 2020 04:53:04 +0100
From: Alex Erley <erleya@gmail.com>
To: Robert Crowston <crowston@protonmail.com>
Cc: Peter Grehan <grehan@freebsd.org>, "freebsd-virtualization@freebsd.org" <freebsd-virtualization@freebsd.org>
Subject: Re: [GPU pass-through] no compatible bridge window for claimed BAR
Message-ID: <4674c0fc-2696-3476-55e4-608d11ebece2@gmail.com>
In-Reply-To: <J0SBx0buju5ryP6wIGXLL3UD9R3LLorm0IkMpUs4TfOz3b8IeXZs5M6xoeoBwh-kTBqPdRR0npidVOWnzUZquDvWqJWgdz3RK-r7SBeYdpA=@protonmail.com>
References: <CAONCVozTcKP_=8AdOCfFNiRQnQ254%2BFVn0ZDRK-V8Zo%2BFFd_qQ@mail.gmail.com>
 <07921dcf-11d5-f440-a42f-d7ec950cab10@freebsd.org>
 <b24b894e-3b3c-3d92-4eb0-6426d873703f@gmail.com>
 <J0SBx0buju5ryP6wIGXLL3UD9R3LLorm0IkMpUs4TfOz3b8IeXZs5M6xoeoBwh-kTBqPdRR0npidVOWnzUZquDvWqJWgdz3RK-r7SBeYdpA=@protonmail.com>
Hello,

You are right about mapping 64-bit BARs.

=== (1) ===
The initial value of PCI_EMUL_MEMBASE64 is 0xD000000000. It corresponds
to the 64-bit MMIO window set by QWordMemory(...) in the DSDT generated
by pci_bus_write_dsdt(), so the guest VM gets this 64-bit window from
the UEFI BIOS:
  0x000000D000000000-0x000000D0000FFFFF
and the guest kernel ignores it during boot:
  ACPI: PCI Root Bridge [PC00] (domain 0000 [bus 00-ff])
  acpi PNP0A03:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
  acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM
  acpi PNP0A03:00: host bridge window [mem 0xd000000000-0xd0000fffff window]
                   (ignored, not CPU addressable)
  acpi PNP0A03:00: host bridge window expanded to [io 0x0000-0x0cf7];
                   [io 0x0000-0x0cf7 window] ignored

The 64-bit BAR code must be fixed to choose a proper memory region
depending on host and guest resources. This only matters for huge BAR
sizes, though; in most cases all guest devices should fit in the
available 32-bit MMIO window. So I decided to put it aside and
concentrate my efforts on 32-bit BARs.

I increased the maximum size in pci_emul_alloc_pbar() from 32 MB to
256 MB so that all BARs get allocated in the 32-bit MMIO window (to me
it still looks like a hack). As when I tried this before, bhyve failed
on VM start, so I turned to debugging what happens.

Allocation goes in the 32-bit MMIO window 0xc0000000-0xdfffffff (512 MB).
Keep in mind these two requirements from the PCI standard:
- the size of each region must be a power of two (already the case),
- the base address must be aligned on a boundary equal to the region size.
BARs are allocated in the order of their indices:
  BAR[0]: type=2, size=0x1000000  => addr=0xc0000000, nextaddr=0xc1000000
  BAR[1]: type=3, size=0x10000000 => addr=0xd0000000, nextaddr=0xe0000000
  BAR[3]: type=3, size=0x2000000  => KO, 0xe0000000+size > 0xdfffffff
  BAR[5]: type=1, size=0x80

I fixed cfginitbar() in pci_passthru.c to allocate the BARs in a
different order so that the allocations are more compact: they have to
be allocated from the largest claimed size down to the smallest (by the
way, this is how it is done on the host system). For this, in
cfginitbar() I reordered the BAR indices before calling
pci_emul_alloc_pbar(). This solved all the problems with BAR
allocation. (I will share my patch once I have reviewed it a bit more;
rough sketches of the idea follow at the end of this mail.)

=== (2) ===
As you pointed out before, I now face the ROM mapping problem.

All ROM BARs are disabled in the guest VM, and a quick look shows why:
PCI_BARMAX == 5 (= PCIR_MAX_BAR_0 from <dev/pci/pcireg.h>), so only 6
BARs are initialized in cfginitbar() in pci_passthru.c. They correspond
to registers 10h, 14h, 18h, 1Ch, 20h and 24h. The Expansion ROM Base
Address register at 30h is not initialized at all.

I'd like to add the missing code for it to pci_passthru.c. The physical
guest memory map described in pci_emul.c says all ROM BARs must be
allocated in the 8 MB window 0xf0000000-0xf07fffff, and that already
matches the generated DSDT. Judging from the PCI documentation, dealing
with the ROM doesn't seem very complicated, but anyway I'm interested in:
- What pitfalls will there be along the way?
- Do we already have some code for the ROM BAR somewhere?

Any help is welcome.
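Just to make the alignment arithmetic concrete: this is not bhyve code,
only the two PCI rules above applied to the four BAR sizes from the log,
showing why allocation in index order overflows the 512 MB window while
allocation in descending size order fits with room to spare.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Round 'addr' up to the next multiple of 'size' (a power of two),
 * i.e. the PCI alignment rule quoted above. */
static uint64_t
align_up(uint64_t addr, uint64_t size)
{
    return ((addr + size - 1) & ~(size - 1));
}

int
main(void)
{
    /* The four BAR sizes from the log: index order vs. descending size. */
    const uint64_t index_order[] = { 0x1000000, 0x10000000, 0x2000000, 0x80 };
    const uint64_t by_size[]     = { 0x10000000, 0x2000000, 0x1000000, 0x80 };
    const uint64_t *order[2]     = { index_order, by_size };
    const uint64_t limit = 0xe0000000;      /* end of 0xc0000000-0xdfffffff */

    for (int o = 0; o < 2; o++) {
        uint64_t next = 0xc0000000;         /* start of the 32-bit window */

        puts(o == 0 ? "index order:" : "descending size:");
        for (int i = 0; i < 4; i++) {
            uint64_t base = align_up(next, order[o][i]);

            printf("  size 0x%09" PRIx64 " -> base 0x%09" PRIx64 "%s\n",
                order[o][i], base,
                base + order[o][i] > limit ? "  (does not fit!)" : "");
            next = base + order[o][i];
        }
    }
    return (0);
}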
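The reordering itself boils down to sorting the BAR indices by
decreasing size before the existing loop hands them to
pci_emul_alloc_pbar(). This is only a sketch of the approach, not my
actual patch: struct bar_probe and sort_bars_by_size() are made-up
names, and the sizes are assumed to have been probed already by the
code that is in cfginitbar() today.

#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical record of what cfginitbar() has already probed for one
 * BAR of the passed-through device: its index and decoded size
 * (0 for unused BARs).
 */
struct bar_probe {
    int         idx;
    uint64_t    size;
};

static int
bar_size_cmp(const void *a, const void *b)
{
    const struct bar_probe *pa = a, *pb = b;

    /* Largest first: every later, smaller BAR then lands on an address
     * that is already aligned for it, so no gaps are created. */
    if (pa->size < pb->size)
        return (1);
    if (pa->size > pb->size)
        return (-1);
    return (0);
}

static void
sort_bars_by_size(struct bar_probe *bars, size_t nbars)
{
    qsort(bars, nbars, sizeof(bars[0]), bar_size_cmp);
    /*
     * The caller then walks 'bars' in this order and passes bars[i].idx
     * to pci_emul_alloc_pbar() instead of the raw loop index.
     */
}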
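For the ROM BAR itself, size discovery looks like the same
write-ones-and-read-back probe as for the regular BARs, only at config
register 30h, where bit 0 is the decode-enable bit and bits 31:11 hold
the base address. A rough sketch of that part, with cfg_read()/cfg_write()
standing in for whatever config-space accessors pci_passthru.c ends up
using (the constants just spell out the register layout; pcireg.h should
already have names for them). Since the writes hit the real register on a
passed-through device, the probe keeps the decode bit clear throughout.

#include <stdint.h>

/* Expansion ROM Base Address register (offset 30h in config space):
 * bit 0 = ROM decode enable, bits 31:11 = base address. */
#define ROM_REG         0x30
#define ROM_ADDR_MASK   0xfffff800u

/* Hypothetical 32-bit config-space accessors for the ppt device. */
uint32_t cfg_read(int reg);
void     cfg_write(int reg, uint32_t val);

/* Return the expansion ROM size in bytes, or 0 if the device has none. */
static uint64_t
probe_rom_size(void)
{
    uint32_t orig, rb;

    orig = cfg_read(ROM_REG);
    cfg_write(ROM_REG, ROM_ADDR_MASK);      /* all ones, decode kept off */
    rb = cfg_read(ROM_REG) & ROM_ADDR_MASK;
    cfg_write(ROM_REG, orig);               /* restore the original value */

    if (rb == 0)
        return (0);
    /* The read-back has 1s in the writable address bits, so the lowest
     * set bit equals the ROM size. */
    return ((uint64_t)(rb & ~(rb - 1)));
}

The allocation side would then be a carve-out from the 8 MB ROM window
mentioned above, but I have not looked at that part in detail yet, so
corrections are welcome if I misread the spec.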

Have a nice day,
Alex

On 3/15/20 3:20 PM, Robert Crowston wrote:
> I suggest you map the BAR into the 32-bit address space, unless you have
> so many PCI devices that this is not feasible. Just raise the limit of
> the special 64-bit handling to 1 GB or something big.
> - Many/most(?) consumer BIOSes/UEFIs map 64-bit BARs into the 32-bit
>   address space by default, so this configuration is much better tested
>   with device drivers and guest operating systems.
> - Passthrough doesn't work for me at all above the 4 GB memory window on
>   my recent AMD system. It exits back to the hypervisor, which then
>   crashes with a failed assert because it doesn't expect to handle this.
>
> With this tweak it is possible to use the open-source nVidia driver on Linux.
>
> However, your next problem---if you want to use proprietary drivers---is
> that access to the ROM BAR is not supported or emulated. If that could be
> fixed, it would be a big step forward.
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Saturday, 14 March 2020 12:50, Alex Erley <erleya@gmail.com> wrote:
>
>> Hello,
>>
>> Some new findings to share.
>>
>> 1. Changing PCI_EMUL_MEMBASE64 from 0xD000000000 to any value below
>>    0x0440000000 makes bhyve fail when starting the VM with:
>>      bhyve: failed to initialize BARs for PCI 1/0/0
>>      device emulation initialization error: Cannot allocate memory
>>
>> 2. With PCI_EMUL_MEMBASE64 set to 0x0440000000 (or above), the guest VM
>>    cannot configure the BARs of the pass-through device properly.
>>
>> == (a) ==
>> On the bhyve host the ppt device is:
>>
>>> devinfo -rv
>> ...
>> pci0
>>   hostb0 at slot=0 function=0 dbsf=pci0:0:0:0
>>   pcib1 at slot=1 function=0 dbsf=pci0:0:1:0 handle=\SB.PCI0.P0P2
>>     I/O ports: 0xe000-0xefff
>>     I/O memory addresses:
>>       0x00c0000000-0x00d30fffff   <-- covers all child mem windows
>>     pci1
>>       ppt0 at slot=0 function=0 dbsf=pci0:1:0:0
>>         pcib1 I/O port window: 0xe000-0xe07f
>>         pcib1 memory window:
>>           0x00c0000000-0x00cfffffff   <-- 256M
>>           0x00d0000000-0x00d1ffffff   <-- 32M
>>           0x00d2000000-0x00d2ffffff   <-- 16M
>>       ppt1 at slot=0 function=1 dbsf=pci0:1:0:1
>>         pcib1 memory window:
>>           0xd3080000-0xd3083fff   <-- 16K
>> ...
>> and there is no other device attached to pci1.
>>
>> == (b) ==
>> On the guest VM dmesg shows (timestamps removed):
>> ...
>> BIOS-provided physical RAM map:
>> BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
>> BIOS-e820: [mem 0x0000000000100000-0x00000000bea95fff] usable
>> BIOS-e820: [mem 0x00000000bea96000-0x00000000bea97fff] reserved
>> BIOS-e820: [mem 0x00000000bea98000-0x00000000bea99fff] ACPI data
>> BIOS-e820: [mem 0x00000000bea9a000-0x00000000beaa8fff] reserved
>> BIOS-e820: [mem 0x00000000beaa9000-0x00000000bfb28fff] usable
>> BIOS-e820: [mem 0x00000000bfb29000-0x00000000bfb58fff] type 20
>> BIOS-e820: [mem 0x00000000bfb59000-0x00000000bfb7cfff] reserved
>> BIOS-e820: [mem 0x00000000bfb7d000-0x00000000bfb81fff] usable
>> BIOS-e820: [mem 0x00000000bfb82000-0x00000000bfb88fff] ACPI data
>> BIOS-e820: [mem 0x00000000bfb89000-0x00000000bfb8cfff] ACPI NVS
>> BIOS-e820: [mem 0x00000000bfb8d000-0x00000000bffcffff] usable
>> BIOS-e820: [mem 0x00000000bffd0000-0x00000000bffeffff] reserved
>> BIOS-e820: [mem 0x00000000bfff0000-0x00000000bfffffff] usable
>> BIOS-e820: [mem 0x0000000100000000-0x000000043fffffff] usable  <-- upper limit of addressable memory in the guest
>> ...
>> PM: Registered nosave memory: [mem 0xc0000000-0xffffffff]
>> [mem 0xc0000000-0xffffffff] available for PCI devices
>> ...
>> pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7]
>> pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window]
>> pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]   <-- 128K
>> pci_bus 0000:00: root bus resource [mem 0xc0000000-0xdfffffff window]   <-- 512M
>> pci_bus 0000:00: root bus resource [mem 0xf0000000-0xf07fffff window]   <-- 8M
>> pci_bus 0000:00: root bus resource [bus 00-ff]
>>
>> == (c) ==
>> Up to this point everything runs OK.
>>
>> Guest Linux then allocates memory regions for the devices.
>> Allocation is done from the lower reg (0x10) to the higher (0x30)
>> for each device (i.e. from 00.0 to 1f.0) on the PCI bus.
>>
>> Here I reordered the dmesg output into groups to show contiguous RAM regions:
>> (pass-through device is marked with )
>> pci 0000:00:01.0: reg 0x24: [io 0x2000-0x207f]
>> pci 0000:00:02.0: reg 0x10: [io 0x2080-0x209f]
>> pci 0000:00:03.0: reg 0x10: [io 0x20c0-0x20ff]
>> ...
>> pci 0000:00:00.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
>> pci 0000:00:02.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
>> pci 0000:00:03.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
>> pci 0000:00:1d.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
>> pci 0000:00:1e.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
>> pci 0000:00:1f.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
>> ...
>> pci 0000:00:01.0: reg 0x10: [mem 0xc0000000-0xc0ffffff]                16M
>>     ... 0xc1000000-0xc1ffffff                                          16M gap
>> pci 0000:00:01.0: reg 0x1c: [mem 0xc2000000-0xc3ffffff 64bit pref]     32M
>> pci 0000:00:01.1: reg 0x10: [mem 0xc4000000-0xc4003fff]
>> pci 0000:00:02.0: reg 0x14: [mem 0xc4004000-0xc4005fff]
>> pci 0000:00:03.0: reg 0x14: [mem 0xc4006000-0xc4007fff]
>> pci 0000:00:1d.0: reg 0x10: [mem 0xc4008000-0xc400807f]
>>     ... 0xc4008080-0xc4ffffff                                         <16M gap
>> pci 0000:00:1d.0: reg 0x14: [mem 0xc5000000-0xc5ffffff]                16M
>> pci 0000:00:1e.0: reg 0x10: [mem 0xc6000000-0xc6000fff]
>>     ... 0xc6001000-0xd2ffffff                                         <208M gap
>> pci 0000:00:01.0: reg 0x30: [mem 0xd3000000-0xd307ffff pref]           512K
>>     ... 0xd3080000-0xdfffffff                                         <208M gap
>> pci 0000:00:01.0: reg 0x14: [mem 0x440000000-0x44fffffff 64bit pref]   256M
>>                                  ^^^-- this value is outside the allowed range
>>
>> == (d) ==
>> So there is no window for the 256M BAR, although there are two big gaps
>> of about 208M in the 512M space provided for BAR allocation by the PCI bus.
>>
>> BAR reg 0x14 of size 256M for device 01.0 must be inside the
>> provisioned 512M region 0xc0000000-0xdfffffff.
>> But referring to (1) above, setting the base address to any value below
>> 0x440000000 breaks bhyve on start.
>> According to (b), this value corresponds to the upper addressable memory
>> limit in the guest VM.
>>
>> So I'm blocked here at the moment:
>>
>> - The guest VM requires a value which bhyve doesn't like.
>> - The guest VM allocates BARs with huge gaps.
>>
>> I have little knowledge of PCI bus internals, although I have already
>> read some articles on the internet.
>> Could some ACPI trick do it?
>> I'd be happy to hear any ideas...
>>
>> PS
>> I suspect that if I took another OS as the guest VM, or another
>> pass-through GPU model, it would probably allocate the BARs properly.
>> But that is not what I want for this config.
>> There should be a way to allocate a 256M BAR in guest Linux.
>>
>> Have a nice day,
>> Alex
>>
>> freebsd-virtualization@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-virtualization
>> To unsubscribe, send any mail to "freebsd-virtualization-unsubscribe@freebsd.org"