Date:      Thu, 19 Mar 2020 10:19:54 +0000
From:      Robert Crowston <crowston@protonmail.com>
To:        Alex Erley <erleya@gmail.com>
Cc:        Peter Grehan <grehan@freebsd.org>, "freebsd-virtualization@freebsd.org" <freebsd-virtualization@freebsd.org>, Henrik Gulbrandsen <henrik@gulbra.net>
Subject:   Re: [GPU pass-through] no compatible bridge window for claimed BAR
Message-ID:  <CuZPzN6FxkZCh6loC6E911wmfUOOd_rbRfWO2G5uhOsyOGZIsg_M4hrbWckjnSjyBrWR4HBRSj6q_JQd3uXbD_AVtmQ58TU4w5tAW8xlEJE=@protonmail.com>
In-Reply-To: <4674c0fc-2696-3476-55e4-608d11ebece2@gmail.com>
References:  <CAONCVozTcKP_=8AdOCfFNiRQnQ254%2BFVn0ZDRK-V8Zo%2BFFd_qQ@mail.gmail.com> <07921dcf-11d5-f440-a42f-d7ec950cab10@freebsd.org> <b24b894e-3b3c-3d92-4eb0-6426d873703f@gmail.com> <J0SBx0buju5ryP6wIGXLL3UD9R3LLorm0IkMpUs4TfOz3b8IeXZs5M6xoeoBwh-kTBqPdRR0npidVOWnzUZquDvWqJWgdz3RK-r7SBeYdpA=@protonmail.com> <4674c0fc-2696-3476-55e4-608d11ebece2@gmail.com>

> This solved all the problems with BAR allocation.
Nice!

> Do we have already some code for ROM BAR somewhere?
Henrik Gulbrandsen was working on expansion ROM support last July (CC'd).
He posted his work up on https://www.gulbra.net/freebsd-bhyve/.


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, 19 March 2020 03:52, Alex Erley <erleya@gmail.com> wrote:

> Hello,
>
> You are right about mapping 64-bit BARs.
>
> === (1) ===
> The initial value of PCI_EMUL_MEMBASE64 is 0xD000000000 and it
> corresponds to the 64-bit mmio window set by QWordMemory(...)
> in the DSDT from pci_bus_write_dsdt(), so the guest VM gets this
> 64-bit window from the UEFI BIOS:
> 0x000000D000000000-0x000000D0000FFFFF
> which is ignored by the guest kernel during boot:
> ACPI: PCI Root Bridge [PC00] (domain 0000 [bus 00-ff])
> acpi PNP0A03:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM
> Segments MSI]
> acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM
> acpi PNP0A03:00: host bridge window [mem 0xd000000000-0xd0000fffff
> window] (ignored, not CPU addressable)
> acpi PNP0A03:00: host bridge window expanded to [io 0x0000-0x0cf7];
> [io 0x0000-0x0cf7 window] ignored
>
> The 64-bit BAR code must be fixed to set a proper memory region
> depending on host and guest resources.
> In general it only matters for huge BAR sizes; in most cases all
> the guest devices should fit in the available 32-bit mmio window.
> I decided to put it aside and concentrate my efforts on 32-bit BARs.
>
> So, I increased the max size in pci_emul_alloc_pbar() from 32 to 256 MB
> to have all BARs allocated in the 32-bit mmio window (to me it still
> looks like a hack).
> As when I had tried this before, bhyve failed on VM start, so I turned
> to debugging what happened.
>
> Allocation goes in the 32-bit mmio window 0xc0000000-0xdfffffff (512 MB).
> Keeping in mind these two requirements from the PCI standard:
>
> -   the size of each region must be a power of two (already OK),
> -   the base address must be aligned on a boundary equal to the region size,
>
> BARs are allocated in the order of their indices:
>     BAR[0]: type=2, size=0x1000000  => addr=0xc0000000, nextaddr=0xc1000000
>     BAR[1]: type=3, size=0x10000000 => addr=0xd0000000, nextaddr=0xe0000000
>     BAR[3]: type=3, size=0x2000000  => KO, 0xe0000000+size > 0xdfffffff
>     BAR[5]: type=1, size=0x80
>
> I fixed cfginitbar() in pci_passthru.c to allocate BARs in a different
> order, making the allocations more compact. They have to be allocated
> from the biggest claimed size down to the smallest (btw, this is how it
> is done on the host system).
> For this, in cfginitbar() I reordered the BAR indices before calling
> pci_emul_alloc_pbar().
> This solved all the problems with BAR allocation.
> (I will share my patch once I have reviewed it a bit more.)
>
> === (2) ===
> As you pointed out before, I now face the ROM mapping problem.
>
> All the ROM BARs are disabled in the guest VM, and a quick look shows
> that since PCI_BARMAX == 5 (== PCIR_MAX_BAR_0 from <dev/pci/pcireg.h>),
> only 6 BARs are initialized in cfginitbar() in pci_passthru.c.
> These correspond to registers 10h, 14h, 18h, 1ch, 20h and 24h.
> The Expansion ROM Base Address register (30h) is not initialized at all.
>
> I'd like to add the missing code for it in pci_passthru.c.
> The physical guest memory map described in pci_emul.c says all ROM BARs
> must be allocated in the 8 MB window 0xf0000000-0xf07fffff, and this
> already matches the generated DSDT.
>
> From the PCI documentation, dealing with the ROM doesn't seem to be
> very complicated, but anyway, I'm interested in:
>
> -   What pitfalls will there be on that way?
> -   Do we already have some code for ROM BARs somewhere?
>
> Any help is welcome.
>
> Have a nice day,
> Alex
>
>     On 3/15/20 3:20 PM, Robert Crowston wrote:
>
>
> > I suggest you map the BAR into the 32 bit address space, unless you have so many PCI devices that this is not feasible. Just raise the limit of the special 64 bit handling to 1 GB or something big.
> >
> > -   Many/most(?) consumer BIOS/UEFIs map 64 bit BARs into the 32 bit address space by default, so this configuration is much more tested for device drivers and guest operating systems.
> > -   Passthrough doesn't work for me at all above the 4 GB memory window on my recent AMD system. It exits back to the hypervisor, which then crashes with a failed assert because it doesn't expect to handle this.
> >
> > With this tweak it is possible to use the open source nVidia driver on Linux.
> > However, your next problem---if you want to use proprietary drivers---is that access to the ROM BAR is not supported or emulated. If that could be fixed, it would be a big step forwards.
> > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > On Saturday, 14 March 2020 12:50, Alex Erley erleya@gmail.com wrote:
> >
> > > Hello,
> > > Some new findings to share.
> > >
> > > 1.  Changing PCI_EMUL_MEMBASE64 from 0xD000000000 to any value
> > >     below 0x0440000000 makes bhyve fail when starting the VM with message:
> > >     bhyve: failed to initialize BARs for PCI 1/0/0
> > >     device emulation initialization error: Cannot allocate memory
> > >
> > > 2.  Having PCI_EMUL_MEMBASE64 set to 0x0440000000 (or above), the
> > >     guest VM cannot configure the BARs of the pass-through device
> > >     properly.
> > >     == (a) ==
> > >     On the bhyve host the ppt device is:
> > >
> > >
> > > > devinfo -rv
> > >
> > > ...
> > > pci0
> > > hostb0 at slot=0 function=0 dbsf=pci0:0:0:0
> > > pcib1 at slot=1 function=0 dbsf=pci0:0:1:0 handle=\SB.PCI0.P0P2
> > > I/O ports: 0xe000-0xefff
> > > I/O memory addresses:
> > > 0x00c0000000-0x00d30fffff <-- covers all child mem windows
> > > pci1
> > > ppt0 at slot=0 function=0 dbsf=pci0:1:0:0
> > > pcib1 I/O port window: 0xe000-0xe07f
> > > pcib1 memory window:
> > > 0x00c0000000-0x00cfffffff <-- 256M
> > > 0x00d0000000-0x00d1ffffff <-- 32M
> > > 0x00d2000000-0x00d2ffffff <-- 16M
> > > ppt1 at slot=0 function=1 dbsf=pci0:1:0:1
> > > pcib1 memory window:
> > > 0xd3080000-0xd3083fff <-- 16K
> > > ...
> > > and there is no other device attached to pci1.
> > > == (b) ==
> > > On guest VM dmesg shows (timestamps are removed):
> > > ...
> > > BIOS-provided physical RAM map:
> > > BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
> > > BIOS-e820: [mem 0x0000000000100000-0x00000000bea95fff] usable
> > > BIOS-e820: [mem 0x00000000bea96000-0x00000000bea97fff] reserved
> > > BIOS-e820: [mem 0x00000000bea98000-0x00000000bea99fff] ACPI data
> > > BIOS-e820: [mem 0x00000000bea9a000-0x00000000beaa8fff] reserved
> > > BIOS-e820: [mem 0x00000000beaa9000-0x00000000bfb28fff] usable
> > > BIOS-e820: [mem 0x00000000bfb29000-0x00000000bfb58fff] type 20
> > > BIOS-e820: [mem 0x00000000bfb59000-0x00000000bfb7cfff] reserved
> > > BIOS-e820: [mem 0x00000000bfb7d000-0x00000000bfb81fff] usable
> > > BIOS-e820: [mem 0x00000000bfb82000-0x00000000bfb88fff] ACPI data
> > > BIOS-e820: [mem 0x00000000bfb89000-0x00000000bfb8cfff] ACPI NVS
> > > BIOS-e820: [mem 0x00000000bfb8d000-0x00000000bffcffff] usable
> > > BIOS-e820: [mem 0x00000000bffd0000-0x00000000bffeffff] reserved
> > > BIOS-e820: [mem 0x00000000bfff0000-0x00000000bfffffff] usable
> > > BIOS-e820: [mem 0x0000000100000000-0x000000043fffffff] usable
> > > ^^^-upper limit for addressable memory
> > > ...
> > > PM: Registered nosave memory: [mem 0xc0000000-0xffffffff]
> > > [mem 0xc0000000-0xffffffff] available for PCI devices
> > > ...
> > > pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7]
> > > pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window]
> > > pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
> > > ^-- 128K
> > > pci_bus 0000:00: root bus resource [mem 0xc0000000-0xdfffffff window]
> > > ^-- 512M
> > > pci_bus 0000:00: root bus resource [mem 0xf0000000-0xf07fffff window]
> > > ^-- 8M
> > > pci_bus 0000:00: root bus resource [bus 00-ff]
> > > == (c) ==
> > > Until now all runs OK.
> > > Guest Linux then allocates memory regions for devices.
> > > Allocation is done from lower reg (0x10) to higher (0x30)
> > > for each device (i.e. from 00.0 to 1f.0) on PCI bus.
> > > Here I reordered the dmesg output into groups to show continuous RAM regions
> > > (the pass-through device is marked with ):
> > > pci 0000:00:01.0: reg 0x24: [io 0x2000-0x207f]
> > > pci 0000:00:02.0: reg 0x10: [io 0x2080-0x209f]
> > > pci 0000:00:03.0: reg 0x10: [io 0x20c0-0x20ff]
> > > ...
> > > pci 0000:00:00.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
> > > pci 0000:00:02.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
> > > pci 0000:00:03.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
> > > pci 0000:00:1d.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
> > > pci 0000:00:1e.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
> > > pci 0000:00:1f.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
> > > ...
> > > pci 0000:00:01.0: reg 0x10: [mem 0xc0000000-0xc0ffffff] 16M
> > > ... 0xc1000000-0xc1ffffff 16M gap
> > > pci 0000:00:01.0: reg 0x1c: [mem 0xc2000000-0xc3ffffff 64bit pref] 32M
> > > pci 0000:00:01.1: reg 0x10: [mem 0xc4000000-0xc4003fff]
> > > pci 0000:00:02.0: reg 0x14: [mem 0xc4004000-0xc4005fff]
> > > pci 0000:00:03.0: reg 0x14: [mem 0xc4006000-0xc4007fff]
> > > pci 0000:00:1d.0: reg 0x10: [mem 0xc4008000-0xc400807f]
> > > ... 0xc4008080-0xc4ffffff <16M gap
> > > pci 0000:00:1d.0: reg 0x14: [mem 0xc5000000-0xc5ffffff] 16M
> > > pci 0000:00:1e.0: reg 0x10: [mem 0xc6000000-0xc6000fff]
> > > ... 0xc6001000-0xd2ffffff <208M gap
> > > pci 0000:00:01.0: reg 0x30: [mem 0xd3000000-0xd307ffff pref] 512K
> > > ... 0xd3080000-0xdfffffff <208M gap
> > > pci 0000:00:01.0: reg 0x14: [mem 0x440000000-0x44fffffff 64bit pref] 256M
> > > ^^^- this value is outside the allowed range
> > > == (d) ==
> > > So there is no window for the 256M BAR, although there are two big
> > > 208M gaps in the 512M space provided for BAR allocation by the PCI bus.
> > > The BAR at reg 0x14, of size 256M, for device 01.0 must be inside the
> > > provisioned 512M region 0xc0000000-0xdfffffff.
> > > But referring to (1) above, setting the base address to any value
> > > below 0x440000000 breaks bhyve on start.
> > > According to (b), this value corresponds to the upper addressable
> > > memory limit in the guest VM.
> > > So I'm blocked here at the moment:
> > >
> > > -   The guest VM requires a value which bhyve doesn't like.
> > > -   The guest VM allocates BARs with huge gaps.
> > >
> > > I have little knowledge of PCI bus internals, although I have
> > > already read some articles on the internet.
> > > Could it be some ACPI trick to do?
> > > I'd be happy to hear any ideas...
> > >
> > > PS
> > > I suspect that if I took another OS as the guest VM, or another
> > > pass-through GPU model, it would probably allocate the BARs properly.
> > > But this is not what I want for this config.
> > > There should be a way to allocate a 256M BAR in guest Linux.
> > >
> > > Have a nice day,
> > > Alex
> > >
> > >
> > > freebsd-virtualization@freebsd.org mailing list
> > > https://lists.freebsd.org/mailman/listinfo/freebsd-virtualization
> > > To unsubscribe, send any mail to "freebsd-virtualization-unsubscribe@freebsd.org"




