From owner-freebsd-virtualization@freebsd.org Thu Mar 19 10:20:03 2020
Date: Thu, 19 Mar 2020 10:19:54 +0000
To: Alex Erley
From: Robert Crowston
Cc: Peter Grehan, "freebsd-virtualization@freebsd.org", Henrik Gulbrandsen
Reply-To: Robert Crowston
Subject: Re: [GPU pass-through] no compatible bridge window for claimed BAR
In-Reply-To: <4674c0fc-2696-3476-55e4-608d11ebece2@gmail.com>
References: <07921dcf-11d5-f440-a42f-d7ec950cab10@freebsd.org> <4674c0fc-2696-3476-55e4-608d11ebece2@gmail.com>
List-Id: "Discussion of various virtualization techniques FreeBSD supports."

> This solved all the problems with BAR allocation.

Nice!

> Do we have already some code for ROM BAR somewhere?

Henrik Gulbrandsen was working on expansion ROM support last July (CC'd). He posted his work up on https://www.gulbra.net/freebsd-bhyve/.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, 19 March 2020 03:52, Alex Erley wrote:

> Hello,
>
> You are right about mapping 64-bit BARs.
>
> === (1) ===
> The initial value of PCI_EMUL_MEMBASE64 is 0xD000000000 and it
> corresponds to the 64-bit mmio window set by QWordMemory(...)
> in DSDT from pci_bus_write_dsdt(), so guest VM gets from
> UEFI BIOS this 64-bit window:
> 0x000000D000000000-0x000000D0000FFFFF
> and it is ignored by guest kernel during boot:
> ACPI: PCI Root Bridge [PC00] (domain 0000 [bus 00-ff])
> acpi PNP0A03:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM
> Segments MSI]
> acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM
> acpi PNP0A03:00: host bridge window [mem 0xd000000000-0xd0000fffff
> window] (ignored, not CPU addressable)
> acpi PNP0A03:00: host bridge window expanded to [io 0x0000-0x0cf7];
> [io 0x0000-0x0cf7 window] ignored
>
> The 64-bit BAR code must be fixed to set a proper memory region depending
> on host and guest resources.
> In general it makes sense only for huge BAR sizes, but in most cases all
> the guest devices should fit in the available 32-bit mmio window.
> I decided to put it aside and concentrate my efforts on 32-bit BARs.
>
> So, I increased the max size in pci_emul_alloc_pbar() from 32 to 256 MB
> to have all BARs allocated in the 32-bit mmio window (to me it still looks
> like a hack).
> As when I tried it before, BHyve failed on VM start, so I turned to
> debugging what happened.
>
> Allocation goes in the 32-bit mmio window 0xc0000000-0xdfffffff (512 MB).
> Keeping in mind these two requirements from the PCI standard:
>
> - size of each region must be a power of two (it is already OK),
>
> - base address must be aligned on a boundary equal to the region size,
>
> BARs are allocated in the order of their indices:
> BAR[0]: type=2, size=0x1000000  => addr=0xc0000000, nextaddr=0xc1000000
> BAR[1]: type=3, size=0x10000000 => addr=0xd0000000, nextaddr=0xe0000000
> BAR[3]: type=3, size=0x2000000  => KO, 0xe0000000+size > 0xdfffffff
> BAR[5]: type=1, size=0x80
>
> I fixed cfginitbar() in pci_passthru.c to allocate BARs in a different
> order to make allocations more compact. They have to be allocated from
> bigger claimed size to smaller (btw, as it is done on the host system).
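
The effect of allocation order described in the quoted text can be reproduced with a small model. This is only a sketch and not bhyve code: `alloc_bars` is a hypothetical bump allocator that rounds each base up to the BAR size, and it is fed the 512 MB window and the three memory BAR sizes quoted in the message (the I/O BAR[5] is left out).

```python
WINDOW_START = 0xC0000000
WINDOW_END = 0xDFFFFFFF  # inclusive end of the 32-bit mmio window in the thread

# Memory BAR sizes from the quoted listing, keyed by BAR index.
BARS = {0: 0x1000000, 1: 0x10000000, 3: 0x2000000}


def align_up(addr, size):
    """Round addr up to a multiple of size (size must be a power of two)."""
    return (addr + size - 1) & ~(size - 1)


def alloc_bars(order, bars=BARS, start=WINDOW_START, end=WINDOW_END):
    """Hypothetical allocator: place BARs one after another in `order`.

    Returns {index: base}; raises MemoryError if a BAR does not fit.
    """
    placed = {}
    next_addr = start
    for idx in order:
        size = bars[idx]
        base = align_up(next_addr, size)  # natural alignment requirement
        if base + size - 1 > end:
            raise MemoryError(f"BAR[{idx}] ({size:#x}) does not fit at {base:#x}")
        placed[idx] = base
        next_addr = base + size
    return placed


def try_alloc(order):
    """Return the placement, or None when allocation fails."""
    try:
        return alloc_bars(order)
    except MemoryError:
        return None


index_order = sorted(BARS)                             # 0, 1, 3
size_order = sorted(BARS, key=BARS.get, reverse=True)  # 1, 3, 0

by_index = try_alloc(index_order)
by_size = try_alloc(size_order)

print("index order:  ", by_index)
print("largest first:", {i: hex(a) for i, a in by_size.items()})
```

In this model, index order fails at BAR[3] for the same reason as in the listing, while largest-first packs the three BARs back to back from the start of the window.
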
> For this, in cfginitbar() I reordered BAR indices before calling
> pci_emul_alloc_pbar().
> This solved all the problems with BAR allocation.
> (I will share my patch once I have reviewed it a bit more.)
>
> === (2) ===
> As you pointed out before, I have now run into the ROM mapping problem.
>
> All the ROM BARs are disabled in the guest VM, and a quick look shows that,
> as PCI_BARMAX==5 (=PCIR_MAX_BAR_0 from ),
> only 6 BARs are initialized in cfginitbar() in pci_passthru.c.
> They correspond to registers 10h, 14h, 18h, 1ch, 20h, 24h.
> Expansion ROM Base Address register 30h is not initialized at all.
>
> I'd like to add the missing code for it in pci_passthru.c.
> The physical guest memory map described in pci_emul.c says all ROM BARs
> must be allocated in the 8 MB window 0xf0000000-0xf07fffff, and it already
> matches the generated DSDT.
>
> From the PCI documentation, dealing with the ROM doesn't seem to be very
> complicated, but anyway, I'm interested in:
>
> - What pitfalls will be on that way?
> - Do we have already some code for ROM BAR somewhere?
>
> Any help is welcome.
>
> Have a nice day,
> Alex
>
> On 3/15/20 3:20 PM, Robert Crowston wrote:
>
> > I suggest you map the BAR into the 32 bit address space, unless you have so many PCI devices that this is not feasible. Just raise the limit of the special 64 bit handling to 1 GB or something big.
> >
> > - Many/most(?) consumer BIOS/UEFIs map 64 bit bars into the 32 bit address space by default, so this configuration is much more tested for device drivers and guest operating systems.
> > - Passthrough doesn't work for me at all above the 4GB memory window on my recent AMD system. It exits back to the hypervisor, which then crashes with a failed assert because it doesn't expect to handle this.
> >
> > With this tweak it is possible to use the open source nVidia driver on Linux.
> > However, your next problem, if you want to use proprietary drivers, is that access to the ROM BAR is not supported or emulated.
> > If that could be fixed, it would be a big step forwards.
> >
> > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > On Saturday, 14 March 2020 12:50, Alex Erley erleya@gmail.com wrote:
> >
> > > Hello,
> > > Some new findings to share.
> > >
> > > 1. Changing PCI_EMUL_MEMBASE64 from 0xD000000000 to any value
> > > below 0x0440000000 makes bhyve fail when starting VM with message:
> > > bhyve: failed to initialize BARs for PCI 1/0/0
> > > device emulation initialization error: Cannot allocate memory
> > >
> > > 2. Having PCI_EMUL_MEMBASE64 set to 0x0440000000 (or above) guest VM
> > > can not configure BARs of pass-through device properly.
> > > == (a) ==
> > > On BHyve host ppt device is:
> > >
> > > devinfo -rv
> > >
> > > ...
> > > pci0
> > > hostb0 at slot=0 function=0 dbsf=pci0:0:0:0
> > > pcib1 at slot=1 function=0 dbsf=pci0:0:1:0 handle=\SB.PCI0.P0P2
> > > I/O ports: 0xe000-0xefff
> > > I/O memory addresses:
> > > 0x00c0000000-0x00d30fffff <-- covers all child mem windows
> > > pci1
> > > ppt0 at slot=0 function=0 dbsf=pci0:1:0:0
> > > pcib1 I/O port window: 0xe000-0xe07f
> > > pcib1 memory window:
> > > 0x00c0000000-0x00cfffffff <-- 256M
> > > 0x00d0000000-0x00d1ffffff <-- 32M
> > > 0x00d2000000-0x00d2ffffff <-- 16M
> > > ppt1 at slot=0 function=1 dbsf=pci0:1:0:1
> > > pcib1 memory window:
> > > 0xd3080000-0xd3083fff <-- 16K
> > > ...
> > > and there is no other device attached to pci1.
> > > == (b) ==
> > > On guest VM dmesg shows (timestamps are removed):
> > > ...
> > > BIOS-provided physical RAM map:
> > > BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
> > > BIOS-e820: [mem 0x0000000000100000-0x00000000bea95fff] usable
> > > BIOS-e820: [mem 0x00000000bea96000-0x00000000bea97fff] reserved
> > > BIOS-e820: [mem 0x00000000bea98000-0x00000000bea99fff] ACPI data
> > > BIOS-e820: [mem 0x00000000bea9a000-0x00000000beaa8fff] reserved
> > > BIOS-e820: [mem 0x00000000beaa9000-0x00000000bfb28fff] usable
> > > BIOS-e820: [mem 0x00000000bfb29000-0x00000000bfb58fff] type 20
> > > BIOS-e820: [mem 0x00000000bfb59000-0x00000000bfb7cfff] reserved
> > > BIOS-e820: [mem 0x00000000bfb7d000-0x00000000bfb81fff] usable
> > > BIOS-e820: [mem 0x00000000bfb82000-0x00000000bfb88fff] ACPI data
> > > BIOS-e820: [mem 0x00000000bfb89000-0x00000000bfb8cfff] ACPI NVS
> > > BIOS-e820: [mem 0x00000000bfb8d000-0x00000000bffcffff] usable
> > > BIOS-e820: [mem 0x00000000bffd0000-0x00000000bffeffff] reserved
> > > BIOS-e820: [mem 0x00000000bfff0000-0x00000000bfffffff] usable
> > > BIOS-e820: [mem 0x0000000100000000-0x000000043fffffff] usable
> > > ^^^- upper limit for addressable memory
> > > ...
> > > PM: Registered nosave memory: [mem 0xc0000000-0xffffffff]
> > > [mem 0xc0000000-0xffffffff] available for PCI devices
> > > ...
> > > pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7]
> > > pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window]
> > > pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
> > > ^-- 128K
> > > pci_bus 0000:00: root bus resource [mem 0xc0000000-0xdfffffff window]
> > > ^-- 512M
> > > pci_bus 0000:00: root bus resource [mem 0xf0000000-0xf07fffff window]
> > > ^-- 8M
> > > pci_bus 0000:00: root bus resource [bus 00-ff]
> > > == (c) ==
> > > Up to this point everything runs OK.
> > > Guest Linux then allocates memory regions for devices.
> > > Allocation is done from lower reg (0x10) to higher (0x30)
> > > for each device (i.e. from 00.0 to 1f.0) on PCI bus.
> > > Here I regrouped the dmesg output to show contiguous memory regions:
> > > (pass-through device is marked with )
> > > pci 0000:00:01.0: reg 0x24: [io 0x2000-0x207f]
> > > pci 0000:00:02.0: reg 0x10: [io 0x2080-0x209f]
> > > pci 0000:00:03.0: reg 0x10: [io 0x20c0-0x20ff]
> > > ...
> > > pci 0000:00:00.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
> > > pci 0000:00:02.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
> > > pci 0000:00:03.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
> > > pci 0000:00:1d.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
> > > pci 0000:00:1e.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
> > > pci 0000:00:1f.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
> > > ...
> > > pci 0000:00:01.0: reg 0x10: [mem 0xc0000000-0xc0ffffff] 16M
> > > ... 0xc1000000-0xc1ffffff 16M gap
> > > pci 0000:00:01.0: reg 0x1c: [mem 0xc2000000-0xc3ffffff 64bit pref] 32M
> > > pci 0000:00:01.1: reg 0x10: [mem 0xc4000000-0xc4003fff]
> > > pci 0000:00:02.0: reg 0x14: [mem 0xc4004000-0xc4005fff]
> > > pci 0000:00:03.0: reg 0x14: [mem 0xc4006000-0xc4007fff]
> > > pci 0000:00:1d.0: reg 0x10: [mem 0xc4008000-0xc400807f]
> > > ... 0xc4008080-0xc4ffffff <16M gap
> > > pci 0000:00:1d.0: reg 0x14: [mem 0xc5000000-0xc5ffffff] 16M
> > > pci 0000:00:1e.0: reg 0x10: [mem 0xc6000000-0xc6000fff]
> > > ... 0xc6001000-0xd2ffffff <208M gap
> > > pci 0000:00:01.0: reg 0x30: [mem 0xd3000000-0xd307ffff pref] 512K
> > > ... 0xd3080000-0xdfffffff <208M gap
> > > pci 0000:00:01.0: reg 0x14: [mem 0x440000000-0x44fffffff 64bit pref] 256M
> > > ^^^- this value is outside allowed range
> > > == (d) ==
> > > So, there is no window for the 256M BAR, although there are 2 big gaps
> > > of 208M in the 512M space provided for BAR allocation by the PCI bus.
> > > So, BAR reg 0x14 of size 256M for device 01.0 must be inside the provisioned
> > > 512M region 0xc0000000-0xdfffffff.
> > > But referring to (1) above, setting the base address to any value below
> > > 0x440000000 breaks bhyve on start.
> > > According to (b), this value corresponds to the upper addressable memory
> > > limit in the guest VM.
> > > So I'm blocked here at the moment:
> > >
> > > - Guest VM requires a value which BHyve doesn't like.
> > >
> > > - Guest VM allocates BARs with huge gaps.
> > >
> > > I have little knowledge about PCI bus internals, although I have already
> > > read some articles on the internet.
> > > Could there be some ACPI trick to do this?
> > > I'd be happy to hear any ideas...
> > > PS
> > > I suspect that if I take another OS as a guest VM or another pass-through
> > > GPU model, it would probably allocate BARs properly.
> > > But this is not what I want for this config.
> > > There should be a way to allocate a 256M BAR in guest Linux.
> > > Have a nice day,
> > > Alex
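
The gap arithmetic in (c) and (d) of the oldest quoted message can be checked mechanically. The following is a rough sketch, assuming the memory ranges listed in (c) are the only ones placed in the 0xc0000000-0xdfffffff window; `free_gaps` and `fits_aligned` are made-up helper names, not part of bhyve or Linux.

```python
WINDOW = (0xC0000000, 0xDFFFFFFF)  # 512M root bus window from (b)

# Guest allocations inside the window, copied from (c); ends are inclusive.
USED = [
    (0xC0000000, 0xC0FFFFFF),  # 01.0 reg 0x10, 16M
    (0xC2000000, 0xC3FFFFFF),  # 01.0 reg 0x1c, 32M
    (0xC4000000, 0xC4003FFF),  # 01.1 reg 0x10
    (0xC4004000, 0xC4005FFF),  # 02.0 reg 0x14
    (0xC4006000, 0xC4007FFF),  # 03.0 reg 0x14
    (0xC4008000, 0xC400807F),  # 1d.0 reg 0x10
    (0xC5000000, 0xC5FFFFFF),  # 1d.0 reg 0x14, 16M
    (0xC6000000, 0xC6000FFF),  # 1e.0 reg 0x10
    (0xD3000000, 0xD307FFFF),  # 01.0 reg 0x30 (ROM), 512K
]


def free_gaps(window, used):
    """Return the unallocated (start, end) ranges inside window."""
    gaps = []
    cursor = window[0]
    for start, end in sorted(used):
        if start > cursor:
            gaps.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor <= window[1]:
        gaps.append((cursor, window[1]))
    return gaps


def fits_aligned(gap, size):
    """True if a naturally aligned region of `size` bytes fits in gap."""
    base = (gap[0] + size - 1) & ~(size - 1)
    return base + size - 1 <= gap[1]


BAR_SIZE = 0x10000000  # the 256M BAR (reg 0x14) that ended up above 4G

gaps = free_gaps(WINDOW, USED)
largest = max(gaps, key=lambda g: g[1] - g[0])
can_fit = any(fits_aligned(g, BAR_SIZE) for g in gaps)

for start, end in gaps:
    print(f"{start:#x}-{end:#x}  {(end - start + 1) / 2**20:.1f} MiB")
print("256M BAR fits:", can_fit)
```

Under these assumptions no gap can hold the BAR: the largest one is just under 208M, and the only 256M-aligned bases in the window (0xc0000000 and 0xd0000000) both overlap existing allocations.
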