Date: Wed, 14 Feb 2024 09:42:36 -0700 From: Warner Losh <imp@bsdimp.com> To: John Baldwin <jhb@freebsd.org> Cc: Mark Millard <marklmi@yahoo.com>, FreeBSD ARM List <freebsd-arm@freebsd.org>, Current FreeBSD <freebsd-current@freebsd.org> Subject: Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: <pcib1 memory window> request" leads to panic Message-ID: <CANCZdfrX%2BJpiKgU4Bo%2B-dwX_YVszYXT9hy_S%2BxS91Z8RmiWONQ@mail.gmail.com> In-Reply-To: <d7b20565-6aa1-486f-a197-11fbc4d0c8dd@FreeBSD.org> References: <76AB969F-5BC5-4116-8AF4-3ED2CABEBBA5.ref@yahoo.com> <76AB969F-5BC5-4116-8AF4-3ED2CABEBBA5@yahoo.com> <f4326cda-c602-439f-ab03-f66b3bca5ff2@FreeBSD.org> <1F704317-FDB8-4BDA-8A67-61CF48794DFE@yahoo.com> <9AFDF067-96E4-4E67-90D2-F40DAF3F138F@yahoo.com> <4C279710-5F88-4295-B1A4-7C395F3587E5@yahoo.com> <3A145420-399D-4EBD-9FF4-18924908AB1D@yahoo.com> <1298DF9C-0F82-4567-8E81-7332A608C7FC@yahoo.com> <d7b20565-6aa1-486f-a197-11fbc4d0c8dd@FreeBSD.org>
next in thread | previous in thread | raw e-mail | index | archive | help
--0000000000006a28f106115a33e8 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Wed, Feb 14, 2024 at 9:08=E2=80=AFAM John Baldwin <jhb@freebsd.org> wrot= e: > On 2/12/24 5:57 PM, Mark Millard wrote: > > On Feb 12, 2024, at 16:36, Mark Millard <marklmi@yahoo.com> wrote: > > > >> On Feb 12, 2024, at 16:10, Mark Millard <marklmi@yahoo.com> wrote: > >> > >>> On Feb 12, 2024, at 12:00, Mark Millard <marklmi@yahoo.com> wrote: > >>> > >>>> [Gack: I was looking at the wrong vintage of source code, predating > >>>> your changes: wrong system used.] > >>>> > >>>> > >>>> On Feb 12, 2024, at 10:41, Mark Millard <marklmi@yahoo.com> wrote: > >>>> > >>>>> On Feb 12, 2024, at 09:32, John Baldwin <jhb@freebsd.org> wrote: > >>>>> > >>>>>> On 2/9/24 8:13 PM, Mark Millard wrote: > >>>>>>> Summary: > >>>>>>> pcib0: <BCM2838-compatible PCI-express controller> mem > 0x7d500000-0x7d50930f irq 80,81 on simplebus2 > >>>>>>> pcib0: parsing FDT for ECAM0: > >>>>>>> pcib0: PCI addr: 0xc0000000, CPU addr: 0x600000000, Size: > 0x40000000 > >>>>>>> . . . > >>>>>>> rman_manage_region: <pcib1 memory window> request: start > 0x600000000, end 0x6000fffff > >>>>>>> panic: Failed to add resource to rman > >>>>>> > >>>>>> Hmmm, I suspect this is due to the way that bus_translate_resource > works which is > >>>>>> fundamentally broken. It rewrites the start address of a resource > in-situ instead > >>>>>> of keeping downstream resources separate from the upstream > resources. For example, > >>>>>> I don't see how you could ever release a resource in this design > without completely > >>>>>> screwing up your rman. That is, I expect trying to detach a PCI > device behind a > >>>>>> translating bridge that uses the current approach should corrupt > the allocated > >>>>>> resource ranges in an rman long before my changes. > >>>>>> > >>>>>> That said, that doesn't really explain the panic. Hmm, the panic > might be because > >>>>>> for PCI bridge windows the driver now passes RF_ACTIVE and the > bus_translate_resource > >>>>>> hack only kicks in the activate_resource method of > pci_host_generic.c. > >>>>>> > >>>>>>> Detail: > >>>>>>> . . . > >>>>>>> pcib0: <BCM2838-compatible PCI-express controller> mem > 0x7d500000-0x7d50930f irq 80,81 on simplebus2 > >>>>>>> pcib0: parsing FDT for ECAM0: > >>>>>>> pcib0: PCI addr: 0xc0000000, CPU addr: 0x600000000, Size: > 0x40000000 > >>>>>> > >>>>>> This indicates this is a translating bus. > >>>>>> > >>>>>>> pcib1: <PCI-PCI bridge> irq 91 at device 0.0 on pci0 > >>>>>>> rman_manage_region: <pcib1 bus numbers> request: start 0x1, end 0= x1 > >>>>>>> pcib0: rman_reserve_resource: start=3D0xc0000000, end=3D0xc00ffff= f, > count=3D0x100000 > >>>>>>> rman_reserve_resource_bound: <PCIe Memory> request: [0xc0000000, > 0xc00fffff], length 0x100000, flags 102, device pcib1 > >>>>>>> rman_reserve_resource_bound: trying 0xffffffff <0xc0000000,0xffff= f> > >>>>>>> considering [0xc0000000, 0xffffffff] > >>>>>>> truncated region: [0xc0000000, 0xc00fffff]; size 0x100000 > (requested 0x100000) > >>>>>>> candidate region: [0xc0000000, 0xc00fffff], size 0x100000 > >>>>>>> allocating from the beginning > >>>>>>> rman_manage_region: <pcib1 memory window> request: start > 0x600000000, end 0x6000fffff > >>> > >>> What you later typed does not match: > >>> > >>> 0x600000000 > >>> 0x6000fffff > >>> > >>> You later typed: > >>> > >>> 0x60000000 > >>> 0x600fffffff > >>> > >>> This seems to have lead to some confusion from using the > >>> wrong figure(s). > >>> > >>>>>> The fact that we are trying to reserve the CPU addresses in the > rman is because > >>>>>> bus_translate_resource rewrote the start address in the resource > after it was allocated. > >>>>>> > >>>>>> That said, I can't see why rman_manage_region would actually fail. > At this point the > >>>>>> rman is empty (this is the first call to rman_manage_region for > "pcib1 memory window"), > >>>>>> so only the check that should be failing are the checks against > rm_start and > >>>>>> rm_end. For the memory window, rm_start is always 0, and rm_end i= s > always > >>>>>> 0xffffffff, so both the old (0xc00000000 - 0xc00fffff) and new > (0x60000000 - 0x600fffffff) > >>>>>> ranges are within those bounds. > >>> > >>> No: > >>> > >>> 0xffffffff > >>> > >>> .vs (actual): > >>> > >>> 0x600000000 > >>> 0x6000fffff > > Ok, then this explains the failure if the "raw" addresses are above 4G. = I > have > access to an emag I'm currently using to test fixes to pci_host_generic.c > to > avoid corrupting struct resource objects. I'll post the diff once I've g= ot > something verified to work. > > > It looks to me like in sys/dev/pci/pci_pci.c the: > > > > static void > > pcib_probe_windows(struct pcib_softc *sc) > > { > > . . . > > pcib_alloc_window(sc, &sc->mem, SYS_RES_MEMORY, 0, 0xffffffff)= ; > > . . . > > > > is just inappropriately restrictive about where in the system > > address space a PCIe can validly be mapped to on the high end. > > That, in turn, leads to the rejection on the RPi4B now that > > the range use is checked. > > No, the physical register in PCI-PCI bridges is only 32-bits. Only the > prefetchable BAR supports 64-bit addresses. This is why the host bridge > is doing a translation from the CPU side (0x600000000) to the PCI BAR > addresses (0xc0000000) so that the BAR addresses are down in the 32-bit > address range. It's also true that many PCI devices only support 32-bit > addresses in memory BARs. 64-bit BARs are an optional extension not > universally supported. > > The translation here is somewhat akin to a type of MMU where the CPU > addresses are mapped to PCI addresses. The problem here is that the > PCI BAR resources need to "stay" as PCI addresses since we depend on > being able to use rman_get_start/end to get the PCI addresses of > allocated resources, but pci_host_generic.c currently rewrites the > addresses. > > Probably I should remove rman_set_start/end entirely (Warner added them > back in 2004) as the methods don't do anything to deal with the fallout > that the rman.rm_list linked-list is no longer sorted by address once > some addresses get rewritten, etc. > At the time, they made sense. Removing it, though may take some doing since we use it in about 284 places in sys/dev today... Somewhat more pervasive than I'd have thought they'd be... Warner --0000000000006a28f106115a33e8 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable <div dir=3D"ltr"><div dir=3D"ltr"><br></div><br><div class=3D"gmail_quote">= <div dir=3D"ltr" class=3D"gmail_attr">On Wed, Feb 14, 2024 at 9:08=E2=80=AF= AM John Baldwin <<a href=3D"mailto:jhb@freebsd.org">jhb@freebsd.org</a>&= gt; wrote:<br></div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0= px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 2/= 12/24 5:57 PM, Mark Millard wrote:<br> > On Feb 12, 2024, at 16:36, Mark Millard <<a href=3D"mailto:marklmi@= yahoo.com" target=3D"_blank">marklmi@yahoo.com</a>> wrote:<br> > <br> >> On Feb 12, 2024, at 16:10, Mark Millard <<a href=3D"mailto:mark= lmi@yahoo.com" target=3D"_blank">marklmi@yahoo.com</a>> wrote:<br> >><br> >>> On Feb 12, 2024, at 12:00, Mark Millard <<a href=3D"mailto:= marklmi@yahoo.com" target=3D"_blank">marklmi@yahoo.com</a>> wrote:<br> >>><br> >>>> [Gack: I was looking at the wrong vintage of source code, = predating<br> >>>> your changes: wrong system used.]<br> >>>><br> >>>><br> >>>> On Feb 12, 2024, at 10:41, Mark Millard <<a href=3D"mai= lto:marklmi@yahoo.com" target=3D"_blank">marklmi@yahoo.com</a>> wrote:<b= r> >>>><br> >>>>> On Feb 12, 2024, at 09:32, John Baldwin <<a href=3D= "mailto:jhb@freebsd.org" target=3D"_blank">jhb@freebsd.org</a>> wrote:<b= r> >>>>><br> >>>>>> On 2/9/24 8:13 PM, Mark Millard wrote:<br> >>>>>>> Summary:<br> >>>>>>> pcib0: <BCM2838-compatible PCI-express cont= roller> mem 0x7d500000-0x7d50930f irq 80,81 on simplebus2<br> >>>>>>> pcib0: parsing FDT for ECAM0:<br> >>>>>>> pcib0:=C2=A0 PCI addr: 0xc0000000, CPU addr: 0= x600000000, Size: 0x40000000<br> >>>>>>> . . .<br> >>>>>>> rman_manage_region: <pcib1 memory window>= ; request: start 0x600000000, end 0x6000fffff<br> >>>>>>> panic: Failed to add resource to rman<br> >>>>>><br> >>>>>> Hmmm, I suspect this is due to the way that bus_tr= anslate_resource works which is<br> >>>>>> fundamentally broken.=C2=A0 It rewrites the start = address of a resource in-situ instead<br> >>>>>> of keeping downstream resources separate from the = upstream resources.=C2=A0 =C2=A0For example,<br> >>>>>> I don't see how you could ever release a resou= rce in this design without completely<br> >>>>>> screwing up your rman.=C2=A0 That is, I expect try= ing to detach a PCI device behind a<br> >>>>>> translating bridge that uses the current approach = should corrupt the allocated<br> >>>>>> resource ranges in an rman long before my changes.= <br> >>>>>><br> >>>>>> That said, that doesn't really explain the pan= ic.=C2=A0 Hmm, the panic might be because<br> >>>>>> for PCI bridge windows the driver now passes RF_AC= TIVE and the bus_translate_resource<br> >>>>>> hack only kicks in the activate_resource method of= pci_host_generic.c.<br> >>>>>><br> >>>>>>> Detail:<br> >>>>>>> . . .<br> >>>>>>> pcib0: <BCM2838-compatible PCI-express cont= roller> mem 0x7d500000-0x7d50930f irq 80,81 on simplebus2<br> >>>>>>> pcib0: parsing FDT for ECAM0:<br> >>>>>>> pcib0: PCI addr: 0xc0000000, CPU addr: 0x60000= 0000, Size: 0x40000000<br> >>>>>><br> >>>>>> This indicates this is a translating bus.<br> >>>>>><br> >>>>>>> pcib1: <PCI-PCI bridge> irq 91 at device= 0.0 on pci0<br> >>>>>>> rman_manage_region: <pcib1 bus numbers> = request: start 0x1, end 0x1<br> >>>>>>> pcib0: rman_reserve_resource: start=3D0xc00000= 00, end=3D0xc00fffff, count=3D0x100000<br> >>>>>>> rman_reserve_resource_bound: <PCIe Memory&g= t; request: [0xc0000000, 0xc00fffff], length 0x100000, flags 102, device pc= ib1<br> >>>>>>> rman_reserve_resource_bound: trying 0xffffffff= <0xc0000000,0xfffff><br> >>>>>>> considering [0xc0000000, 0xffffffff]<br> >>>>>>> truncated region: [0xc0000000, 0xc00fffff]; si= ze 0x100000 (requested 0x100000)<br> >>>>>>> candidate region: [0xc0000000, 0xc00fffff], si= ze 0x100000<br> >>>>>>> allocating from the beginning<br> >>>>>>> rman_manage_region: <pcib1 memory window>= ; request: start 0x600000000, end 0x6000fffff<br> >>><br> >>> What you later typed does not match:<br> >>><br> >>> 0x600000000<br> >>> 0x6000fffff<br> >>><br> >>> You later typed:<br> >>><br> >>> 0x60000000<br> >>> 0x600fffffff<br> >>><br> >>> This seems to have lead to some confusion from using the<br> >>> wrong figure(s).<br> >>><br> >>>>>> The fact that we are trying to reserve the CPU add= resses in the rman is because<br> >>>>>> bus_translate_resource rewrote the start address i= n the resource after it was allocated.<br> >>>>>><br> >>>>>> That said, I can't see why rman_manage_region = would actually fail.=C2=A0 At this point the<br> >>>>>> rman is empty (this is the first call to rman_mana= ge_region for "pcib1 memory window"),<br> >>>>>> so only the check that should be failing are the c= hecks against rm_start and<br> >>>>>> rm_end.=C2=A0 For the memory window, rm_start is a= lways 0, and rm_end is always<br> >>>>>> 0xffffffff, so both the old (0xc00000000 - 0xc00ff= fff) and new (0x60000000 - 0x600fffffff)<br> >>>>>> ranges are within those bounds.<br> >>><br> >>> No:<br> >>><br> >>> 0xffffffff<br> >>><br> >>> .vs (actual):<br> >>><br> >>> 0x600000000<br> >>> 0x6000fffff<br> <br> Ok, then this explains the failure if the "raw" addresses are abo= ve 4G.=C2=A0 I have<br> access to an emag I'm currently using to test fixes to pci_host_generic= .c to<br> avoid corrupting struct resource objects.=C2=A0 I'll post the diff once= I've got<br> something verified to work.<br> <br> > It looks to me like in sys/dev/pci/pci_pci.c the:<br> > <br> > static void<br> > pcib_probe_windows(struct pcib_softc *sc)<br> > {<br> > . . .<br> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 pcib_alloc_window(sc, &sc->me= m, SYS_RES_MEMORY, 0, 0xffffffff);<br> > . . .<br> > <br> > is just inappropriately restrictive about where in the system<br> > address space a PCIe can validly be mapped to on the high end.<br> > That, in turn, leads to the rejection on the RPi4B now that<br> > the range use is checked.<br> <br> No, the physical register in PCI-PCI bridges is only 32-bits.=C2=A0 Only th= e<br> prefetchable BAR supports 64-bit addresses.=C2=A0 This is why the host brid= ge<br> is doing a translation from the CPU side (0x600000000) to the PCI BAR<br> addresses (0xc0000000) so that the BAR addresses are down in the 32-bit<br> address range.=C2=A0 It's also true that many PCI devices only support = 32-bit<br> addresses in memory BARs.=C2=A0 64-bit BARs are an optional extension not<b= r> universally supported.<br> <br> The translation here is somewhat akin to a type of MMU where the CPU<br> addresses are mapped to PCI addresses.=C2=A0 The problem here is that the<b= r> PCI BAR resources need to "stay" as PCI addresses since we depend= on<br> being able to use rman_get_start/end to get the PCI addresses of<br> allocated resources, but pci_host_generic.c currently rewrites the<br> addresses.<br> <br> Probably I should remove rman_set_start/end entirely (Warner added them<br> back in 2004) as the methods don't do anything to deal with the fallout= <br> that the rman.rm_list linked-list is no longer sorted by address once<br> some addresses get rewritten, etc.<br></blockquote><div><br></div><div>At t= he time, they made sense. Removing it, though may take some doing</div><div= >since we use it in about 284 places=C2=A0 in sys/dev today... Somewhat mor= e</div><div>pervasive than I'd have thought they'd be...</div><div>= <br></div><div>Warner=C2=A0</div></div></div> --0000000000006a28f106115a33e8--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfrX%2BJpiKgU4Bo%2B-dwX_YVszYXT9hy_S%2BxS91Z8RmiWONQ>