Date: Wed, 14 Feb 2024 11:45:45 -0700 From: Warner Losh <imp@bsdimp.com> To: John Baldwin <jhb@freebsd.org> Cc: Mark Millard <marklmi@yahoo.com>, FreeBSD ARM List <freebsd-arm@freebsd.org>, Current FreeBSD <freebsd-current@freebsd.org> Subject: Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: <pcib1 memory window> request" leads to panic Message-ID: <CANCZdfrsjfDmn8zfz9LBsjwH%2BRo5iUA6mK_6GiuVpJCrky2_XQ@mail.gmail.com> In-Reply-To: <f07d8fa2-c3e1-44f1-b355-544619ed2e5e@FreeBSD.org> References: <76AB969F-5BC5-4116-8AF4-3ED2CABEBBA5.ref@yahoo.com> <76AB969F-5BC5-4116-8AF4-3ED2CABEBBA5@yahoo.com> <f4326cda-c602-439f-ab03-f66b3bca5ff2@FreeBSD.org> <1F704317-FDB8-4BDA-8A67-61CF48794DFE@yahoo.com> <9AFDF067-96E4-4E67-90D2-F40DAF3F138F@yahoo.com> <4C279710-5F88-4295-B1A4-7C395F3587E5@yahoo.com> <3A145420-399D-4EBD-9FF4-18924908AB1D@yahoo.com> <1298DF9C-0F82-4567-8E81-7332A608C7FC@yahoo.com> <d7b20565-6aa1-486f-a197-11fbc4d0c8dd@FreeBSD.org> <CANCZdfrX%2BJpiKgU4Bo%2B-dwX_YVszYXT9hy_S%2BxS91Z8RmiWONQ@mail.gmail.com> <f07d8fa2-c3e1-44f1-b355-544619ed2e5e@FreeBSD.org>
index | next in thread | previous in thread | raw e-mail
[-- Attachment #1 --]
Hey John,
On Wed, Feb 14, 2024 at 10:30 AM John Baldwin <jhb@freebsd.org> wrote:
> On 2/14/24 8:42 AM, Warner Losh wrote:
> > On Wed, Feb 14, 2024 at 9:08 AM John Baldwin <jhb@freebsd.org> wrote:
> >
> >> On 2/12/24 5:57 PM, Mark Millard wrote:
> >>> On Feb 12, 2024, at 16:36, Mark Millard <marklmi@yahoo.com> wrote:
> >>>
> >>>> On Feb 12, 2024, at 16:10, Mark Millard <marklmi@yahoo.com> wrote:
> >>>>
> >>>>> On Feb 12, 2024, at 12:00, Mark Millard <marklmi@yahoo.com> wrote:
> >>>>>
> >>>>>> [Gack: I was looking at the wrong vintage of source code, predating
> >>>>>> your changes: wrong system used.]
> >>>>>>
> >>>>>>
> >>>>>> On Feb 12, 2024, at 10:41, Mark Millard <marklmi@yahoo.com> wrote:
> >>>>>>
> >>>>>>> On Feb 12, 2024, at 09:32, John Baldwin <jhb@freebsd.org> wrote:
> >>>>>>>
> >>>>>>>> On 2/9/24 8:13 PM, Mark Millard wrote:
> >>>>>>>>> Summary:
> >>>>>>>>> pcib0: <BCM2838-compatible PCI-express controller> mem
> >> 0x7d500000-0x7d50930f irq 80,81 on simplebus2
> >>>>>>>>> pcib0: parsing FDT for ECAM0:
> >>>>>>>>> pcib0: PCI addr: 0xc0000000, CPU addr: 0x600000000, Size:
> >> 0x40000000
> >>>>>>>>> . . .
> >>>>>>>>> rman_manage_region: <pcib1 memory window> request: start
> >> 0x600000000, end 0x6000fffff
> >>>>>>>>> panic: Failed to add resource to rman
> >>>>>>>>
> >>>>>>>> Hmmm, I suspect this is due to the way that bus_translate_resource
> >> works which is
> >>>>>>>> fundamentally broken. It rewrites the start address of a resource
> >> in-situ instead
> >>>>>>>> of keeping downstream resources separate from the upstream
> >> resources. For example,
> >>>>>>>> I don't see how you could ever release a resource in this design
> >> without completely
> >>>>>>>> screwing up your rman. That is, I expect trying to detach a PCI
> >> device behind a
> >>>>>>>> translating bridge that uses the current approach should corrupt
> >> the allocated
> >>>>>>>> resource ranges in an rman long before my changes.
> >>>>>>>>
> >>>>>>>> That said, that doesn't really explain the panic. Hmm, the panic
> >> might be because
> >>>>>>>> for PCI bridge windows the driver now passes RF_ACTIVE and the
> >> bus_translate_resource
> >>>>>>>> hack only kicks in the activate_resource method of
> >> pci_host_generic.c.
> >>>>>>>>
> >>>>>>>>> Detail:
> >>>>>>>>> . . .
> >>>>>>>>> pcib0: <BCM2838-compatible PCI-express controller> mem
> >> 0x7d500000-0x7d50930f irq 80,81 on simplebus2
> >>>>>>>>> pcib0: parsing FDT for ECAM0:
> >>>>>>>>> pcib0: PCI addr: 0xc0000000, CPU addr: 0x600000000, Size:
> >> 0x40000000
> >>>>>>>>
> >>>>>>>> This indicates this is a translating bus.
> >>>>>>>>
> >>>>>>>>> pcib1: <PCI-PCI bridge> irq 91 at device 0.0 on pci0
> >>>>>>>>> rman_manage_region: <pcib1 bus numbers> request: start 0x1, end
> 0x1
> >>>>>>>>> pcib0: rman_reserve_resource: start=0xc0000000, end=0xc00fffff,
> >> count=0x100000
> >>>>>>>>> rman_reserve_resource_bound: <PCIe Memory> request: [0xc0000000,
> >> 0xc00fffff], length 0x100000, flags 102, device pcib1
> >>>>>>>>> rman_reserve_resource_bound: trying 0xffffffff
> <0xc0000000,0xfffff>
> >>>>>>>>> considering [0xc0000000, 0xffffffff]
> >>>>>>>>> truncated region: [0xc0000000, 0xc00fffff]; size 0x100000
> >> (requested 0x100000)
> >>>>>>>>> candidate region: [0xc0000000, 0xc00fffff], size 0x100000
> >>>>>>>>> allocating from the beginning
> >>>>>>>>> rman_manage_region: <pcib1 memory window> request: start
> >> 0x600000000, end 0x6000fffff
> >>>>>
> >>>>> What you later typed does not match:
> >>>>>
> >>>>> 0x600000000
> >>>>> 0x6000fffff
> >>>>>
> >>>>> You later typed:
> >>>>>
> >>>>> 0x60000000
> >>>>> 0x600fffffff
> >>>>>
> >>>>> This seems to have lead to some confusion from using the
> >>>>> wrong figure(s).
> >>>>>
> >>>>>>>> The fact that we are trying to reserve the CPU addresses in the
> >> rman is because
> >>>>>>>> bus_translate_resource rewrote the start address in the resource
> >> after it was allocated.
> >>>>>>>>
> >>>>>>>> That said, I can't see why rman_manage_region would actually fail.
> >> At this point the
> >>>>>>>> rman is empty (this is the first call to rman_manage_region for
> >> "pcib1 memory window"),
> >>>>>>>> so only the check that should be failing are the checks against
> >> rm_start and
> >>>>>>>> rm_end. For the memory window, rm_start is always 0, and rm_end
> is
> >> always
> >>>>>>>> 0xffffffff, so both the old (0xc00000000 - 0xc00fffff) and new
> >> (0x60000000 - 0x600fffffff)
> >>>>>>>> ranges are within those bounds.
> >>>>>
> >>>>> No:
> >>>>>
> >>>>> 0xffffffff
> >>>>>
> >>>>> .vs (actual):
> >>>>>
> >>>>> 0x600000000
> >>>>> 0x6000fffff
> >>
> >> Ok, then this explains the failure if the "raw" addresses are above
> 4G. I
> >> have
> >> access to an emag I'm currently using to test fixes to
> pci_host_generic.c
> >> to
> >> avoid corrupting struct resource objects. I'll post the diff once I've
> got
> >> something verified to work.
> >>
> >>> It looks to me like in sys/dev/pci/pci_pci.c the:
> >>>
> >>> static void
> >>> pcib_probe_windows(struct pcib_softc *sc)
> >>> {
> >>> . . .
> >>> pcib_alloc_window(sc, &sc->mem, SYS_RES_MEMORY, 0,
> 0xffffffff);
> >>> . . .
> >>>
> >>> is just inappropriately restrictive about where in the system
> >>> address space a PCIe can validly be mapped to on the high end.
> >>> That, in turn, leads to the rejection on the RPi4B now that
> >>> the range use is checked.
> >>
> >> No, the physical register in PCI-PCI bridges is only 32-bits. Only the
> >> prefetchable BAR supports 64-bit addresses. This is why the host bridge
> >> is doing a translation from the CPU side (0x600000000) to the PCI BAR
> >> addresses (0xc0000000) so that the BAR addresses are down in the 32-bit
> >> address range. It's also true that many PCI devices only support 32-bit
> >> addresses in memory BARs. 64-bit BARs are an optional extension not
> >> universally supported.
> >>
> >> The translation here is somewhat akin to a type of MMU where the CPU
> >> addresses are mapped to PCI addresses. The problem here is that the
> >> PCI BAR resources need to "stay" as PCI addresses since we depend on
> >> being able to use rman_get_start/end to get the PCI addresses of
> >> allocated resources, but pci_host_generic.c currently rewrites the
> >> addresses.
> >>
> >> Probably I should remove rman_set_start/end entirely (Warner added them
> >> back in 2004) as the methods don't do anything to deal with the fallout
> >> that the rman.rm_list linked-list is no longer sorted by address once
> >> some addresses get rewritten, etc.
> >>
> >
> > At the time, they made sense. Removing it, though may take some doing
> > since we use it in about 284 places in sys/dev today... Somewhat more
> > pervasive than I'd have thought they'd be...
>
> Eh, I only find the one that I'm now removing:
>
> % git grep -E 'rman_set_(start|end)' sys/
> sys/dev/pci/pci_host_generic.c: rman_set_start(r, start);
> sys/dev/pci/pci_host_generic.c: rman_set_end(r, end);
> sys/kern/subr_rman.c:rman_set_start(struct resource *r, rman_res_t start)
> sys/kern/subr_rman.c:rman_set_end(struct resource *r, rman_res_t end)
> sys/sys/rman.h:void rman_set_end(struct resource *_r, rman_res_t _end);
> sys/sys/rman.h:void rman_set_start(struct resource *_r, rman_res_t
> _start);
>
> Also, I managed to boot the emag I have access to this morning. I had to
> fix a few other bugs in acpi(4) for my changes in pci_host_generic to work
> but will post reviews later today.
>
That's what happens when you grep for 'get' instead of 'set' before the
morning caffeine kicks in ..
Warner
[-- Attachment #2 --]
<div dir="ltr"><div dir="ltr">Hey John,<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Feb 14, 2024 at 10:30 AM John Baldwin <<a href="mailto:jhb@freebsd.org">jhb@freebsd.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 2/14/24 8:42 AM, Warner Losh wrote:<br>
> On Wed, Feb 14, 2024 at 9:08 AM John Baldwin <<a href="mailto:jhb@freebsd.org" target="_blank">jhb@freebsd.org</a>> wrote:<br>
> <br>
>> On 2/12/24 5:57 PM, Mark Millard wrote:<br>
>>> On Feb 12, 2024, at 16:36, Mark Millard <<a href="mailto:marklmi@yahoo.com" target="_blank">marklmi@yahoo.com</a>> wrote:<br>
>>><br>
>>>> On Feb 12, 2024, at 16:10, Mark Millard <<a href="mailto:marklmi@yahoo.com" target="_blank">marklmi@yahoo.com</a>> wrote:<br>
>>>><br>
>>>>> On Feb 12, 2024, at 12:00, Mark Millard <<a href="mailto:marklmi@yahoo.com" target="_blank">marklmi@yahoo.com</a>> wrote:<br>
>>>>><br>
>>>>>> [Gack: I was looking at the wrong vintage of source code, predating<br>
>>>>>> your changes: wrong system used.]<br>
>>>>>><br>
>>>>>><br>
>>>>>> On Feb 12, 2024, at 10:41, Mark Millard <<a href="mailto:marklmi@yahoo.com" target="_blank">marklmi@yahoo.com</a>> wrote:<br>
>>>>>><br>
>>>>>>> On Feb 12, 2024, at 09:32, John Baldwin <<a href="mailto:jhb@freebsd.org" target="_blank">jhb@freebsd.org</a>> wrote:<br>
>>>>>>><br>
>>>>>>>> On 2/9/24 8:13 PM, Mark Millard wrote:<br>
>>>>>>>>> Summary:<br>
>>>>>>>>> pcib0: <BCM2838-compatible PCI-express controller> mem<br>
>> 0x7d500000-0x7d50930f irq 80,81 on simplebus2<br>
>>>>>>>>> pcib0: parsing FDT for ECAM0:<br>
>>>>>>>>> pcib0: PCI addr: 0xc0000000, CPU addr: 0x600000000, Size:<br>
>> 0x40000000<br>
>>>>>>>>> . . .<br>
>>>>>>>>> rman_manage_region: <pcib1 memory window> request: start<br>
>> 0x600000000, end 0x6000fffff<br>
>>>>>>>>> panic: Failed to add resource to rman<br>
>>>>>>>><br>
>>>>>>>> Hmmm, I suspect this is due to the way that bus_translate_resource<br>
>> works which is<br>
>>>>>>>> fundamentally broken. It rewrites the start address of a resource<br>
>> in-situ instead<br>
>>>>>>>> of keeping downstream resources separate from the upstream<br>
>> resources. For example,<br>
>>>>>>>> I don't see how you could ever release a resource in this design<br>
>> without completely<br>
>>>>>>>> screwing up your rman. That is, I expect trying to detach a PCI<br>
>> device behind a<br>
>>>>>>>> translating bridge that uses the current approach should corrupt<br>
>> the allocated<br>
>>>>>>>> resource ranges in an rman long before my changes.<br>
>>>>>>>><br>
>>>>>>>> That said, that doesn't really explain the panic. Hmm, the panic<br>
>> might be because<br>
>>>>>>>> for PCI bridge windows the driver now passes RF_ACTIVE and the<br>
>> bus_translate_resource<br>
>>>>>>>> hack only kicks in the activate_resource method of<br>
>> pci_host_generic.c.<br>
>>>>>>>><br>
>>>>>>>>> Detail:<br>
>>>>>>>>> . . .<br>
>>>>>>>>> pcib0: <BCM2838-compatible PCI-express controller> mem<br>
>> 0x7d500000-0x7d50930f irq 80,81 on simplebus2<br>
>>>>>>>>> pcib0: parsing FDT for ECAM0:<br>
>>>>>>>>> pcib0: PCI addr: 0xc0000000, CPU addr: 0x600000000, Size:<br>
>> 0x40000000<br>
>>>>>>>><br>
>>>>>>>> This indicates this is a translating bus.<br>
>>>>>>>><br>
>>>>>>>>> pcib1: <PCI-PCI bridge> irq 91 at device 0.0 on pci0<br>
>>>>>>>>> rman_manage_region: <pcib1 bus numbers> request: start 0x1, end 0x1<br>
>>>>>>>>> pcib0: rman_reserve_resource: start=0xc0000000, end=0xc00fffff,<br>
>> count=0x100000<br>
>>>>>>>>> rman_reserve_resource_bound: <PCIe Memory> request: [0xc0000000,<br>
>> 0xc00fffff], length 0x100000, flags 102, device pcib1<br>
>>>>>>>>> rman_reserve_resource_bound: trying 0xffffffff <0xc0000000,0xfffff><br>
>>>>>>>>> considering [0xc0000000, 0xffffffff]<br>
>>>>>>>>> truncated region: [0xc0000000, 0xc00fffff]; size 0x100000<br>
>> (requested 0x100000)<br>
>>>>>>>>> candidate region: [0xc0000000, 0xc00fffff], size 0x100000<br>
>>>>>>>>> allocating from the beginning<br>
>>>>>>>>> rman_manage_region: <pcib1 memory window> request: start<br>
>> 0x600000000, end 0x6000fffff<br>
>>>>><br>
>>>>> What you later typed does not match:<br>
>>>>><br>
>>>>> 0x600000000<br>
>>>>> 0x6000fffff<br>
>>>>><br>
>>>>> You later typed:<br>
>>>>><br>
>>>>> 0x60000000<br>
>>>>> 0x600fffffff<br>
>>>>><br>
>>>>> This seems to have lead to some confusion from using the<br>
>>>>> wrong figure(s).<br>
>>>>><br>
>>>>>>>> The fact that we are trying to reserve the CPU addresses in the<br>
>> rman is because<br>
>>>>>>>> bus_translate_resource rewrote the start address in the resource<br>
>> after it was allocated.<br>
>>>>>>>><br>
>>>>>>>> That said, I can't see why rman_manage_region would actually fail.<br>
>> At this point the<br>
>>>>>>>> rman is empty (this is the first call to rman_manage_region for<br>
>> "pcib1 memory window"),<br>
>>>>>>>> so only the check that should be failing are the checks against<br>
>> rm_start and<br>
>>>>>>>> rm_end. For the memory window, rm_start is always 0, and rm_end is<br>
>> always<br>
>>>>>>>> 0xffffffff, so both the old (0xc00000000 - 0xc00fffff) and new<br>
>> (0x60000000 - 0x600fffffff)<br>
>>>>>>>> ranges are within those bounds.<br>
>>>>><br>
>>>>> No:<br>
>>>>><br>
>>>>> 0xffffffff<br>
>>>>><br>
>>>>> .vs (actual):<br>
>>>>><br>
>>>>> 0x600000000<br>
>>>>> 0x6000fffff<br>
>><br>
>> Ok, then this explains the failure if the "raw" addresses are above 4G. I<br>
>> have<br>
>> access to an emag I'm currently using to test fixes to pci_host_generic.c<br>
>> to<br>
>> avoid corrupting struct resource objects. I'll post the diff once I've got<br>
>> something verified to work.<br>
>><br>
>>> It looks to me like in sys/dev/pci/pci_pci.c the:<br>
>>><br>
>>> static void<br>
>>> pcib_probe_windows(struct pcib_softc *sc)<br>
>>> {<br>
>>> . . .<br>
>>> pcib_alloc_window(sc, &sc->mem, SYS_RES_MEMORY, 0, 0xffffffff);<br>
>>> . . .<br>
>>><br>
>>> is just inappropriately restrictive about where in the system<br>
>>> address space a PCIe can validly be mapped to on the high end.<br>
>>> That, in turn, leads to the rejection on the RPi4B now that<br>
>>> the range use is checked.<br>
>><br>
>> No, the physical register in PCI-PCI bridges is only 32-bits. Only the<br>
>> prefetchable BAR supports 64-bit addresses. This is why the host bridge<br>
>> is doing a translation from the CPU side (0x600000000) to the PCI BAR<br>
>> addresses (0xc0000000) so that the BAR addresses are down in the 32-bit<br>
>> address range. It's also true that many PCI devices only support 32-bit<br>
>> addresses in memory BARs. 64-bit BARs are an optional extension not<br>
>> universally supported.<br>
>><br>
>> The translation here is somewhat akin to a type of MMU where the CPU<br>
>> addresses are mapped to PCI addresses. The problem here is that the<br>
>> PCI BAR resources need to "stay" as PCI addresses since we depend on<br>
>> being able to use rman_get_start/end to get the PCI addresses of<br>
>> allocated resources, but pci_host_generic.c currently rewrites the<br>
>> addresses.<br>
>><br>
>> Probably I should remove rman_set_start/end entirely (Warner added them<br>
>> back in 2004) as the methods don't do anything to deal with the fallout<br>
>> that the rman.rm_list linked-list is no longer sorted by address once<br>
>> some addresses get rewritten, etc.<br>
>><br>
> <br>
> At the time, they made sense. Removing it, though may take some doing<br>
> since we use it in about 284 places in sys/dev today... Somewhat more<br>
> pervasive than I'd have thought they'd be...<br>
<br>
Eh, I only find the one that I'm now removing:<br>
<br>
% git grep -E 'rman_set_(start|end)' sys/<br>
sys/dev/pci/pci_host_generic.c: rman_set_start(r, start);<br>
sys/dev/pci/pci_host_generic.c: rman_set_end(r, end);<br>
sys/kern/subr_rman.c:rman_set_start(struct resource *r, rman_res_t start)<br>
sys/kern/subr_rman.c:rman_set_end(struct resource *r, rman_res_t end)<br>
sys/sys/rman.h:void rman_set_end(struct resource *_r, rman_res_t _end);<br>
sys/sys/rman.h:void rman_set_start(struct resource *_r, rman_res_t _start);<br>
<br>
Also, I managed to boot the emag I have access to this morning. I had to<br>
fix a few other bugs in acpi(4) for my changes in pci_host_generic to work<br>
but will post reviews later today.<br></blockquote><div><br></div><div>That's what happens when you grep for 'get' instead of 'set' before the morning caffeine kicks in ..</div><div><br></div><div>Warner</div></div></div>
help
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfrsjfDmn8zfz9LBsjwH%2BRo5iUA6mK_6GiuVpJCrky2_XQ>
