Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 14 Feb 2024 09:42:36 -0700
From:      Warner Losh <imp@bsdimp.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        Mark Millard <marklmi@yahoo.com>, FreeBSD ARM List <freebsd-arm@freebsd.org>,  Current FreeBSD <freebsd-current@freebsd.org>
Subject:   Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: <pcib1 memory window> request" leads to panic
Message-ID:  <CANCZdfrX%2BJpiKgU4Bo%2B-dwX_YVszYXT9hy_S%2BxS91Z8RmiWONQ@mail.gmail.com>
In-Reply-To: <d7b20565-6aa1-486f-a197-11fbc4d0c8dd@FreeBSD.org>
References:  <76AB969F-5BC5-4116-8AF4-3ED2CABEBBA5.ref@yahoo.com> <76AB969F-5BC5-4116-8AF4-3ED2CABEBBA5@yahoo.com> <f4326cda-c602-439f-ab03-f66b3bca5ff2@FreeBSD.org> <1F704317-FDB8-4BDA-8A67-61CF48794DFE@yahoo.com> <9AFDF067-96E4-4E67-90D2-F40DAF3F138F@yahoo.com> <4C279710-5F88-4295-B1A4-7C395F3587E5@yahoo.com> <3A145420-399D-4EBD-9FF4-18924908AB1D@yahoo.com> <1298DF9C-0F82-4567-8E81-7332A608C7FC@yahoo.com> <d7b20565-6aa1-486f-a197-11fbc4d0c8dd@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
--0000000000006a28f106115a33e8
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Wed, Feb 14, 2024 at 9:08=E2=80=AFAM John Baldwin <jhb@freebsd.org> wrot=
e:

> On 2/12/24 5:57 PM, Mark Millard wrote:
> > On Feb 12, 2024, at 16:36, Mark Millard <marklmi@yahoo.com> wrote:
> >
> >> On Feb 12, 2024, at 16:10, Mark Millard <marklmi@yahoo.com> wrote:
> >>
> >>> On Feb 12, 2024, at 12:00, Mark Millard <marklmi@yahoo.com> wrote:
> >>>
> >>>> [Gack: I was looking at the wrong vintage of source code, predating
> >>>> your changes: wrong system used.]
> >>>>
> >>>>
> >>>> On Feb 12, 2024, at 10:41, Mark Millard <marklmi@yahoo.com> wrote:
> >>>>
> >>>>> On Feb 12, 2024, at 09:32, John Baldwin <jhb@freebsd.org> wrote:
> >>>>>
> >>>>>> On 2/9/24 8:13 PM, Mark Millard wrote:
> >>>>>>> Summary:
> >>>>>>> pcib0: <BCM2838-compatible PCI-express controller> mem
> 0x7d500000-0x7d50930f irq 80,81 on simplebus2
> >>>>>>> pcib0: parsing FDT for ECAM0:
> >>>>>>> pcib0:  PCI addr: 0xc0000000, CPU addr: 0x600000000, Size:
> 0x40000000
> >>>>>>> . . .
> >>>>>>> rman_manage_region: <pcib1 memory window> request: start
> 0x600000000, end 0x6000fffff
> >>>>>>> panic: Failed to add resource to rman
> >>>>>>
> >>>>>> Hmmm, I suspect this is due to the way that bus_translate_resource
> works which is
> >>>>>> fundamentally broken.  It rewrites the start address of a resource
> in-situ instead
> >>>>>> of keeping downstream resources separate from the upstream
> resources.   For example,
> >>>>>> I don't see how you could ever release a resource in this design
> without completely
> >>>>>> screwing up your rman.  That is, I expect trying to detach a PCI
> device behind a
> >>>>>> translating bridge that uses the current approach should corrupt
> the allocated
> >>>>>> resource ranges in an rman long before my changes.
> >>>>>>
> >>>>>> That said, that doesn't really explain the panic.  Hmm, the panic
> might be because
> >>>>>> for PCI bridge windows the driver now passes RF_ACTIVE and the
> bus_translate_resource
> >>>>>> hack only kicks in the activate_resource method of
> pci_host_generic.c.
> >>>>>>
> >>>>>>> Detail:
> >>>>>>> . . .
> >>>>>>> pcib0: <BCM2838-compatible PCI-express controller> mem
> 0x7d500000-0x7d50930f irq 80,81 on simplebus2
> >>>>>>> pcib0: parsing FDT for ECAM0:
> >>>>>>> pcib0: PCI addr: 0xc0000000, CPU addr: 0x600000000, Size:
> 0x40000000
> >>>>>>
> >>>>>> This indicates this is a translating bus.
> >>>>>>
> >>>>>>> pcib1: <PCI-PCI bridge> irq 91 at device 0.0 on pci0
> >>>>>>> rman_manage_region: <pcib1 bus numbers> request: start 0x1, end 0=
x1
> >>>>>>> pcib0: rman_reserve_resource: start=3D0xc0000000, end=3D0xc00ffff=
f,
> count=3D0x100000
> >>>>>>> rman_reserve_resource_bound: <PCIe Memory> request: [0xc0000000,
> 0xc00fffff], length 0x100000, flags 102, device pcib1
> >>>>>>> rman_reserve_resource_bound: trying 0xffffffff <0xc0000000,0xffff=
f>
> >>>>>>> considering [0xc0000000, 0xffffffff]
> >>>>>>> truncated region: [0xc0000000, 0xc00fffff]; size 0x100000
> (requested 0x100000)
> >>>>>>> candidate region: [0xc0000000, 0xc00fffff], size 0x100000
> >>>>>>> allocating from the beginning
> >>>>>>> rman_manage_region: <pcib1 memory window> request: start
> 0x600000000, end 0x6000fffff
> >>>
> >>> What you later typed does not match:
> >>>
> >>> 0x600000000
> >>> 0x6000fffff
> >>>
> >>> You later typed:
> >>>
> >>> 0x60000000
> >>> 0x600fffffff
> >>>
> >>> This seems to have lead to some confusion from using the
> >>> wrong figure(s).
> >>>
> >>>>>> The fact that we are trying to reserve the CPU addresses in the
> rman is because
> >>>>>> bus_translate_resource rewrote the start address in the resource
> after it was allocated.
> >>>>>>
> >>>>>> That said, I can't see why rman_manage_region would actually fail.
> At this point the
> >>>>>> rman is empty (this is the first call to rman_manage_region for
> "pcib1 memory window"),
> >>>>>> so only the check that should be failing are the checks against
> rm_start and
> >>>>>> rm_end.  For the memory window, rm_start is always 0, and rm_end i=
s
> always
> >>>>>> 0xffffffff, so both the old (0xc00000000 - 0xc00fffff) and new
> (0x60000000 - 0x600fffffff)
> >>>>>> ranges are within those bounds.
> >>>
> >>> No:
> >>>
> >>> 0xffffffff
> >>>
> >>> .vs (actual):
> >>>
> >>> 0x600000000
> >>> 0x6000fffff
>
> Ok, then this explains the failure if the "raw" addresses are above 4G.  =
I
> have
> access to an emag I'm currently using to test fixes to pci_host_generic.c
> to
> avoid corrupting struct resource objects.  I'll post the diff once I've g=
ot
> something verified to work.
>
> > It looks to me like in sys/dev/pci/pci_pci.c the:
> >
> > static void
> > pcib_probe_windows(struct pcib_softc *sc)
> > {
> > . . .
> >          pcib_alloc_window(sc, &sc->mem, SYS_RES_MEMORY, 0, 0xffffffff)=
;
> > . . .
> >
> > is just inappropriately restrictive about where in the system
> > address space a PCIe can validly be mapped to on the high end.
> > That, in turn, leads to the rejection on the RPi4B now that
> > the range use is checked.
>
> No, the physical register in PCI-PCI bridges is only 32-bits.  Only the
> prefetchable BAR supports 64-bit addresses.  This is why the host bridge
> is doing a translation from the CPU side (0x600000000) to the PCI BAR
> addresses (0xc0000000) so that the BAR addresses are down in the 32-bit
> address range.  It's also true that many PCI devices only support 32-bit
> addresses in memory BARs.  64-bit BARs are an optional extension not
> universally supported.
>
> The translation here is somewhat akin to a type of MMU where the CPU
> addresses are mapped to PCI addresses.  The problem here is that the
> PCI BAR resources need to "stay" as PCI addresses since we depend on
> being able to use rman_get_start/end to get the PCI addresses of
> allocated resources, but pci_host_generic.c currently rewrites the
> addresses.
>
> Probably I should remove rman_set_start/end entirely (Warner added them
> back in 2004) as the methods don't do anything to deal with the fallout
> that the rman.rm_list linked-list is no longer sorted by address once
> some addresses get rewritten, etc.
>

At the time, they made sense. Removing it, though may take some doing
since we use it in about 284 places  in sys/dev today... Somewhat more
pervasive than I'd have thought they'd be...

Warner

--0000000000006a28f106115a33e8
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div dir=3D"ltr"><br></div><br><div class=3D"gmail_quote">=
<div dir=3D"ltr" class=3D"gmail_attr">On Wed, Feb 14, 2024 at 9:08=E2=80=AF=
AM John Baldwin &lt;<a href=3D"mailto:jhb@freebsd.org">jhb@freebsd.org</a>&=
gt; wrote:<br></div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0=
px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 2/=
12/24 5:57 PM, Mark Millard wrote:<br>
&gt; On Feb 12, 2024, at 16:36, Mark Millard &lt;<a href=3D"mailto:marklmi@=
yahoo.com" target=3D"_blank">marklmi@yahoo.com</a>&gt; wrote:<br>
&gt; <br>
&gt;&gt; On Feb 12, 2024, at 16:10, Mark Millard &lt;<a href=3D"mailto:mark=
lmi@yahoo.com" target=3D"_blank">marklmi@yahoo.com</a>&gt; wrote:<br>
&gt;&gt;<br>
&gt;&gt;&gt; On Feb 12, 2024, at 12:00, Mark Millard &lt;<a href=3D"mailto:=
marklmi@yahoo.com" target=3D"_blank">marklmi@yahoo.com</a>&gt; wrote:<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt; [Gack: I was looking at the wrong vintage of source code, =
predating<br>
&gt;&gt;&gt;&gt; your changes: wrong system used.]<br>
&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt; On Feb 12, 2024, at 10:41, Mark Millard &lt;<a href=3D"mai=
lto:marklmi@yahoo.com" target=3D"_blank">marklmi@yahoo.com</a>&gt; wrote:<b=
r>
&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt; On Feb 12, 2024, at 09:32, John Baldwin &lt;<a href=3D=
"mailto:jhb@freebsd.org" target=3D"_blank">jhb@freebsd.org</a>&gt; wrote:<b=
r>
&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;&gt; On 2/9/24 8:13 PM, Mark Millard wrote:<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; Summary:<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; pcib0: &lt;BCM2838-compatible PCI-express cont=
roller&gt; mem 0x7d500000-0x7d50930f irq 80,81 on simplebus2<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; pcib0: parsing FDT for ECAM0:<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; pcib0:=C2=A0 PCI addr: 0xc0000000, CPU addr: 0=
x600000000, Size: 0x40000000<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; . . .<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; rman_manage_region: &lt;pcib1 memory window&gt=
; request: start 0x600000000, end 0x6000fffff<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; panic: Failed to add resource to rman<br>
&gt;&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;&gt; Hmmm, I suspect this is due to the way that bus_tr=
anslate_resource works which is<br>
&gt;&gt;&gt;&gt;&gt;&gt; fundamentally broken.=C2=A0 It rewrites the start =
address of a resource in-situ instead<br>
&gt;&gt;&gt;&gt;&gt;&gt; of keeping downstream resources separate from the =
upstream resources.=C2=A0 =C2=A0For example,<br>
&gt;&gt;&gt;&gt;&gt;&gt; I don&#39;t see how you could ever release a resou=
rce in this design without completely<br>
&gt;&gt;&gt;&gt;&gt;&gt; screwing up your rman.=C2=A0 That is, I expect try=
ing to detach a PCI device behind a<br>
&gt;&gt;&gt;&gt;&gt;&gt; translating bridge that uses the current approach =
should corrupt the allocated<br>
&gt;&gt;&gt;&gt;&gt;&gt; resource ranges in an rman long before my changes.=
<br>
&gt;&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;&gt; That said, that doesn&#39;t really explain the pan=
ic.=C2=A0 Hmm, the panic might be because<br>
&gt;&gt;&gt;&gt;&gt;&gt; for PCI bridge windows the driver now passes RF_AC=
TIVE and the bus_translate_resource<br>
&gt;&gt;&gt;&gt;&gt;&gt; hack only kicks in the activate_resource method of=
 pci_host_generic.c.<br>
&gt;&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; Detail:<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; . . .<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; pcib0: &lt;BCM2838-compatible PCI-express cont=
roller&gt; mem 0x7d500000-0x7d50930f irq 80,81 on simplebus2<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; pcib0: parsing FDT for ECAM0:<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; pcib0: PCI addr: 0xc0000000, CPU addr: 0x60000=
0000, Size: 0x40000000<br>
&gt;&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;&gt; This indicates this is a translating bus.<br>
&gt;&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; pcib1: &lt;PCI-PCI bridge&gt; irq 91 at device=
 0.0 on pci0<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; rman_manage_region: &lt;pcib1 bus numbers&gt; =
request: start 0x1, end 0x1<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; pcib0: rman_reserve_resource: start=3D0xc00000=
00, end=3D0xc00fffff, count=3D0x100000<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; rman_reserve_resource_bound: &lt;PCIe Memory&g=
t; request: [0xc0000000, 0xc00fffff], length 0x100000, flags 102, device pc=
ib1<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; rman_reserve_resource_bound: trying 0xffffffff=
 &lt;0xc0000000,0xfffff&gt;<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; considering [0xc0000000, 0xffffffff]<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; truncated region: [0xc0000000, 0xc00fffff]; si=
ze 0x100000 (requested 0x100000)<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; candidate region: [0xc0000000, 0xc00fffff], si=
ze 0x100000<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; allocating from the beginning<br>
&gt;&gt;&gt;&gt;&gt;&gt;&gt; rman_manage_region: &lt;pcib1 memory window&gt=
; request: start 0x600000000, end 0x6000fffff<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; What you later typed does not match:<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; 0x600000000<br>
&gt;&gt;&gt; 0x6000fffff<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; You later typed:<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; 0x60000000<br>
&gt;&gt;&gt; 0x600fffffff<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; This seems to have lead to some confusion from using the<br>
&gt;&gt;&gt; wrong figure(s).<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;&gt; The fact that we are trying to reserve the CPU add=
resses in the rman is because<br>
&gt;&gt;&gt;&gt;&gt;&gt; bus_translate_resource rewrote the start address i=
n the resource after it was allocated.<br>
&gt;&gt;&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;&gt;&gt; That said, I can&#39;t see why rman_manage_region =
would actually fail.=C2=A0 At this point the<br>
&gt;&gt;&gt;&gt;&gt;&gt; rman is empty (this is the first call to rman_mana=
ge_region for &quot;pcib1 memory window&quot;),<br>
&gt;&gt;&gt;&gt;&gt;&gt; so only the check that should be failing are the c=
hecks against rm_start and<br>
&gt;&gt;&gt;&gt;&gt;&gt; rm_end.=C2=A0 For the memory window, rm_start is a=
lways 0, and rm_end is always<br>
&gt;&gt;&gt;&gt;&gt;&gt; 0xffffffff, so both the old (0xc00000000 - 0xc00ff=
fff) and new (0x60000000 - 0x600fffffff)<br>
&gt;&gt;&gt;&gt;&gt;&gt; ranges are within those bounds.<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; No:<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; 0xffffffff<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; .vs (actual):<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; 0x600000000<br>
&gt;&gt;&gt; 0x6000fffff<br>
<br>
Ok, then this explains the failure if the &quot;raw&quot; addresses are abo=
ve 4G.=C2=A0 I have<br>
access to an emag I&#39;m currently using to test fixes to pci_host_generic=
.c to<br>
avoid corrupting struct resource objects.=C2=A0 I&#39;ll post the diff once=
 I&#39;ve got<br>
something verified to work.<br>
<br>
&gt; It looks to me like in sys/dev/pci/pci_pci.c the:<br>
&gt; <br>
&gt; static void<br>
&gt; pcib_probe_windows(struct pcib_softc *sc)<br>
&gt; {<br>
&gt; . . .<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 pcib_alloc_window(sc, &amp;sc-&gt;me=
m, SYS_RES_MEMORY, 0, 0xffffffff);<br>
&gt; . . .<br>
&gt; <br>
&gt; is just inappropriately restrictive about where in the system<br>
&gt; address space a PCIe can validly be mapped to on the high end.<br>
&gt; That, in turn, leads to the rejection on the RPi4B now that<br>
&gt; the range use is checked.<br>
<br>
No, the physical register in PCI-PCI bridges is only 32-bits.=C2=A0 Only th=
e<br>
prefetchable BAR supports 64-bit addresses.=C2=A0 This is why the host brid=
ge<br>
is doing a translation from the CPU side (0x600000000) to the PCI BAR<br>
addresses (0xc0000000) so that the BAR addresses are down in the 32-bit<br>
address range.=C2=A0 It&#39;s also true that many PCI devices only support =
32-bit<br>
addresses in memory BARs.=C2=A0 64-bit BARs are an optional extension not<b=
r>
universally supported.<br>
<br>
The translation here is somewhat akin to a type of MMU where the CPU<br>
addresses are mapped to PCI addresses.=C2=A0 The problem here is that the<b=
r>
PCI BAR resources need to &quot;stay&quot; as PCI addresses since we depend=
 on<br>
being able to use rman_get_start/end to get the PCI addresses of<br>
allocated resources, but pci_host_generic.c currently rewrites the<br>
addresses.<br>
<br>
Probably I should remove rman_set_start/end entirely (Warner added them<br>
back in 2004) as the methods don&#39;t do anything to deal with the fallout=
<br>
that the rman.rm_list linked-list is no longer sorted by address once<br>
some addresses get rewritten, etc.<br></blockquote><div><br></div><div>At t=
he time, they made sense. Removing it, though may take some doing</div><div=
>since we use it in about 284 places=C2=A0 in sys/dev today... Somewhat mor=
e</div><div>pervasive than I&#39;d have thought they&#39;d be...</div><div>=
<br></div><div>Warner=C2=A0</div></div></div>

--0000000000006a28f106115a33e8--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfrX%2BJpiKgU4Bo%2B-dwX_YVszYXT9hy_S%2BxS91Z8RmiWONQ>