Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 12 Feb 2024 12:00:16 -0800
From:      Mark Millard <marklmi@yahoo.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        FreeBSD ARM List <freebsd-arm@freebsd.org>, Current FreeBSD <freebsd-current@freebsd.org>, Warner Losh <imp@bsdimp.com>
Subject:   Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: <pcib1 memory window> request" leads to panic
Message-ID:  <9AFDF067-96E4-4E67-90D2-F40DAF3F138F@yahoo.com>
In-Reply-To: <1F704317-FDB8-4BDA-8A67-61CF48794DFE@yahoo.com>
References:  <76AB969F-5BC5-4116-8AF4-3ED2CABEBBA5.ref@yahoo.com> <76AB969F-5BC5-4116-8AF4-3ED2CABEBBA5@yahoo.com> <f4326cda-c602-439f-ab03-f66b3bca5ff2@FreeBSD.org> <1F704317-FDB8-4BDA-8A67-61CF48794DFE@yahoo.com>

next in thread | previous in thread | raw e-mail | index | archive | help
[Gack: I was looking at the wrong vintage of source code, predating
your changes: wrong system used.]


On Feb 12, 2024, at 10:41, Mark Millard <marklmi@yahoo.com> wrote:

> On Feb 12, 2024, at 09:32, John Baldwin <jhb@freebsd.org> wrote:
>=20
>> On 2/9/24 8:13 PM, Mark Millard wrote:
>>> Summary:
>>> pcib0: <BCM2838-compatible PCI-express controller> mem =
0x7d500000-0x7d50930f irq 80,81 on simplebus2
>>> pcib0: parsing FDT for ECAM0:
>>> pcib0:  PCI addr: 0xc0000000, CPU addr: 0x600000000, Size: =
0x40000000
>>> . . .
>>> rman_manage_region: <pcib1 memory window> request: start =
0x600000000, end 0x6000fffff
>>> panic: Failed to add resource to rman
>>=20
>> Hmmm, I suspect this is due to the way that bus_translate_resource =
works which is
>> fundamentally broken.  It rewrites the start address of a resource =
in-situ instead
>> of keeping downstream resources separate from the upstream resources. =
  For example,
>> I don't see how you could ever release a resource in this design =
without completely
>> screwing up your rman.  That is, I expect trying to detach a PCI =
device behind a
>> translating bridge that uses the current approach should corrupt the =
allocated
>> resource ranges in an rman long before my changes.
>>=20
>> That said, that doesn't really explain the panic.  Hmm, the panic =
might be because
>> for PCI bridge windows the driver now passes RF_ACTIVE and the =
bus_translate_resource
>> hack only kicks in the activate_resource method of =
pci_host_generic.c.
>>=20
>>> Detail:
>>> . . .
>>> pcib0: <BCM2838-compatible PCI-express controller> mem =
0x7d500000-0x7d50930f irq 80,81 on simplebus2
>>> pcib0: parsing FDT for ECAM0:
>>> pcib0: PCI addr: 0xc0000000, CPU addr: 0x600000000, Size: 0x40000000
>>=20
>> This indicates this is a translating bus.
>>=20
>>> pcib1: <PCI-PCI bridge> irq 91 at device 0.0 on pci0
>>> rman_manage_region: <pcib1 bus numbers> request: start 0x1, end 0x1
>>> pcib0: rman_reserve_resource: start=3D0xc0000000, end=3D0xc00fffff, =
count=3D0x100000
>>> rman_reserve_resource_bound: <PCIe Memory> request: [0xc0000000, =
0xc00fffff], length 0x100000, flags 102, device pcib1
>>> rman_reserve_resource_bound: trying 0xffffffff <0xc0000000,0xfffff>
>>> considering [0xc0000000, 0xffffffff]
>>> truncated region: [0xc0000000, 0xc00fffff]; size 0x100000 (requested =
0x100000)
>>> candidate region: [0xc0000000, 0xc00fffff], size 0x100000
>>> allocating from the beginning
>>> rman_manage_region: <pcib1 memory window> request: start =
0x600000000, end 0x6000fffff
>>=20
>> The fact that we are trying to reserve the CPU addresses in the rman =
is because
>> bus_translate_resource rewrote the start address in the resource =
after it was allocated.
>>=20
>> That said, I can't see why rman_manage_region would actually fail.  =
At this point the
>> rman is empty (this is the first call to rman_manage_region for =
"pcib1 memory window"),
>> so only the check that should be failing are the checks against =
rm_start and
>> rm_end.  For the memory window, rm_start is always 0, and rm_end is =
always
>> 0xffffffff, so both the old (0xc00000000 - 0xc00fffff) and new =
(0x60000000 - 0x600fffffff)
>> ranges are within those bounds.  I would instead expect to see some =
other issue later
>> on where we fail to allocate a resource for a child BAR, but I =
wouldn't expect
>> rman_manage_region to fail.
>>=20
>> Logging the return value from rman_manage_region would be the first =
step I think
>> to see which error value it is returning.
>=20
> Looking at the code in sys/kern/subr_rman.c for rman_manage_region I =
see
> the (mostly) return rv related logic only has ENONMEM (explicit =
return) and
> EBUSY as non-0 possibilities (no other returns).

The modern code also has EINVAL via an explicit return.

> All the rv references and
> all the returns are shown below:
>=20
>        int rv =3D 0;
> . . .

The modern code also has here:

        DPRINTF(("rman_manage_region: <%s> request: start %#jx, end =
%#jx\n",
            rm->rm_descr, start, end));
        if (start < rm->rm_start || end > rm->rm_end)
                return EINVAL;

Adding rm->rm_start and rm->rm_end to the message might be
appropriate --and would also be enough additional information
to know if EINVAL would be returned.

>        r =3D int_alloc_resource(M_NOWAIT);
>        if (r =3D=3D NULL)
>                return ENOMEM;
> . . .
>                /* Check for any overlap with the current region. */
>                if (r->r_start <=3D s->r_end && r->r_end >=3D =
s->r_start) {
>                        rv =3D EBUSY;
>                        goto out;
>                }
>=20
>                /* Check for any overlap with the next region. */
>                t =3D TAILQ_NEXT(s, r_link);
>                if (t && r->r_start <=3D t->r_end && r->r_end >=3D =
t->r_start) {
>                        rv =3D EBUSY;
>                        goto out;
>                }
> . . .
> out:
>        mtx_unlock(rm->rm_mtx);
>        return rv;
>=20
> int_alloc_resource failure would be failure (NULL) from the:
>=20
>        struct resource_i *r;
>=20
>        r =3D malloc(sizeof *r, M_RMAN, malloc_flag | M_ZERO);
>=20
> (associated with the M_NOWAIT argument).
>=20
> The malloc failure would likely go in a very different direction.
>=20
>=20
>=20
> Side note: looks like the EBUSY cases leak what r references.

Still true in the newer code.

>> Probably I should fix pci_host_generic.c to handle translation =
properly however.
>> I can work on a patch for that.
>=20





=3D=3D=3D
Mark Millard
marklmi at yahoo.com




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?9AFDF067-96E4-4E67-90D2-F40DAF3F138F>