Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 31 Jul 2016 11:43:23 -0700
From:      Nathan Whitehorn <nwhitehorn@freebsd.org>
To:        Michal Meloun <mmel@FreeBSD.org>
Cc:        "freebsd-arm@freebsd.org" <freebsd-arm@FreeBSD.org>, Svatopluk Kraus <skra@FreeBSD.org>, "freebsd-arch@freebsd.org" <freebsd-arch@FreeBSD.org>
Subject:   Re: INTRNG (Was: svn commit: r301453....)
Message-ID:  <7f053bb8-ab03-e46c-1c72-d757348e4e54@freebsd.org>
In-Reply-To: <579E1BE2.7020500@FreeBSD.org>
References:  <201606051620.u55GKD5S066398@repo.freebsd.org> <57950005.6070403@FreeBSD.org> <f82018ee-51e7-60fa-2682-f0ef307a52b5@freebsd.org> <57961549.4020105@FreeBSD.org> <e2cace17-0924-2084-5fcf-626f87e41cc3@freebsd.org> <CANCZdfr%2BZ4XxXRY0yMiWXwp=8iKq54y3uJ9-OfAOdfxAs1qdtw@mail.gmail.com> <f94bfd25-fabf-efc3-55c9-cfdfd9e4d6e6@freebsd.org> <CANCZdfpz=z3gc3pyb_Qssa3vGJSnPv_r6J-SWDPPpE9zPYB9=w@mail.gmail.com> <ab44ddb1-515b-94ac-6b12-673b7c53d658@freebsd.org> <57976867.6080705@FreeBSD.org> <f2edac8f-2859-cd98-754e-881e2b2d1e63@freebsd.org> <5798E104.5020104@FreeBSD.org> <a5d43044-1733-6cc7-2e99-e85b60b0fcf3@freebsd.org> <579A25BB.8070206@FreeBSD.org> <30790e40-58b4-3371-c0f0-b7545571f389@freebsd.org> <579AFFC5.1040005@FreeBSD.org> <eb603349-eb88-866d-7a26-9e026518fd39@freebsd.org> <579CD355.1050203@FreeBSD.org> <460fa0b3-ddb7-6247-2412-3d75a589d5e7@freebsd.org> <579CF7C8.1040302@FreeBSD.org> <24107713-6d50-c21d-ccf1-7dbdb36cc484@freebsd.org> <579E1BE2.7020500@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help


On 07/31/16 08:40, Michal Meloun wrote:
> Dne 31.07.2016 v 8:11 Nathan Whitehorn napsal(a):
> [snip]
>>> The current API is certainly a bit of a hack and I would be pleased to
>>>> see it go; but the replacement needs to be more functional and more
>>>> robust. As far as I can tell, I literally *cannot* move PowerPC over
>>>> to this architecture.
>>> Yes, at this time, I agree. If you can accept enhancement of
>>> (owf_)bus_map_intr() then we can move to next, significantly less
>>> hackish, state.
>> OK, sure. I wrote some code this afternoon (attached) to sketch this.
>> It does not compile or run or anything useful, but it should
>> illustrate the important parts of what I think is an API that should
>> work for everyone. I'm happy to finish it if it looks reasonable -- I
>> may in any case just to replace PowerPC's increasingly teetering
>> intr_machdep.c.
>>
>> The OF part is essentially unchanged from how it currently works:
>> there's a method that you pass the interrupt specifier and interrupt
>> parent to, and it spits back an IRQ that means that combination of
>> things in perpetuity. OFW_BUS_MAP_INTR() in nexus.c, with its current
>> API, can be implemented in terms of it in one line. Whenever the
>> relevant PIC attaches, it is told to use the given IRQ to refer to
>> that opaque bytestring.
>>
>> I've added a set of equivalent functions for ACPI that take the
>> contents of an ACPI interrupt specifier and do the same things. It
>> tries to have the IRQ be human-meaningful, if possible, by matching
>> the ACPI interrupt number. Having separate functions seemed a little
>> cleaner than exposing the enums and unions as part of the KPI, but I'm
>> not wedded to it -- this is just an example.
>>
>> PICs register (and deregister) themselves with either an OF phandle or
>> an ACPI resource string or (god help us) both as needed. The PICs have
>> corresponding methods for interpreting various kinds of interrupt
>> specifiers.
>>
>> Whether it allows bus_alloc_resource(), bus_setup_intr(), etc. to
>> succeed before the PICs are attached is gated on an #ifdef. That can
>> probably be off by default on ARM. A PowerPC version of this code
>> would have to keep it on until various bus drivers are fixed.
>>
>> Here's the general flow:
>> - Parent bus enumerates children, maps IRQs as needed, adds to
>> resource list. The struct resources involved aren't special (just a
>> single number), so PCIB_ROUTE_INTERRUPT(), MSIs, etc. are trivial to
>> implement (and already are implemented, in general, for OF/FDT drivers).
>> - bus_alloc_resource() does nothing special
>> - bus_setup_intr() calls into the PIC and does what is needed, as in
>> r301453 (to within that #ifdef)
>>
>> This should keep all the best pieces of everything:
>> - ACPI and OF are supported together, and easy to extend for other
>> types of mapping technologies without breaking either KBI or KPI (just
>> add new functions)
>> - No APIs change for existing Open Firmware code
>> - The support code can live entirely in machine-dependent directories,
>> with no code changes at all required in kern/ or dev/ofw/
>> - bus_setup_intr() and friends behave as in r301453 and succeed or
>> fail for real
>> - PCI code is not an issue
>> - No disconnected interrupt state maintained in mapping table (unless
>> the transitional #ifdef is on) and the mapping table content is only
>> larger than r301453 by having a copy of the original interrupt specifier.
>>
>> If and when we manage to fix the kernel APIs to allow non-scalar
>> interrupt specifiers, this should also be easy to gradually
>> transmogrify to support that case since there exists a 1:1 mapping of
>> scalar IRQs to rich data and the control flow would be identical:
>> interrupt specifiers are read at enumeration time and a resulting
>> handle -- struct resource or scalar IRQ -- used afterward to refer to it.
>>
> Nice, nice, i think that we are very close at this point.
> The problem with the above workflow is that maps IRQ function is called
> at parent bus enumeration time, generally at BUS_PASS_BUS pass. And at
> this time, PICs are not exist.
> Also, 'static struct intr_irqsrc *irq_sources[NIRQ];' is not a mapping
> table,  it doesn't provide this functionality.
> But yes, i understand that mapping table is important and necessary for you.
>
> So, if i can extend our  concept a little bit, then:
> We can add new table (map_data_tbl for example)  that holds  a copy of
> the original interrupt specifier, index to irq_sources table and
> probably some flags.
> And we can modify the above workflow to:
> - Parent bus enumerates children, allocates entries from map_data_tbl,
> adds to resource list. The struct resources involved aren't special
> (just a single number), so PCIB_ROUTE_INTERRUPT(), MSIs, etc. are
> trivial to implement (and already are implemented, in general, for
> OF/FDT drivers). Index to map_data_tbl is used as resource ID.
> - bus_alloc_resource()  sets 'ALLOCATED' in map_data_tbl flags field,
> *maps IRQs* and stores result in 'index' field.
> - bus_setup_intr() sets 'SETUP_DONE' in map_data_tbl flags field, calls
> into the PIC and does what is needed, as in r301453. (to within that
> #ifdef)
>
> And, in PPC case, newly attached PIC scans whole map_data_tbl table,
> finds his entries and makes what needs.
>
> Does that make sense?
> I hope that postponing of map IRQ call is not a problem for PPC,
> everything else looks easy.
> Moreover,  this additional level of indirection also solves all my
> 'hypothetical corner cases' with N:1 mappings (between original
> interrupt specifier and real interrupt number).

Yes, I think we are converging.

My qualm about bus_alloc_resource() is twofold:
1. Traditionally, with interrupts, bus_alloc_resource() has no side 
effects and I'm not sure it propagates cleanly down the tree in all cases.
2. T. ,                                                       x n nn b 
n   /here are a few bus APIs (bus_config_intr() comes to mind) that use 
raw IRQ IDs and, so far as I know, can be, and sometimes are called 
before bus_alloc_resource(). If the PIC doesn't know about the IRQ yet 
when bus_config_intr() (etc.) is called, things will just break.

So you would need to make sure that any bus method handling a resource 
ID causes it to be mapped on the PIC at that time. It's OK if that 
happens in bus_alloc_resource() so long as bus_config_intr(), 
bus_setup_intr(), etc. cause that to happen if it hasn't happened yet -- 
I don't care when these calls are made to the PIC driver so long as 
*what* calls will be made is set at enumeration time.

I am also totally fine with having, on ARM, bus_config_intr(), 
bus_setup_intr() etc. fail if the PIC hasn't attached yet (On PowerPC, 
we can't do that yet, but after this conversation, I regard that as a 
bug and would fix that later), as well as delaying setup on the PIC to 
the first time any bus driver actually tries to *use* the interrupt 
(alloc_resource(), setup_intr(), config_intr(), whatever) rather than 
when the interrupt is originally allocated.

------------ The following is a large parenthesis -------------------

One other possible route here that would also work well would be to make 
ofwbus.c MD (it's a trivial piece of code anyway, so we don't gain a lot 
by sharing it) and implement ofw_bus_map_intr() locally as an ofwbus bus 
method. Then you could have the mapping table stored in ofwbus's softc 
-- the API was designed for this initially. You would need MD extensions 
for doing PIC registration there (which is fine), but that segregates 
all the OFW-specific information into OFW-specific code and would let 
the bus methods and the OFW interrupt mapping table interact cleanly in 
the same place. This still preserves the pre-r301453 API, makes PowerPC 
work, and maybe address's skra@'s concern about extensibility and 
letting core interrupt code know about FDT (or ACPI). I'd be happy to 
mock this up as well if you think it's a good approach.

So you would have something in arm/arm/ofwbus.c like (pseudo-code, 
missing unlocks, etc.):

struct ofwbus_softc {
     (list of PICs)
     (list of IRQ mapping from # to specifier, iparent, and device_t)
}

static int ofw_bus_map_if_unmapped(struct ofwbus_softc *sc, int irq)
{
     struct ofw_bus_irq_mapping *i;
     struct ofw_bus_pic *j;
     int err = 0;
     mtx_lock(sc->ofw_bus_irqmap_lock);
     LIST_FOREACH(i, sc->irq_mappings) {
         if (i->irq == irq)
             break;
     }

     if (i == NULL) return (ENXIO); /* PANIC? */

     if (i->dev != NULL) return (0); /* Mapped already */

     LIST_FOREACH(j, sc->ofw_bus_registered_pics) {
         if (j->phandle == i->phandle) {
             arm_intr_set_pic(irq, j->dev);
             err = PIC_SET_OFW_ISPEC(j->dev, i->ispec, i->icells);
             return (err);
         }
     }

     return (ENXIO); /* No matching PIC. Panic? */
}

static int ofw_bus_setup_intr(device_t dev, device_t child, int irq, 
blah...)
{

     int err;

     /* This section at the beginning of anything 
allocating/using/modifying interrupts */
     err = ofw_bus_map_if_unmapped(sc, irq);
     if (err != 0) /* Explode if the PIC isn't online yet
         return (err);

     bus_setup_intr(dev, child, irq, blah...); /* onward to nexus */
}

This has the following features:
- Existing OFW API and semantics unchanged
- As such, PowerPC, PCI, etc. work fine with no changes
- Details encapsulated in MD code, so individual platforms can implement 
this however they like
- arm/arm/intr.c (or whatever) only needs a method to allocate a fresh 
interrupt, with no state, and anoter to set the device_t for an 
interrupt sometime later.
- The internal table in the platform interrupt code has no knowledge of 
any mappings whatsoever except having the appropriate device_t for the 
PIC stored with the interrupt vector.
- Device tree bits handled purely in device tree code
- No action need be taken on any mapping until the interrupt is actually 
allocated/set up, like r301453
- Easy to add more mapping mechanisms (e.g. ACPI) by having similar 
enumeration-mechanism-specific code in the root bus for that mapping 
mechanism.

-------------- End parenthesis -------------------------------

> `/
>
>> Some more comments below.
>>
>>>> The ideal API would be one in which the resource enumeration code
>>>> could assign, instead of numbers, some structured information that
>>>> contains the full firmware specifier (string parent + interrupt pin
>>>> for ACPI, interrupt parent phandle + specifier for device tree, bare
>>>> numbers for simple systems).
>>> Yes, exactly. I am happy that at least we agree on something. The only
>>> difference is that we want triplet <OFW, interrupt parent phandle,
>>> specifier> or <ACPI, string parent, interrupt pin>
>> Which makes perfect sense and is a good idea.
>>
>>>> That seemed, and seems, too invasive. I think we agree on this and
>>>> have chosen to approximate that API in two different ways.
>>> I still think that we found way how we can realize the above idea in
>>> non-invasive way.
>>>   From my point of view the solution is:
>>> Put/attach the above triplet to RLE and then copy it (within
>>> bus_resource_attach()  to struct resource.
>>> But yes, I understand that I am not able to explain clearly enough.
>> I think we were just talking past eachother somehow.
>>
>> Anyway, my concern about putting this in struct resource * at
>> bus_alloc_resource() time is basically that there are a whole bunch of
>> cases in which (a) it's hard to reverse engineer which interrupt from
>> the device tree (or whatever) was meant to correspond to a particular
>> arbitrary IRQ in an allocation request unless you keep logs and (b)
>> there are a bunch of cases, mostly involving PCI, where interrupt
>> resource allocation doesn't proceed through struct resource * at all,
>> so you pretty much have to keep logs. Once you start keeping logs of
>> which IRQ corresponds to which specifier at the device level, you
>> might as well just do it once in a centralized place.
> Oki, i agree. o

Great, I'm glad we're on the same page.

>>>   gd `
>>>> When we designed the interrupt mapping code, we evaluated a large
>>>> number of possible approaches. Something very much like this was one
>>>> of them. It was rejected as being too fragile (it requires resource
>>>> allocation and enumeration to be coupled) and to have too many corner
>>>> cases  (the PCI MSI and LSI APIs, for instance). The interrupt mapping
>>>> stuff seemed at the time, and so far still seems, like the least-bad
>>>> approximation to the ideal case: it is essentially that ideal API, in
>>>> that it happens at bus enumeration and involves just assigning the
>>>> firmware specifiers, but with lookup keys to that information stuffed
>>>> into the 32-bit numbers that the rest of the system uses.
>>> The raw PCI (or MSIX)  case is relatively simple. The PCI domain uses
>>> raw numbers and it's host-to-pci bridge job to translate raw numbers
>>> from PCI domain to (for example) OFW based resource.
>>> At this point, I'm still not sure about MSI...
>> Right, you can keep a big table in the PCI driver that maps whatever
>> IRQs you are handing out to some richer interrupt specifiers and
>> consult that when you get a bus_alloc_resource() request. Which is
>> basically the approach I'm advocating, except that the table is
>> centralized, which reduces code duplication and fixes a number of
>> weird corner cases where children assign extra interrupts to
>> themselves, etc. and the bus parent has no ability to do something
>> sensible.
>>
>> Please take a look at the attached interface and see if it (or
>> something like it) meets your needs. It meets all of mine, at least,
>> and I think fixes all the things you were concerned about.
> Can you, please,  give me also example for GPIO (these are identified by
> <device_t, pin number> pair).
> Or, in other words, GPIO needs his own sets of functions?

You have more experience with GPIOs than I do, but I could imagine a 
couple different ways of doing this for GPIO interrupts not in standard 
interrupts properties:
1. You have an extra interrupt class (like r301453) called "GPIO" or 
something that either takes a OFW/ACPI handle or a device_t and a GPIO 
specifier.

2. You have some global translator function that makes an OFW/ACPI 
interrupt specifier out of a GPIO property. I don't like this one.

3. You add a method to the GPIO controller (GPIO_GET_INTERRUPT_FOR_PIN 
or something) that does something driver specific (e.g. checks if you 
can have such an interrupt, then fabricates an OF- or ACPI-compatible 
specifier and maps it) to return a usable IRQ or an error. This also 
abstracts out how you want to describe the GPIO interrupt domain.

Aside from not liking #2, I don't have strong opinions. #1 and #3 aren't 
mutually exclusive (#3 could be implemented in terms of #1), and #3 
seems a little cleaner, so I would have a weak preference for #3 as the 
canonical mechanism. But anything is OK, really.

> And thanks for your effort and patience,

Of course -- apologies for being somewhat strident. Thanks for taking 
the time to discuss this!
-Nathan

> Michal
>
>




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?7f053bb8-ab03-e46c-1c72-d757348e4e54>