From owner-freebsd-arch@freebsd.org Tue Aug 2 13:25:18 2016 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 67660BAC0FB; Tue, 2 Aug 2016 13:25:18 +0000 (UTC) (envelope-from mmel@FreeBSD.org) Received: from mail.miracle.cz (mail.miracle.cz [193.84.128.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mail.miracle.cz", Issuer "Miracle Group Root CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id DA3B5124B; Tue, 2 Aug 2016 13:25:17 +0000 (UTC) (envelope-from mmel@FreeBSD.org) Received: from [193.84.128.50] (meloun.ad.miracle.cz [193.84.128.50]) by mail.miracle.cz (Postfix) with ESMTPSA id 8BCE23ACC0; Tue, 2 Aug 2016 15:25:08 +0200 (CEST) Subject: Re: INTRNG (Was: svn commit: r301453....) To: Nathan Whitehorn References: <201606051620.u55GKD5S066398@repo.freebsd.org> <57976867.6080705@FreeBSD.org> <5798E104.5020104@FreeBSD.org> <579A25BB.8070206@FreeBSD.org> <30790e40-58b4-3371-c0f0-b7545571f389@freebsd.org> <579AFFC5.1040005@FreeBSD.org> <579CD355.1050203@FreeBSD.org> <460fa0b3-ddb7-6247-2412-3d75a589d5e7@freebsd.org> <579CF7C8.1040302@FreeBSD.org> <24107713-6d50-c21d-ccf1-7dbdb36cc484@freebsd.org> <579E1BE2.7020500@FreeBSD.org> <7f053bb8-ab03-e46c-1c72-d757348e4e54@freebsd.org> Cc: "freebsd-arm@freebsd.org" , Svatopluk Kraus , "freebsd-arch@freebsd.org" From: Michal Meloun X-Enigmail-Draft-Status: N1110 Message-ID: <57A09F34.4050400@FreeBSD.org> Date: Tue, 2 Aug 2016 15:25:08 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.4.3 (mail.miracle.cz); Tue, 02 Aug 2016 15:25:08 +0200 (CEST) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Aug 2016 13:25:18 -0000 I'm sorry for delayed response. We have serious trouble @work so I cannot spend to much time on FreeBSB for next 2 or 3 days. Dne 02.08.2016 v 6:00 Nathan Whitehorn napsal(a): > > > On 07/31/16 11:43, Nathan Whitehorn wrote: >> >> >> On 07/31/16 08:40, Michal Meloun wrote: >>> Dne 31.07.2016 v 8:11 Nathan Whitehorn napsal(a): >>> [snip] >>>>> The current API is certainly a bit of a hack and I would be >>>>> pleased to >>>>>> see it go; but the replacement needs to be more functional and more >>>>>> robust. As far as I can tell, I literally *cannot* move PowerPC over >>>>>> to this architecture. >>>>> Yes, at this time, I agree. If you can accept enhancement of >>>>> (owf_)bus_map_intr() then we can move to next, significantly less >>>>> hackish, state. >>>> OK, sure. I wrote some code this afternoon (attached) to sketch this. >>>> It does not compile or run or anything useful, but it should >>>> illustrate the important parts of what I think is an API that should >>>> work for everyone. I'm happy to finish it if it looks reasonable -- I >>>> may in any case just to replace PowerPC's increasingly teetering >>>> intr_machdep.c. >>>> >>>> The OF part is essentially unchanged from how it currently works: >>>> there's a method that you pass the interrupt specifier and interrupt >>>> parent to, and it spits back an IRQ that means that combination of >>>> things in perpetuity. OFW_BUS_MAP_INTR() in nexus.c, with its current >>>> API, can be implemented in terms of it in one line. Whenever the >>>> relevant PIC attaches, it is told to use the given IRQ to refer to >>>> that opaque bytestring. >>>> >>>> I've added a set of equivalent functions for ACPI that take the >>>> contents of an ACPI interrupt specifier and do the same things. It >>>> tries to have the IRQ be human-meaningful, if possible, by matching >>>> the ACPI interrupt number. Having separate functions seemed a little >>>> cleaner than exposing the enums and unions as part of the KPI, but I'm >>>> not wedded to it -- this is just an example. >>>> >>>> PICs register (and deregister) themselves with either an OF phandle or >>>> an ACPI resource string or (god help us) both as needed. The PICs have >>>> corresponding methods for interpreting various kinds of interrupt >>>> specifiers. >>>> >>>> Whether it allows bus_alloc_resource(), bus_setup_intr(), etc. to >>>> succeed before the PICs are attached is gated on an #ifdef. That can >>>> probably be off by default on ARM. A PowerPC version of this code >>>> would have to keep it on until various bus drivers are fixed. >>>> >>>> Here's the general flow: >>>> - Parent bus enumerates children, maps IRQs as needed, adds to >>>> resource list. The struct resources involved aren't special (just a >>>> single number), so PCIB_ROUTE_INTERRUPT(), MSIs, etc. are trivial to >>>> implement (and already are implemented, in general, for OF/FDT >>>> drivers). >>>> - bus_alloc_resource() does nothing special >>>> - bus_setup_intr() calls into the PIC and does what is needed, as in >>>> r301453 (to within that #ifdef) >>>> >>>> This should keep all the best pieces of everything: >>>> - ACPI and OF are supported together, and easy to extend for other >>>> types of mapping technologies without breaking either KBI or KPI (just >>>> add new functions) >>>> - No APIs change for existing Open Firmware code >>>> - The support code can live entirely in machine-dependent directories, >>>> with no code changes at all required in kern/ or dev/ofw/ >>>> - bus_setup_intr() and friends behave as in r301453 and succeed or >>>> fail for real >>>> - PCI code is not an issue >>>> - No disconnected interrupt state maintained in mapping table (unless >>>> the transitional #ifdef is on) and the mapping table content is only >>>> larger than r301453 by having a copy of the original interrupt >>>> specifier. >>>> >>>> If and when we manage to fix the kernel APIs to allow non-scalar >>>> interrupt specifiers, this should also be easy to gradually >>>> transmogrify to support that case since there exists a 1:1 mapping of >>>> scalar IRQs to rich data and the control flow would be identical: >>>> interrupt specifiers are read at enumeration time and a resulting >>>> handle -- struct resource or scalar IRQ -- used afterward to refer >>>> to it. >>>> >>> Nice, nice, i think that we are very close at this point. >>> The problem with the above workflow is that maps IRQ function is called >>> at parent bus enumeration time, generally at BUS_PASS_BUS pass. And at >>> this time, PICs are not exist. >>> Also, 'static struct intr_irqsrc *irq_sources[NIRQ];' is not a mapping >>> table, it doesn't provide this functionality. >>> But yes, i understand that mapping table is important and necessary >>> for you. >>> >>> So, if i can extend our concept a little bit, then: >>> We can add new table (map_data_tbl for example) that holds a copy of >>> the original interrupt specifier, index to irq_sources table and >>> probably some flags. >>> And we can modify the above workflow to: >>> - Parent bus enumerates children, allocates entries from map_data_tbl, >>> adds to resource list. The struct resources involved aren't special >>> (just a single number), so PCIB_ROUTE_INTERRUPT(), MSIs, etc. are >>> trivial to implement (and already are implemented, in general, for >>> OF/FDT drivers). Index to map_data_tbl is used as resource ID. >>> - bus_alloc_resource() sets 'ALLOCATED' in map_data_tbl flags field, >>> *maps IRQs* and stores result in 'index' field. >>> - bus_setup_intr() sets 'SETUP_DONE' in map_data_tbl flags field, calls >>> into the PIC and does what is needed, as in r301453. (to within that >>> #ifdef) >>> >>> And, in PPC case, newly attached PIC scans whole map_data_tbl table, >>> finds his entries and makes what needs. >>> >>> Does that make sense? >>> I hope that postponing of map IRQ call is not a problem for PPC, >>> everything else looks easy. >>> Moreover, this additional level of indirection also solves all my >>> 'hypothetical corner cases' with N:1 mappings (between original >>> interrupt specifier and real interrupt number). >> >> Yes, I think we are converging. >> >> My qualm about bus_alloc_resource() is twofold: >> 1. Traditionally, with interrupts, bus_alloc_resource() has no side >> effects and I'm not sure it propagates cleanly down the tree in all >> cases. >> 2. There are a few bus APIs (bus_config_intr() comes to mind) that >> use raw IRQ IDs and, so far as I know, can be, and sometimes are >> called before bus_alloc_resource(). If the PIC doesn't know about the >> IRQ yet when bus_config_intr() (etc.) is called, things will just break. >> >> So you would need to make sure that any bus method handling a >> resource ID causes it to be mapped on the PIC at that time. It's OK >> if that happens in bus_alloc_resource() so long as bus_config_intr(), >> bus_setup_intr(), etc. cause that to happen if it hasn't happened yet >> -- I don't care when these calls are made to the PIC driver so long >> as *what* calls will be made is set at enumeration time. >> >> I am also totally fine with having, on ARM, bus_config_intr(), >> bus_setup_intr() etc. fail if the PIC hasn't attached yet (On >> PowerPC, we can't do that yet, but after this conversation, I regard >> that as a bug and would fix that later), as well as delaying setup on >> the PIC to the first time any bus driver actually tries to *use* the >> interrupt (alloc_resource(), setup_intr(), config_intr(), whatever) >> rather than when the interrupt is originally allocated. >> I understand. And yes, bus_alloc_resource() isn't propagated cleanly down the tree in all cases. That's why we have 'INTRNG' hook in subr_bus.c, which is suboptimal. About bus_config_intr(). The only consumer of bus_config_intr() is ACPI code, in little hackish manner, as workaround for problem which is fully solved by INTRNG. Also, the semantic of bus_config_intr() is not clear, from INTRNG point of view. So i have tendency to ignore it :) >> ------------ The following is a large parenthesis ------------------- >> >> One other possible route here that would also work well would be to >> make ofwbus.c MD (it's a trivial piece of code anyway, so we don't >> gain a lot by sharing it) and implement ofw_bus_map_intr() locally as >> an ofwbus bus method. Then you could have the mapping table stored in >> ofwbus's softc -- the API was designed for this initially. You would >> need MD extensions for doing PIC registration there (which is fine), >> but that segregates all the OFW-specific information into >> OFW-specific code and would let the bus methods and the OFW interrupt >> mapping table interact cleanly in the same place. This still >> preserves the pre-r301453 API, makes PowerPC work, and maybe >> address's skra@'s concern about extensibility and letting core >> interrupt code know about FDT (or ACPI). I'd be happy to mock this up >> as well if you think it's a good approach. >> > [snip] >> >> This has the following features: >> - Existing OFW API and semantics unchanged >> - As such, PowerPC, PCI, etc. work fine with no changes >> - Details encapsulated in MD code, so individual platforms can >> implement this however they like >> - arm/arm/intr.c (or whatever) only needs a method to allocate a >> fresh interrupt, with no state, and anoter to set the device_t for an >> interrupt sometime later. >> - The internal table in the platform interrupt code has no knowledge >> of any mappings whatsoever except having the appropriate device_t for >> the PIC stored with the interrupt vector. >> - Device tree bits handled purely in device tree code >> - No action need be taken on any mapping until the interrupt is >> actually allocated/set up, like r301453 >> - Easy to add more mapping mechanisms (e.g. ACPI) by having similar >> enumeration-mechanism-specific code in the root bus for that mapping >> mechanism. >> >> -------------- End parenthesis ------------------------------- > > Here's an implementation of the parenthesis I wrote on an airplane > this afternoon. It should be complete, though has not been tested. The > code is short and simple (+70 lines in ofwbus.c). This preserves the > pre-r301453 API and semantics relative to drivers, which means PowerPC > and PCI work out of the box, while keeping the semantics relative to > the interrupt layer of r301453 (PIC methods only called on resource > allocation, no allocatable IRQs on unattached PICs, encapsulation of > OFW-specific code in OFW-specific bits of the tree). It turns out > those two things are compatible, somewhat to my surprise, and that > makes the result very clean. I like this approach and would be happy > to move forward with it. There are five functions of interest: > > 1. OFW_BUS_MAP_INTR(). This has the semantics and API it has now: you > pass an interrupt specifier and parent, you get back an IRQ. No > changes. This is the core of the normal OFW interrupt API. > > 2. OFW_BUS_REGISTER_PIC(device_t pic, phandle_t phandle). This is a > new function that PIC drivers are supposed to use to register control > of an interrupt domain. This replaces machine-specific code like > powerpc_register_pic() to allow the PIC table to be in a bus parent > rather than in the interrupt core. > > 3. PIC_MAP_OFW_INTR(device_t pic, int irq, interrupt specifier). This > is a new function that PIC drivers that know how to handle device-tree > interrupt descriptors implement (analogous to various existing ones > that vary by platform). It tells the PIC that the given abstract IRQ > means the given opaque interrupt specifier. > > 4. arm_allocate_irq(int suggested). This allocates a new IRQ number > not (yet) attached to a PIC, as in r301453. I've added a parameter > that lets you pass a suggested number to try in case it is possible to > make it match an interrupt pin or something for human-readability. > > 5. arm_set_pic_for_irq(int irq, device_t). This tells the MD interrupt > layer to associate a given PIC device_t with an IRQ. That is all the > information the MD layer ever has about the IRQ mapping. > > Functions #1 and #2 are now implemented completely in ofwbus.c: there > are no callouts anywhere else and the interrupt mapping table is > maintained entirely internally to ofwbus (in its softc). In order to > implement ACPI, or some other strategy, you would implement analogs to > functions #1 and #2 that live somewhere in the bus hierarchy that is > guaranteed to be above all devices that might want that type of > interrupt (e.g. in acpi0), and some analog to #3 that PIC drivers > implementing the mapping scheme would provide. > > Since the system interrupt code has no knowledge at all of interrupt > mapping, of any type, in this scheme, adding new mapping types is > trivial and can be done on a driver-by-driver basis if necessary > without changing KPI and without any other part of the system even > being aware. For example, GPIOs can use a completely different > mechanism if they want and can do setup purely in the GPIO controller > driver. You could have a method GPIO_GET_INTERRUPT_FOR_PIN() on a GPIO > controller in which the GPIO controller allocates a generic IRQ, > assigns through some internal table just in the GPIO driver, and > returns to it to a consumer in some other device driver -- without a > GPIO mapping type, new bus functions, or modifications to the platform > interrupt code. > > The control flow goes like this: > - Bus driver enumerates children, parses interrupts properties, calls > OFW_BUS_MAP_INTR() to get IRQs for them (as pre-r301453), adds to > interrupt list. > - ofwbus receives the OFW_BUS_MAP_INTR() call, allocates a blank > disconnected IRQ from the MD interrupt layer, and stores the mapping > from the new IRQ to the given interrupt specifier and phandle in an > internal table in ofwbus's softc. > NB: Nothing else happens here, like post-r301453. Changing this does > not change any semantics of the API pre-r301453, which means it > remains fully compatible with PCI and PowerPC. Also, like > post-r301453, there is no involvement of nexus. > - PICs attach and call OFW_BUS_REGISTER_PIC(). ofwbus receives these > messages and adds a (device_t, phandle_t) mapping to a second internal > table. Note that the interrupt layer does not need to handle PIC > registration anymore at all (except for the root PIC). > - Bus child eventually calls a function that tries to set the > interrupt up (e.g. bus_setup_intr()). That propagates up the bus > hierarchy, eventually getting to ofwbus. ofwbus notes the IRQ number, > looks it up in the table, looks up the appropriate PIC from the PIC > table, then: > A) calls arm_set_pic_for_irq(irq, pic_device_t) -- this is the > interrupt layer's only interaction with the mapping code. All it deals > with is device_ts and abstract IRQ numbers. > B) calls PIC_MAP_OFW_INTR(pic, irq_number, interrupt-cells, > interrupt-specifier) to tell the PIC that the interrupt layer's IRQ > irq_number means the given specifier > C) finally, passes the call onto nexus, which will do whatever would > normally happen (unmasking the interrupt, setting handlers, etc.) in > terms only of the abstract IRQ and the device_t assigned by ofwbus. > > You would implement ACPI just by doing a s/OFW/ACPI/g > search-and-replace above -- since the interrupt layer doesn't know > about OFW or ACPI or anything else, there is no need to touch it. This > seems clean, simple, compartmentalized, preserves the existing API, > and should work on all of our various hardware. PowerPC can't quite > work with it yet without some multipass foo, but, since the API is > preserved, that transition can happen gradually without KPI changes. > For the same reason that it is API-preserving, I think this code is > also MFC-able. > -Nathan I think that we are slightly diverges in this place: - please note that PIC API (in kern/pic_if.m) is different from the one that PPC uses. - we must be able to store configuration data (original interrupt specifier) in all cases, not only for OFW. That's reason why I have proposed to create 'mapping table'. Anyway, i think that we both talking about +/- same solution, only i want to move it from OFW specific level implemented at OFW bus level to bus/interrupt specifier format independent level implemented in subr_intr.c. Most of your control flow can be implemented at general level as is, or already exist [intr_map_irq(), intr_pic_register()] . Also, impact for current PPC code is, by me, minimal. I can sketch my idea in more details, if you found it acceptable. Again, I'm sorry for delayed and very brief response. Michal