Date: Sat, 30 Jul 2016 09:59:03 -0700 From: Nathan Whitehorn <nwhitehorn@freebsd.org> To: Michal Meloun <mmel@freebsd.org> Cc: "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org>, Svatopluk Kraus <skra@freebsd.org>, "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org> Subject: Re: INTRNG (Was: svn commit: r301453....) Message-ID: <460fa0b3-ddb7-6247-2412-3d75a589d5e7@freebsd.org> In-Reply-To: <579CD355.1050203@FreeBSD.org> References: <201606051620.u55GKD5S066398@repo.freebsd.org> <57934ABD.6010807@FreeBSD.org> <4e7a3e8f-cc21-f5f2-e3e0-4dbd554a4cd0@freebsd.org> <5794720F.4050303@FreeBSD.org> <8bfd8668-bc49-e109-e610-b5cd470be3ec@freebsd.org> <57950005.6070403@FreeBSD.org> <f82018ee-51e7-60fa-2682-f0ef307a52b5@freebsd.org> <57961549.4020105@FreeBSD.org> <e2cace17-0924-2084-5fcf-626f87e41cc3@freebsd.org> <CANCZdfr%2BZ4XxXRY0yMiWXwp=8iKq54y3uJ9-OfAOdfxAs1qdtw@mail.gmail.com> <f94bfd25-fabf-efc3-55c9-cfdfd9e4d6e6@freebsd.org> <CANCZdfpz=z3gc3pyb_Qssa3vGJSnPv_r6J-SWDPPpE9zPYB9=w@mail.gmail.com> <ab44ddb1-515b-94ac-6b12-673b7c53d658@freebsd.org> <57976867.6080705@FreeBSD.org> <f2edac8f-2859-cd98-754e-881e2b2d1e63@freebsd.org> <5798E104.5020104@FreeBSD.org> <a5d43044-1733-6cc7-2e99-e85b60b0fcf3@freebsd.org> <579A25BB.8070206@FreeBSD.org> <30790e40-58b4-3371-c0f0-b7545571f389@freebsd.org> <579AFFC5.1040005@FreeBSD.org> <eb603349-eb88-866d-7a26-9e026518fd39@freebsd.org> <579CD355.1050203@FreeBSD.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On 07/30/16 09:18, Michal Meloun wrote: > Dne 29.07.2016 v 16:41 Nathan Whitehorn napsal(a): > [snip] > Svata is online now, so i think that he is more competent for responding > to all questions about INTRNG internals. > > >>>>> But, as you wrote, INTRNG code is "absolutely free to map and >>>>> dispatch >>>>> multiple virtual IRQs from the same shared interrupt pin" (allow me to >>>>> expand this also to INTRNG code itself) . >>>>> In attempt to simply situation abound cache, in first step, we >>>>> removed >>>>> duplicates check from above model. This also removes unpleasantly >>>>> sharing of entries. >>>>> This step converts cache to some sort of duplicate of resource list >>>>> entries. Each existing RLE have exactly one corresponding entry in >>>>> cache. >>>>> Also, the cache is not needed for INTRNG core. At this point, its >>>>> only >>>>> transfers data from ofw_bus_intr_to_rl() to consumers PIC. >>>>> >>>>> So, if we are able to attach 'key' to RLE, then we can remove the >>>>> whole >>>>> cache machinery from INTRNG. >>>>> Do you agree? >>>> No. Because you have to support two things that this still can't allow: >>>> 1. Devices attaching before their interrupt controllers. >>> You still expect that data readed from OFW must be parsed at RLE >>> creation time. I don't see single reason why we cannot postpone >>> parsing to bus_alloc_resource() time. >>> At this point, its irrelevant if bus_alloc_resource() supports >>> detached PIC or not. >> Because there are *lots* of cases in the kernel where interrupts are >> represented as a number and not a struct resource *. I've listed many >> of them; I'm sure there are more. PCI is the most notable example >> (both LSI and MSI). For both of these, the route_interrupt (LSI) and >> alloc_msi[x]() (MSI) functions give the bus driver a chance to look at >> the device tree and then they return an IRQ number. There doesn't seem >> to be any way to map that back to a struct resource * or something >> besides an internal table. >> >> Moreover, there are cases (self-added interrupts by children, for >> instance) in which the RL assigned by the parent does not reflect the >> final RL used by the child and there is no way for the parent bus to >> map an RID to a device tree entry. >> >> When we moved to doing this at interrupt assignment time rather than >> resource allocation time or another occasion on PowerPC, it was for >> these kinds of reasons. There were ways to do it later, but they were >> hugely invasive and involved changing large parts of the kernel. Even >> then, it would still be fragile: Trying to guess what device tree >> entry was meant at bus_alloc_resource() time is much more error-prone >> than doing the mapping when reading that device tree to assign the >> resources in the first place, when you can make guarantees. >> >>>> 2. Interrupts encoded entirely in a 32-bit integer rather than a >>>> struct resource * in a shared rman. This is required to support PCI >>>> (both LSI and MSI). >>>> >>>> #2 could of course be fixed by completely retooling the API of the PCI >>>> bus code, but, since it fixes something that currently is not a >>>> problem, why would we do that? >>> All relevant bus functions have struct resource * as an argument. And, >>> in post r301453 kernel, all necessary data are attached to it. >>> I don't see any reason for API change. >> There is: >> int PCIB_ROUTE_INTERRUPT(device_t, device_t, pin) >> >> Which takes a PIN (A, B, C, D) and returns the IRQ to add to the RL. >> This is the only real opportunity to parse an interrupt-map and does >> not deal with struct resource. >> >> There are also PCIB_ALLOC_MSI[X], which take a device_t and count and >> return a vector of IRQ numbers. >> >> How would you do resource allocation given these APIs? There seem to >> be some dubious-looking attempts at generic MSI mapping code in the >> new kern/subr_intr.c that don't allow the specification of more than >> 32-bit interrupt specifiers and otherwise exactly duplicate the >> pre-301453 flow for MSIs, but in a more complicated and >> less-functional way. >> >> (While investigating this, I also just found >> dev/pci/pci_host_generic.c, which mostly does the wrong things, is >> actually specific to a couple of ARM boards, and duplicates code in >> dev/ofw/ofwpci.c. But that's another discussion...) > [snip][2] >>>>>>> Each OFW device has a lot of dependencies. It must enable clocks, >>>>>>> power >>>>>>> regulators..., it must be taken from reset. All these action must be >>>>>>> done before we can access single register. >>>>>>> Most of these action can be done only with attached >>>>>>> clock/power/reset >>>>>>> controller. >>>>>>> Within this, interrupts are not special. >>>>>> They are actually quite special in that the kernel enables them late >>>>>> and so you can defer setup. They are also special in that, for that >>>>>> reason, it is possible to design systems with circular dependency >>>>>> graphs with interrupts. It is not possible to do this with, for >>>>>> example, clocks: if I need to apply a clock to the clock controller >>>>>> with itself, the system is just not bootable and such a system will >>>>>> not be built or shipped. Interrupts need only happen later, after >>>>>> device setup, and so such systems *are* designed and shipped. The >>>>>> same >>>>>> is true for power. >>>>> The G5 problem is standard 'cross type' circular dependency. You >>>>> must >>>>> (at BUS_PASS_BUS) initialize memory address window of PCI bridge and >>>>> start bus enumeration. >>>>> Rest of PCI bridge initialization (including all IRQ stuff) must be >>>>> postponed to any later pass. Of course, all interrupt controllers must >>>>> be attached at BUS_PASS_INTERRUPT. >>>>> Or I missed something? >>>> Yes. As I said, you could solve it that way. It would, of course, >>>> require a massive retooling of newbus (to allow partial delayed >>>> attach), everything in dev/pci (which doesn't expect that), and >>>> (almost) every PCI driver in and out of the tree. >>> I still think that only host to PCI bridge must me modified, and PCI-PCI >>> bridge must be run at BUS_PASS_BUS (which is single line modification). >> On Powermacs, one PIC is 5 levels deep in the bus hierarchy, under 3 >> PCI<->PCI bridges and a device that also has an ATA controller and a >> bunch of other things. > Yes, I see. And, as a said before, it's pretty common situation in OFW > based world. > Imho, all whats is needed is: > https://github.com/strejda/freebsd/commit/96ab64e04dcebb131643c2ae1cf373194d23e8e5 > (of course, it's not tested, I havn't access to powermac HW) That would probably work on at least some of the hardware. Some of those parents can have their own interrupts, however, which would break. I'll investigate how often this happens. > It's hard for me to understand why you say that this change requires > "massive retooling of newbus, everything in dev/pci, every PCI driver). > >> Some of those can have their own interrupts (which is fairly common >> for error reporting or PCI-E hotplug). It wouldn't be impossible, but >> it is decidedly nontrivial. > That's true. Any 'bus' driver, in multipass environment, *must* export > provided resources, make bus accessible and enumerate it at > 'BUS_PASS_BUS', and > consume resources at some later bus pass. > At this time, only pcie-hotplug needs this change. But I don't see why > a few lines change is "decidedly nontrivial". It's not just a few lines change. Newbus provides no mechanism for a bus to attach at two different bus passes. >>>> Or: you could skip all of that and use the fully functional mechanism >>>> that already exists. >>> The problem is, at least from my point of view, that we don't have it. >>> The actual MIPS code doesn't work on ARM for numerous reasons. This is >>> first one -> >>> >>> https://svnweb.freebsd.org/base/head/sys/powerpc/powerpc/nexus.c?revision=297000&view=markup#l186 >>> The pre-r301453 ARM INTRNG doesn't support detached interrupts. >>> And r301453 doesn't changed nothing about this. >> I'm happy to write whatever code is missing. The ARM implementation of >> ofw_bus_map_intr() does seem fairly braindead and should be replaced. >> >> So here is where I am right now: >> - The perceived advantage of the new API seems to be that the mapping >> information (interrupt parent, etc.) ends up in a struct resource >> instead of in a centralized mapping table > True. And, in optimal case, also in RLE. See [1]. Your code also has a centralized table (static struct intr_irqsrc *irq_sources[NIRQ];), basically the same one, so I'm actually really confused on this point. The only actual difference seems to be that the firmware interrupt descriptor is stored in the RLE at bus_alloc_resource() time instead of in the table at bus enumeration time and that the new code requires some extra bus methods. >> - Additionally, it gives you a better shot at having the PIC online >> before the PIC's interrupts are parsed (which is not required, but nice). > Nope, we simply fount that detached pics is very dangerous and not > needed. But, if you love it, it can be added as MD extension. Dangerous how? I don't "love it"; it's an unfortunate necessity. >> - Beyond these aesthetic points, there are no concrete examples of new >> functionality added by this API, aside from some minor implementation >> bugs of the old one on ARM that are easily fixed. > Really? Or you just ignore it? Which ones? I've been asking for a week and a half now and you have responded with some hypothetical examples, all of which either (a) do not seem to occur in the real world and (b) were trivially implementable with the existing code. > >> - There are, conversely, a number of concrete cases where this new API >> would not be able to do the right thing. Some of these can be worked >> around or fixed with significant restructuring in the MI parts of the >> kernel. > And i offer you a fix, without "significant restructuring in the MI > parts of the kernel". Unfortunately, you did not want to accept anything > other than the old API. For one of them, partially. For the others, not so much. For example, I have no idea how to implement PCI interrupts (PCIB_ROUTE_INTERRUPT) in this framework. I am perfectly happy to accept a new API. What I am not perfectly happy to accept is an API that makes it impossible for me to support the platforms that I need to support and that, simultaneously, doesn't even provide any new capabilities on other platforms. The current API is certainly a bit of a hack and I would be pleased to see it go; but the replacement needs to be more functional and more robust. As far as I can tell, I literally *cannot* move PowerPC over to this architecture. The ideal API would be one in which the resource enumeration code could assign, instead of numbers, some structured information that contains the full firmware specifier (string parent + interrupt pin for ACPI, interrupt parent phandle + specifier for device tree, bare numbers for simple systems). That seemed, and seems, too invasive. I think we agree on this and have chosen to approximate that API in two different ways. When we designed the interrupt mapping code, we evaluated a large number of possible approaches. Something very much like this was one of them. It was rejected as being too fragile (it requires resource allocation and enumeration to be coupled) and to have too many corner cases (the PCI MSI and LSI APIs, for instance). The interrupt mapping stuff seemed at the time, and so far still seems, like the least-bad approximation to the ideal case: it is essentially that ideal API, in that it happens at bus enumeration and involves just assigning the firmware specifiers, but with lookup keys to that information stuffed into the 32-bit numbers that the rest of the system uses. >> - If we have both, the interrupt handling code in the MI parts of the >> kernel will bifurcate. This patch alone has added a bunch of #ifdef to >> kern/subr_bus.c, a new file to kern/, and lots of #ifdef to dev/ofw. >> This is going to be really hard to maintain if we need both. > All those #ifdefs has been added as safeguards for 11/stable and they > can be removed immediately. They should not be and I will request immediate reversion and appeal to core@ if they are before this discussion is complete. -Nathan > Michal > >> Is this all correct? >> >> If so, can we please back this out until this discussion is complete? >> I'm asking this formally at this point, under the Committer's Guide >> section about reversion requests. >> -Nathan > _______________________________________________ > freebsd-arch@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" >
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?460fa0b3-ddb7-6247-2412-3d75a589d5e7>