Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 23 Jul 2016 12:45:17 +0200
From:      Michal Meloun <mmel@FreeBSD.org>
To:        Nathan Whitehorn <nwhitehorn@freebsd.org>, Svatopluk Kraus <skra@FreeBSD.org>, src-committers@FreeBSD.org, svn-src-all@FreeBSD.org, svn-src-head@FreeBSD.org
Subject:   Re: svn commit: r301453 - in head/sys: arm/arm arm64/arm64 dev/fdt dev/gpio dev/iicbus dev/ofw dev/pci dev/vnic kern mips/mips sys
Message-ID:  <57934ABD.6010807@FreeBSD.org>
In-Reply-To: <9d2a224c-b787-2875-5984-a7a2354e8695@freebsd.org>
References:  <201606051620.u55GKD5S066398@repo.freebsd.org> <b9606755-69cb-2cb0-04d7-6be32e4cb89e@freebsd.org> <578E0B5D.3070105@FreeBSD.org> <e026f6fc-76ed-5dbe-00fc-365b6d7bcf94@freebsd.org> <578F6075.7010500@FreeBSD.org> <05a80ac6-4285-ec9d-36e9-2f92c609f746@freebsd.org> <57907B0F.9070204@FreeBSD.org> <9d2a224c-b787-2875-5984-a7a2354e8695@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
Dne 21.07.2016 v 17:53 Nathan Whitehorn napsal(a):
>
>
> On 07/21/16 00:34, Michal Meloun wrote:
>> Dne 20.07.2016 v 17:45 Nathan Whitehorn napsal(a):
>>>
>>>
>>> On 07/20/16 04:28, Michal Meloun wrote:
>>>> Dne 19.07.2016 v 17:06 Nathan Whitehorn napsal(a):
>>>>>
>>>>>
>>>>> On 07/19/16 04:13, Michal Meloun wrote:
>>>>>> Dne 19.07.2016 v 2:11 Nathan Whitehorn napsal(a):
>>>>>> Hi Nathan,
>>>>>> I’m afraid that skra is on vacation, for next 2 weeks (at
>>>>>> minimum), so
>>>>>> please don’t expect quick response.
>>>>>>
>>>>>>> Could you please describe what this change is in more detail?
>>>>>> Short description is appended.
>>>>>>
>>>>>>> It breaks a lot of encapsulations we have worked very hard to
>>>>>>> maintain,
>>>>>>> moves ARM code into MI parts of the kernel, and the OFW parts
>>>>>>> violate
>>>>>>> IEEE 1275 (the Open Firmware standard). In particular, there is no
>>>>>>> guarantee that the interrupts for a newbus (or OF) device are
>>>>>>> encoded in
>>>>>>> a property called "interrupts" (or, indeed, in any property at
>>>>>>> all) on
>>>>>>> that node and there are many, many device trees where that is
>>>>>>> not the
>>>>>>> case (e.g. ones with interrupt maps, as well as Apple hardware). By
>>>>>>> putting that knowledge into the OF root bus device, which we
>>>>>>> have tried
>>>>>>> to keep it out of, this enforces a standard that doesn't
>>>>>>> actually exist.
>>>>>> Imho, this patch doesn’t change anything in this area. Only
>>>>>> handling of
>>>>>> “interrupts” property is changed, all other cases are unchanged (I
>>>>>> hope).  Also, INTRNG code is currently shared by ARM, ARM64 and
>>>>>> MIPS.
>>>>>
>>>>> But "interrupts" isn't a generic part of OF. This makes it one,
>>>>> incorrectly.
>>>> How? Can you be little more exact ?
>>>
>>> Because it puts knowledge into ofwbus that expects that children at
>>> arbitrary levels of nesting have interrupts defined by an
>>> "interrupts" property. You could patch this through on sub-devices,
>>> of course, but that's already done correctly by the existing
>>> ofw_bus_map_intr() code in a much more robust way that doesn't
>>> involve trying to guess how sub-buses and devices have chosen to
>>> allocate resources. Why reinvent the wheel all the way through the
>>> bus hierarchy?
>>  Nope, the code only expect that „interrupts" property is default way
>> hot to get interrupt description.  Any device or bus in the hierarchy
>> can fill appropriate resource list, or terminate call at any level.
>
> "interrupts" should not be the default -- it's part of the OF bindings
> for the bus and is used (notably) by simplebus. The issue of
> cross-correlating RIDs is a much larger problem, however.
>
Can we postpone this problem while, please?
> [snip]
>>>
>>>>>>
>>>>>> The patch simply postpones reading of interrupt property to
>>>>>> bus_alloc_resource() (called by consumer driver) time.
>>>>>>
>>>>>> Due to this, we can:
>>>>>> - parse  interrupt property. The interrupt driver must exist
>>>>>>    at this time.
>>>>>
>>>>> This only works with some types of interrupt properties, not all,
>>>>> and breaks if the interrupt driver hasn't attached yet (which it
>>>>> can't be guaranteed to -- some PPC systems have interrupt drivers
>>>>> that live on the PCI bus, for example).
>>>> How you can allocate (and reserve it in rman) interrupt if is not
>>>> mapped (so you have not real IRQ number for it). Just for notice - 
>>>> multiple virtual IRQs can be mapped into single real IRQ.
>>>
>>> The core idea is to think of the full interrupt specifier -- the
>>> interrupt parent and the full byte string in the device tree -- as
>>> the IRQ rather than the interrupt pin on some chip (which is
>>> usually, but not always, the first word in that byte string). The
>>> "virtual" IRQ number is just a compression of that longer piece of
>>> data, which usually can't fit in an rman resource.
>> I understand.  While this approach can works (and actually works) for
>> single sourced OFW world, it immediately fails if you must be able to
>> parse data from different sources (which uses different encoding)
>> (OFW, UEFI / ACPI, GPIO...) within one system.
>
>
> On PowerPC, GENERIC64 supports FDT systems (some IBM hardware), OFW
> systems (Macs, some IBM hardware), systems with no device trees at all
> (old-style PS3), and systems with a mixture of device tree and no
> device tree (new-style PS3). On these, there is a mixture of "real"
> interrupts and GPIO-type interrupts. There is no limitation that this
> be used only for device-tree-type systems.
>
> The system requires two things: an interrupt domain key and an
> arbitrary unique byte string describing the interrupt. When running
> with a device tree, these are set to the phandle of the
> interrupt-parent and the contents of the device tree interrupt
> specifier, respectively, and the system was of course developed with
> that in mind. But they don't need to be, and often aren't. You could
> make the domain an element of an enum (where "ACPI" is a choice, for
> instance -- this is what PS3 does), or set it to a pointer to a
> device_t, or really anything you like. Similarly, the interrupt
> specifier is totally free-form.

Yes, I agree. and i think that we followed the same direction. But i see
two problems with this approach.
- in some cases (OFW, device_t)  you don't have  control over domain key
value, so you cannot guarantee its uniqueness.
  Of course, probability of collision is low, but it is.
- within ofw_bus_map_intr() (or later - at the time when byte string
must be decoded)  you are not able (easily) to differentiate
  between different formats, thus you are not able to select
appropriate  decoder. The GPIO controller, for example,
  must be able to handle interrupts defined by standard OFW property, or
by <device_t, pin number> pair concurrently.

For this reason we makes domain key composite, in our model, the domain
key consist of "domain key type"
and "domain key value". This composite key guarantees uniqueness and  it
also allows to select proper parser for byte string.
This is, imho, only one difference between us.

> You could, for instance, set it to one of the structures introduced in
> r301453 if you wanted to.
>
> I would have zero problems with changing the prototype of the existing
> dev/ofw function to something more generic in name, like:
>
> bus_map_intr(device_t dev, uintptr_t iparent, size_t intrlen, void *intr)
>
> instead of the existing equivalent:
>
> ofw_bus_map_intr(device_t dev, phandle_t iparent, int icells, pcell_t
> *intr)
>
Our bus_map_intr() method is not indeed as replacement of 
ofw_bus_map_intr(). Its evolution of "how we can store more complex data
to resource list (from bus enumerator) and transfer it  to
bus_allocate_resource() and/or to bus_setup_intr()" in driver
independent way. We found no reasonable way to do it, so we postponed
reading of properties to bus_allocate_resource() time.

But now  jhb@ gives us alternative and I must say that this looks like a
clean and elegant way how to make this (assuming that we can expand
resource_list_entry by pointer to alloc_resource_args)

>> By this, one byte string in OFW encoding can describe one IRQ and
>> exactly same string in UEFI encoding can describe different IRQ.  Or,
>> in reverse, OFW and UEFI can describe same (and compatible) IRQ by
>> two different strings.
>> This is exact reason, why we discards virtual IRQ idea and I think
>> that this fact is root issue of this clash.
>> Probably it doesn't make sense to talk about others, unless we can
>> find consensus on this.
>
> You have the larger problem if you end up in this situation that you
> are enumerating the hardware by two different and incompatible
> techniques. There simply is no way to solve this unless you either (a)
> segregate the system into an ACPI-enumerated domain and an
> OF-enumerated domain, in which case the problem vanishes, (b) discard
> one enumeration, which is what arm64 does and will always do,
> according to Andrew in another post, or (c) make some incredibly
> complex merging code that would naturally handle interrupts with
> everything else. So I don't think this is an actual, real problem.
>
I think that above proposed solution resolves this gracefully.


> In circumstances where you have a nested, non-device-tree hierarchy
> (e.g. OPAL on PowerNV or GPIOs or what-have-you), this kind of problem
> is easily solved by inventing an extra interrupt domain.
>
>>
>>> There is no need to actually activate those interrupts before
>>> interrupts are enabled, so you can just cache them in a table until
>>> the end of device probing, which lets you break circular dependency
>>> loops between bus and interrupt topology.
>>>
>>> So long as you keep track of your mapping and the same (parent,
>>> interrupt specifier) parent always gives the same virtual IRQ, there
>>> is no way in this system to map multiple active IRQs onto a single
>>> interrupt pin on the PIC unless your device tree is broken and
>>> specifies two devices with incompatible modes (active high and edge
>>> downgoing or something) on the same pin. In this case, nothing you
>>> can do will save you -- unless your PIC supports interrupts for
>>> different kinds of events, in which case this system will work
>>> perfectly by treating them as different interrupts to the kernel for
>>> which the fact they are on the same pin is immaterial.
>>>
>>> I should note that ARM and MIPS have an almost complete
>>> implementation of this already: maybe some more intr_machdep.c logic
>>> is needed for some cases, but all the rest of the plumbing is there.
>>>
>>>>
>>>>>
>>>>>> - bus_alloc_resource() returns resource, so we can attach parsed
>>>>>>    interrupt data to it. By this, the resource itself can be used
>>>>>>    for delivering configuration data to subsequent call to
>>>>>>    bus_setup_intr() (or to all related  bus_<foo>() calls).
>>>>>>
>>>>>>
>>>>>> The patched code still accepts delivering of interrupts in
>>>>>> resource list.
>>>>>>
>>>>>> Michal
>>>>>>
>>>>>
>>>>> Given that other code depends on this, fixing it will likely
>>>>> require some complex work. I wish I had known about it when it
>>>>> went in.
>>>>>
>>>>> There are three main problems:
>>>>> 1. It doesn't work for interrupts defined by other mechanisms
>>>>> (e.g. interrupt-map properties)
>>>> I aggree, but missing ' interrupt-map' functioanlity is not caused
>>>> by this patch.
>>>
>>> It is in that the standard system already implements it completely.
>> Really?
>> https://svnweb.freebsd.org/base/head/sys/dev/ofw/ofw_bus_subr.c?revision=301453&view=markup#l521
>> and
>> https://svnweb.freebsd.org/base/head/sys/arm/arm/nexus.c?revision=301453&view=markup#l411
>
> That function is questionable and I objected to it at the time; it is
> meant only as a convenience for simplebus. More complicated cases,
> like ofwpci.c, use interrupt-map and have for a very long time.
> Simplebus allows interrupt-map but support has never been added, which
> is a bug. One of the major problems I have with this patch is that
> adding it would now require parsing all of that in two places, and
> cross-correlating them with questionable chance of success, rather
> than in just one.
>
> FreeBSD/PowerPC systems have relied on interrupt-map for ten years
> now. It's not especially exotic.
>
>>>
>>>>  
>>>>> 2. It partially duplicates the functionality of
>>>>> OFW_BUS_MAP_INTR(), but is both problematically more general and
>>>>> less flexible (it has requirements on timing of PIC attachment vs.
>>>>> driver resource allocation)
>>>> OFW_BUS_MAP_INTR()  can parse only OFW  based data and expect that
>>>> parsed data are magicaly stored within the call.
>>>> The new method, bus_map_intr(),  can parse data from multiple
>>>> sources  (OFW, UEFI / ACPI, synthetic[gpio device + pin number]). 
>>>> It also returns parsed data back to caller.
>>>
>>> That is not true. It works as long as you can specify the interrupt
>>> state as a 32-bit key of some kind for the PIC and a string of
>>> arbitrary data, which works with all of those. You could even make
>>> the interrupt data be a pointer to exactly the structs you have
>>> chosen to define here.
>> Nope, in heterogeneous world, same string can describe two different
>> IRQs and/or two different strings can describe single IRQ in
>> compatible manner.
>
> Can you give *any* concrete example of this that doesn't involve mixed
> ACPI/FDT enumeration of a single system where devices appear in both
> trees, which doesn't actually ever happen?
GPIO -  its interrupt function can be accessed by regular "interrupts"
property,  or it can be derived from GPIO pin.

>
> And even in those cases, there is no problem, since the PIC driver can
> just dispatch both (or more) vectors corresponding to the interrupt
> pin when the configured interrupt fires. It would have to have special
> logic to handle decoding unrelated types of interrupt specifiers;
> adding that would be about 3 lines of code.
>
>>
>>>
>>>> And no, it  doesn't  add any additional timing requirements .
>>>
>>> As far as I can tell, it requires the interrupt controller to be
>>> attached before you can allocate interrupts. Is that not true?
>> Yes, sure. And the patch doesn't change this.
>> Before: 
>> https://svnweb.freebsd.org/base/head/sys/kern/subr_intr.c?view=markup&pathrev=301263#l1103
>> After:
>> https://svnweb.freebsd.org/base/head/sys/kern/subr_intr.c?view=markup&pathrev=301543#l928
>
> On PowerPC, we don't require this and never have. The VIRQ stuff is
> meant explicitly to not require this.
>
>>>
>>>>
>>>>> 3. It is not fully transparent to end code. Since it happens at
>>>>> bus_alloc_resource() time, it is complicated to get the
>>>>> appropriate values for IRQs constructed by composite techniques
>>>>> (interrupt-map vs. interrupts vs. hand allocation vs. PCI routing,
>>>>> for example).
>>>> I don't see any limitation - can you be more exact? Why is not
>>>> transparent? Why is more complicated ?
>>>
>>> Suppose that a PCI device adds more IRQs to its resource list or
>>> modifies the ordering. How is whatever bus layer supposed to do
>>> something sensible at allocation time? It requires that RID numbers
>>> mean something to the parent bus after assignment, which is not
>>> guaranteed by anything and is, in more than handful of cases I think
>>> of, not true in practice.
>> Sure. And since the new code allows delivering resources in RL, so I
>> don't see any limitation here.
>
> It indexed mapping by RID and then searches interrupt lists by that to
> get the interrupt-parent. This is fundamentally a broken design if the
> child needs to, say, add a second interrupt to its RL on a different
> interrupt-parent.
?? I don't understand. The new code doesn't need this.
 
>
>>
>>>
>>>>> It is much easier to do this correctly at bus attach time when the
>>>>> resource lists are made (how PPC does it).
>>>>>
>>>> I don't agree. I don't agree. Making this at bus attach time leads
>>>> into complicated 'virtual' IRQ infrastructure, with many unresolved
>>>> corner cases.
>>>
>>> Which unresolved corner cases? This has been working correctly on a
>>> number of platforms in both FreeBSD and Linux for many years.
>> Nope, it doesn't work for ARM yet (for GPIO interrupts for example) 
>> and Linux uses EPROBE_DEFER mechanism for a long time.
>> See: http://lxr.free-electrons.com/source/drivers/base/platform.c#L87
>
> There is some missing code on ARM (probably about 30 minutes of work
> to make it match PowerPC) to make it work in an ideal case, sure, but
> there is no reason you could not go out right now, with the existing
> code, and implement GPIO interrupts by declaring the GPIO driver as an
> interrupt controller.
>
> Can you give any concrete case of something that doesn't work?
GPIO again. How you can allocate interrupt associated with given gpio
pin installed by "cd-gpios = <&gpio TEGRA_GPIO(V, 2) GPIO_ACTIVE_LOW>;"

>>
>>>
>>>>
>>>>> (1) is easy to fix without API changes, but (2) and (3) are
>>>>> fundamental architectural problems that will bite us immediately
>>>>> down the road and cause a permanent schism between OF support on
>>>>> different platforms.
>>>>>
>>>>> Let me describe how this is handled on PowerPC (Linux on PPC
>>>>> solves the problem the same way). When constructing a resource
>>>>> list, bus drivers that construct them from OF properties call
>>>>> ofw_bus_map_intr() with the interrupt parent phandle and the array
>>>>> of cells corresponding to the interrupt. This thunks immediately
>>>>> to nexus, which connects to code in intr_machdep.c. Code there
>>>>> assigns a unique made-up virtual IRQ and returns it, caching the
>>>>> interrupt parent ID and opaque interrupt data (if the same string
>>>>> of data reappears later, you get back the same virtual IRQ of
>>>>> course).
>>>>>
>>>>> When PIC drivers attach and register themselves with the interrupt
>>>>> handling layer, all the interrupts for that PIC are passed to it
>>>>> along with the virtual IRQ. The PIC driver is supposed to know
>>>>> what its interrupt data mean, which can be safely guaranteed, and
>>>>> it presents the assigned virtual IRQ number to the kernel when
>>>>> dispatching interrupts. (IRQs configured after PIC attachment are
>>>>> passed through immediately).
>>>>>
>>>>> This accomplishes the following things:
>>>>> 1. Parsing interrupt data is moved to the PIC driver, which is the
>>>>> only place it can be done safely.
>>>> I don't see anything different comparing with INTRNG.
>>>
>>> What I am advocating *is* INTRNG, at least as originally conceived
>>> and implemented.
>>>
>>>>> 2. There is no ordering requirement on PIC attachment vs. the
>>>>> attachment of anything else.
>>>> I think thats is not a true  - PIC must exist before
>>>> bus_alloc_resource() / bus_setup_intr() is called.
>>>
>>> It does not with the IRQ mapping infrastructure. Interrupts are set
>>> up at PIC attachment, whenever that occurs.
>>>
>> Assuming that bus_alloc_resource and bus_setup_intr() are close
>> thorougher and in linear piece of code, can i assume that you can
>> call bus_setup_intr()
>> without PIC attached ?
>
> Yes.
So driver can request and/or setup any random IRQ and  gets success from
bus_alloc_resource() or bus_setup_intr()?
Do you think that is this right behavior?
>
>>>>
>>>>> 3. Changes are extremely minimal relative to the "standard"
>>>>> interrupt flow: you only have to patch code that is already
>>>>> directly dealing with OF interrupts.
>>>> I don't see anything different comparing with INTRNG.
>>>
>>> Again, this was the original INTRNG architecture and is already
>>> implemented. As such, there are *no* changes required on ARM to get
>>> it. bus_map_intr() adds a bunch of new code, in parallel with the
>>> old code that also solves the problem, to no purpose.
>> So, on PPC, how i can get interrupt for GPIO pin described by this
>> property:
>> https://svnweb.freebsd.org/base/head/sys/gnu/dts/arm/tegra124-jetson-tk1.dts?revision=295436&view=markup#l1691
>>
>
> The GPIO controller registers itself as an interrupt domain and
> decodes those strings as IRQ specifiers. When interrupts are
> configured, it does whatever it needs to do to configure them
> appropriately and dispatches them to the kernel when they occur. It's
> a pretty trivial cascaded interrupt configuration. And, since the VIRQ
> code does not need the interrupt controller attached in advance, you
> don't need to worry about attach order of wherever &gpio points and
> the SDHCI driver.
>
>>>
>>>>> 4. It happens at bus enumeration time, when results can be
>>>>> guaranteed self-consistent.
>>>> Where do you see any potential source of inconsistency in INTRNG?
>>>
>>> See the example above about modified interrupt lists. There is also
>>> no obvious way for a child device to construct an interrupt not
>>> assigned to it by the parent device from device tree properties
>>> without knowing in some detail what kind of interrupt needs to be built.
>>>
>>>>
>>>>> 5. It combines naturally with ofw_bus_lookup_imap() and friends in
>>>>> the interrupt-map case (e.g. for PCI).
>>>> Again, I don't see anything different. Proper parsing of interrupt
>>>> property is not a problem of INTRNG (but must be fixed, of course).
>>>
>>> But it is *already* fixed by the standard code that already exists.
>>> You are introducing a less-functional parallel code path here.
>> NO, its not fixed, at least not for ARM.
>
> Why not, concretely? I'm happy to write whatever code is missing if
> there's a bug. It can't be more than a few tens of lines in arm/intr.c.
Interrupt maps are not covered by current ARM code.

>>>
>>>>>
>>>>> I'm not sure what the right path forward is, but this code needs
>>>>> to be fixed. The PowerPC code is fully MI, and was the template
>>>>> for the original INTRNG, so it shouldn't be too bad to replace.
>>>>> -Nathan
>>>>>
>>>>
>>>> So, new INTRNG:
>>>> - Introduces new more general bus method that can parse interrupt
>>>> configuration
>>>>  data from any source. Is this step backward?
>>>
>>> Yes, since it is more general in some sense, while simultaneously
>>> handling fewer cases than code that already exists and is implemented.
>>>
>>>>
>>>> - Old INTRNG and PPC code stores unparsed and/or parsed interrupt
>>>> data in
>>>>   INTRNG and each consumer must query for them. This data sharing
>>>> also causes
>>>>   significant locking issues.  New INTRNG stores interrupt
>>>> configuration data into
>>>>   given resource, so each relevant bus method can access it
>>>> immediately.
>>>>   Is this step backward?
>>>
>>> Which locking issues? And yes, it is.
>>>
>>>>
>>>> - New INTRNG is not OFW centric, it can works with virtually
>>>> unlimited number
>>>>    of configuration data sources.  Is this step backward?
>>>
>>> Also yes, because it makes the interrupt handles less opaque, which
>>> makes the infrastructure less flexible.
>>>
>>>> - New INTRNG correctly uses standard system infrastructure. Real
>>>> IRQ number
>>>>    is reserved in rman within bus_alloc_resource() call, interrupt
>>>> HW is
>>>>    configured (only!) within bus_setup_intr()  call. Is this step
>>>> backward?
>>>
>>> The "real" IRQ number is not well defined always, so requiring that
>>> is a step backwards, yes.
>>>
>>>> - New INTRNG completely eliminates huge and not always working virtual
>>>>   IRQ concept.
>>>
>>> When does it "not always work"? It seems to, in fact, always work on
>>> multiple platforms and have for a long time in the face of all kinds
>>> of totally bizarre topologies and system architectures.
>>>
>>>>
>>>>
>>>> Don’t take me bad, I’m open to any change.  But no, at this time,
>>>> I’m not ready to completely revert someone else's work – although I
>>>> am a co-author.
>>>
>>> I would urge, in the strongest possible terms, that this be backed
>>> out from stable/11 at least. We can add the new API back for 11.1 if
>>> we want it, but we totally lose the ability to change it later in
>>> the stable/11 cycle if it stays in now.
>>> -Nathan
>>>
>> The API is part of still unstable, experimental INTRNG, so its not
>> fixed we we can change it at any time, I think.
>> But yes, we forget to wrap new bus_map_intr()  method (and associated
>> code) by #ifdef INTRNG. Is this sufficient for you?
>
> For HEAD, yes. I would like it out of stable/11 entirely until this
> discussion converges.
> -Nathan
The current code  (in stable/11) works and is tested. I simple cannot
commit any untested change for stable tree mainly if is in BETA2 stage.
And I cannot fully test the requested change, at this time i have access
to single ARM platform.
But we're in the same situation - both have the same commit bit, neither
one of us is the author of the disputed commit and neither of us are not
able to fully test it.
So feel free to commit what you want, if you have courage to commit
untested code.  I haven't it...\
Michal




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?57934ABD.6010807>