From: Nathan Whitehorn
To: freebsd-arch, freebsd-arm@freebsd.org, Michal Meloun, Svatopluk Kraus
Subject: bus_map_intr() changes
Date: Sun, 24 Jul 2016 15:29:38 -0700

There is a very long thread on the SVN list about this that has lasted much too long and should be over here.
So, I'd like to start with a clean slate.

The discussion is related to r301453, which adds a function BUS_MAP_INTR() and a function bus_extend_resource() that allow a parent bus to decode SYS_RES_IRQ-type resources during bus_alloc_resource(). This is designed to allow the parent bus to read in flags from the device tree regarding IRQ setup (polarity, trigger mode) and pass that along to the interrupt controller with the resource allocation request. By specifying an enum intr_map_data_type, you can also specify what kind of interrupt specifier is being added to the decoration (ACPI, FDT, GPIO).

This is a departure from the existing code used on device tree systems (OFW_BUS_MAP_INTR()), which allocates a virtual IRQ corresponding to an interrupt-parent key and one of the device tree's opaque arbitrary-length interrupt specifiers at resource assignment time. The information that the virtual IRQ maps to is cached and then applied by the PIC when the PIC's interrupts are configured, which may be after BUS_SETUP_INTR() if the call is made before interrupts are on and the PIC hasn't attached at that point.

I am making a wiki page at https://wiki.freebsd.org/Complicated_Interrupts to describe the current implementation and its rationale; the page should exist by the time most of you read this. Some of that content is copied below for ease of replying.

My concern is that the new API (r301453) parallels the existing one in a way that will require both to be maintained indefinitely, while providing less functionality, in particular in three ways:

1. Breaking the opacity of the device tree's interrupt specifiers (which only the PIC driver knows what to do with).
2. Requiring the bus parent to know exactly how to map an IRQ number (poorly determined, as above) to a device tree entry (which may not include it -- see below) and know how to interpret it (above, which only the PIC driver knows).
3. Requiring that the PIC driver already be attached, which cannot be guaranteed on some systems where the PIC is a bus child of devices with interrupts on that same PIC.

What I would like to establish here, rather than just being cranky, is that this new API both (a) does something on real hardware that the existing API cannot do, either currently or with trivial modifications, and (b) is capable of expressing the things the current API can express. If the answer to either of those is "no", we're going to have to support both in perpetuity, with different paths on different platforms, and the whole thing is going to be a huge mess. We're in a situation right now where that will be baked into FreeBSD 11 for the duration of the branch, which is quite unfortunate.
-Nathan

---- Excerpt of wiki text ----

---- Part 1: Overview of mechanism ----

The core part of this system is a registry in machine-dependent code that maps some description of an interrupt to an IRQ number used by the rest of the kernel. This number is arbitrary; on systems in which a useful human-readable number can be extracted in a general way from the description, it is helpful for users for the IRQ number to be related to something about the system (e.g. the interrupt pin on single-controller systems) but it can be just a monotonically increasing integer.

Currently one interrupt mapping strategy is implemented: Open Firmware (or FDT) interrupt-parent / interrupt specifier tuples to IRQ. Bus code maps the Open Firmware interrupt specifier using the ofw_bus_map_intr() function, which is cascaded through the bus hierarchy and is usually resolved by nexus.

    int ofw_bus_map_intr(device_t dev, phandle_t iparent, int icells, pcell_t *intr);

This takes the requesting device, the xref phandle of the interrupt parent (e.g. from the "interrupt-parent" property, or the equivalent entry in an interrupt-map) and the byte string describing the interrupt (e.g. the contents of the "interrupts" property, or the equivalent entry in an interrupt-map) and returns a unique IRQ number that can be added to a resource list and used with bus_alloc_resource(), bus_setup_intr(), etc.

In the event you needed more than OF-type mappings (e.g. for ACPI) you could add an equivalent acpi_bus_map_intr() method to nexus that tabulates mappings in parallel based on different data.

---- Part 2: Rationale ----

As a separate issue, it would be great if you could comment on a way to implement the following scenarios with this API, which I think are currently impossible and would need to be solved to avoid bifurcation. These kinds of things are what drove the current API.

--- Case 1: the G5 Powermac ---

I have hardware with two PICs, one cascaded from the other. PIC 1 lives in the northbridge, and PIC 2 lives on a device on the PCI bus behind a couple of bridges. Depending on the era of the hardware, these are cascaded in different directions. They have different interrupt specifier formats.

A. How do I represent interrupts on the PCI bus parent of PIC 2 that are handled by PIC 2? PIC 2 obviously can't attach before its bus parent, but the bus parent can't complete initialization without the ability to set up its interrupts.

B. Devices on the PCI bus have interrupts handled by a mixture of PIC 1 and PIC 2, sometimes on the same device and not always expressible through the bus hierarchy. For example, one of the two storage controllers has an interrupt on PIC 2 run through a wire that doesn't go through the PCI connection and so isn't in the interrupt-map of the PCI bus, which is wired (mostly) to PIC 1, and about which the parent bus can and should know nothing.

--- Case 2: IBM OPAL firmware ---

The /ibm,opal device on IBM PowerNV systems has a non-standard interrupts property ("opal-interrupts") that contains the list of IRQs that should be forwarded to the firmware.
These are not interrupts belonging to a single physical device at /ibm,opal (which is a virtual device anyway) and so are not in the interrupts property; nor do they necessarily share an interrupt parent. How do I represent this?

--- Case 3: IBM XICS interrupts ---

On virtualized (and most non-virtualized) IBM hardware, the interrupts are one cell and that single cell encodes the interrupt parent, the line sense, and the IRQ.

--- Case 4: MSIs ---

MSIs are assigned purely by the PCI bus and the PCI bus parent can't know about them from the device tree. How does the bus parent sensibly decorate resources like this? The PCI MSI API assumes that these all exist purely as 32-bit integers and they are not assigned through resource lists in the conventional fashion.
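
---- Illustration: the Part 1 registry ----

To make the mechanism in Part 1 concrete, here is a minimal user-space sketch of a registry that hands out IRQ numbers for interrupt-parent / specifier tuples. It is not the code in the tree. The names sketch_map_intr(), sketch_intr_map, SKETCH_MAX_IRQS and SKETCH_MAX_CELLS are hypothetical, the fixed-size table is an assumption made for brevity, and phandle_t and pcell_t are declared locally as 32-bit stand-ins for the kernel types. The point it tries to show is that the registry only stores and compares the specifier bytes and never interprets them, so decoding stays with the PIC driver.

```c
#include <stdint.h>
#include <string.h>

/* Local stand-ins for the kernel types; assumed 32-bit for this sketch. */
typedef uint32_t phandle_t;
typedef uint32_t pcell_t;

#define SKETCH_MAX_IRQS   64	/* hypothetical table size */
#define SKETCH_MAX_CELLS  8	/* hypothetical cap on specifier length */

/* One registry entry: the IRQ number is simply the entry's table index. */
struct sketch_intr_map {
	phandle_t iparent;			/* xref phandle of the interrupt parent */
	int	  icells;			/* number of cells in the specifier */
	pcell_t	  intr[SKETCH_MAX_CELLS];	/* opaque specifier, never decoded here */
};

static struct sketch_intr_map sketch_table[SKETCH_MAX_IRQS];
static int sketch_nirqs;

/*
 * Return the IRQ number for (iparent, specifier). A tuple seen before
 * gets the same number back; a new tuple gets the next unused number.
 * Returns -1 if the specifier length is out of range or the table is full.
 */
int
sketch_map_intr(phandle_t iparent, int icells, const pcell_t *intr)
{
	int i;

	if (icells <= 0 || icells > SKETCH_MAX_CELLS)
		return (-1);

	/* Look for an existing mapping; the specifier is compared bytewise. */
	for (i = 0; i < sketch_nirqs; i++) {
		if (sketch_table[i].iparent == iparent &&
		    sketch_table[i].icells == icells &&
		    memcmp(sketch_table[i].intr, intr,
		    (size_t)icells * sizeof(pcell_t)) == 0)
			return (i);
	}

	if (sketch_nirqs == SKETCH_MAX_IRQS)
		return (-1);

	/* Cache the tuple so a PIC driver could look it up by IRQ later. */
	sketch_table[sketch_nirqs].iparent = iparent;
	sketch_table[sketch_nirqs].icells = icells;
	memcpy(sketch_table[sketch_nirqs].intr, intr,
	    (size_t)icells * sizeof(pcell_t));
	return (sketch_nirqs++);
}
```

In this sketch the returned number is just a monotonically increasing integer, which Part 1 notes is sufficient. Because the cached entry is keyed by IRQ number, whichever PIC driver owns iparent could pick up its specifier whenever it attaches, which is the ordering property the cases above depend on.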