Date: Fri, 5 Sep 2014 21:54:03 -0700
From: John-Mark Gurney <jmg@funkthat.com>
To: Ian Lepore <ian@FreeBSD.org>
Cc: "freebsd-arm@freebsd.org" <freebsd-arm@FreeBSD.org>, ticso@cicely.de
Subject: Re: Cubieboard: Spurious interrupt detected
Message-ID: <20140906045403.GU82175@funkthat.com>
In-Reply-To: <1409967197.1150.339.camel@revolution.hippie.lan>
References: <2279481.3MX4OEDuCl@quad> <20140905215702.GL3196@cicely7.cicely.de> <1409958716.1150.321.camel@revolution.hippie.lan> <CAJ-Vmo=EJVFqNnMo_dzevGvFWLSR6LVfYbYmOot1bLZbCvVMTQ@mail.gmail.com> <20140906011526.GT82175@funkthat.com> <1409967197.1150.339.camel@revolution.hippie.lan>

Ian Lepore wrote this message on Fri, Sep 05, 2014 at 19:33 -0600:
> On Fri, 2014-09-05 at 18:15 -0700, John-Mark Gurney wrote:
> > Adrian Chadd wrote this message on Fri, Sep 05, 2014 at 17:44 -0700:
> > > On 5 September 2014 16:11, Ian Lepore <ian@freebsd.org> wrote:
> > > > On Fri, 2014-09-05 at 23:57 +0200, Bernd Walter wrote:
> > > >> On Sat, Sep 06, 2014 at 01:43:23AM +0400, Maxim V FIlimonov wrote:
> > > >> > And another problem: every now and then the kernel says something like that:
> > > >> > Sep 5 19:22:37 kernel: Spurious interrupt detected
> > > >> > Sep 5 19:22:37 kernel: Spurious interrupt detected
> > > >> > Sep 5 19:23:46 last message repeated 10 times
> > > >> >
> > > >> > I've heard that FreeBSD happens to do that on ARM devices. What could be the
> > > >> > problem here?
> > > >>
> > > >> Means something generates interrupts, which are not handled by a driver.
> > > >> Could be the cause for your load problem too.
> > > >>
> > > >
> > > > No, that would be stray interrupts.  Spurious interrupts happen when an
> > > > interrupt is asserted, but by the time the processor asks the interrupt
> > > > controller for the current active interrupt, it is no longer active.
> > > >
> > > > One way it can happen is when an interrupt handler writes to a device to
> > > > clear a pending interrupt and that write takes a long time to complete
> > > > because the device is on a slow bus, and the interrupt controller is on
> > > > a faster bus.  The EOI to the controller outraces the device write that
> > > > would clear the pending interrupt condition, so the processor is
> > > > re-interrupted, but by the time it asks for the next active interrupt the
> > > > device write has finally completed and the interrupt is no longer
> > > > pending.
> > > >
> > > > That sequence used to happen a lot, and it was "fixed" by adding an
> > > > l2cache sync (basically a "drain write buffer") just before an EOI.  You
> > > > sometimes still see an occasional spurious interrupt, but it shouldn't
> > > > be happening multiple times per second as seen in the logging above.
> > >
> > > Hm, interesting. I remember your discussion about it on IRC. The
> > > atheros code ends up working around this in the driver by doing a read
> > > from the ISR after writing out bits to clear things, so the clear is
> > > flushed out.
> > >
> > > I wonder if we should be asking all device drivers to be doing their
> > > own ISR flushing before returning from their interrupt handlers.
> >
> > This is required on PCI (that you do a read to clear the posted/pending
> > write)...  So, IMO, yes, all device drivers should do the proper
> > clearing of their writes to the ISR...
> >
>
> But a driver can't assume that a read is sufficient on all architectures
> it may run on.  bus_space_barrier() is the right way.  Also, it's not

Except that I don't think even on PCI a bus_space_barrier is
sufficient...  I was just looking at i386's implementation of
bus_space_barrier and it just does a stack access...  This won't be
sufficient to clear any PCI bridges that may have the write still
pending...

There's also the issue that if __GNUCLIKE_ASM is not defined, the
code will compile w/o ANY barrier, not even a compiler_membar...  We
should probably add a #else #error please add your compiler's
equivalent...

> just that a barrier is needed before exiting an isr... if the isr uses
> locking to synchronize with hardware access by the non-isr part of the
> driver, then the bus space barriers are needed in conjunction with the
> locking, so that, for example, the isr's usage of the hardware is truly
> complete before a lock is released.
>
> Scattered amongst 10 of the roughly 240 drivers in sys/dev there are 42
> calls to bus_space_barrier().  Getting all the drivers fixed will be a
> big job.  That's why I was thinking along the lines of an
> architecture-wide workaround with potentially a way to mark a driver as
> not needing the workaround once we get the fixing underway.

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."
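
The race discussed in this thread (a posted "clear" write that has not
reached the device when the EOI is issued, and the read-back that forces
it out) can be modelled in a few lines of user-space C.  This is only a
toy sketch under stated assumptions: every sim_* name and the struct are
hypothetical, invented here for illustration, and none of it is FreeBSD
kernel code or the bus_space(9) API.  It assumes a single write-to-clear
status register and that a read from the device pushes earlier posted
writes ahead of it, as described above for PCI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a device behind a slow bus with posted writes. */
struct sim_dev {
	uint32_t status;       /* interrupt status bits, nonzero = pending */
	uint32_t posted_clear; /* bits a queued write will clear on arrival */
	bool     write_posted; /* true while the clear write is in flight */
};

/* Driver writes to clear pending bits; the write is queued, not applied. */
static void sim_write_clear(struct sim_dev *d, uint32_t bits)
{
	d->posted_clear |= bits;
	d->write_posted = true;
}

/* The queued write finally lands on the device. */
static void sim_drain(struct sim_dev *d)
{
	if (d->write_posted) {
		d->status &= ~d->posted_clear;
		d->posted_clear = 0;
		d->write_posted = false;
	}
}

/* A device read: assumed to complete earlier posted writes first. */
static uint32_t sim_read_status(struct sim_dev *d)
{
	sim_drain(d);
	return d->status;
}

/* What the interrupt controller sees: the raw line level, no draining. */
static bool sim_irq_line(const struct sim_dev *d)
{
	return d->status != 0;
}

/*
 * Interrupt handler.  Returns true if the line is still asserted at EOI
 * time, i.e. the controller would re-interrupt the CPU.
 */
static bool sim_isr(struct sim_dev *d, bool read_back)
{
	assert(d != NULL);
	sim_write_clear(d, d->status);
	if (read_back)
		(void)sim_read_status(d); /* flush the clear before EOI */
	return sim_irq_line(d);
}

/* Runs one interrupt; returns the number of spurious interrupts seen. */
static int sim_run(bool read_back)
{
	struct sim_dev d = { .status = 0x1 };
	bool reinterrupted = sim_isr(&d, read_back);

	sim_drain(&d); /* the slow write lands before the CPU asks "which IRQ?" */
	/* Spurious: re-interrupted, yet nothing is pending when the CPU asks. */
	return (reinterrupted && !sim_irq_line(&d)) ? 1 : 0;
}
```

In this model sim_run(false) reports one spurious interrupt and
sim_run(true) reports none.  The sketch deliberately says nothing about
whether a given bus_space_barrier() implementation has the same effect
as the read-back, which is the open question raised above.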