Date:      Wed, 25 Feb 2004 14:50:19 +1100 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Julien Gabel <jpeg@thilelli.net>
Cc:        freebsd-current@freebsd.org
Subject:   Re: Stray irq7.
Message-ID:  <20040225115747.O9312@gamplex.bde.org>
In-Reply-To: <53996.192.168.0.97.1077661722.squirrel@webmail.thilelli.net>
References:  <20040222185325.GA97979@cserv62.csub.edu>    <53996.192.168.0.97.1077661722.squirrel@webmail.thilelli.net>

On Tue, 24 Feb 2004, Julien Gabel wrote:

> >> Getting this message at boot, with yesterday's CURRENT, after disk
> >> detection.
> >> stray irq7
> >> ...
> >> too many stray irq 7's: not logging anymore
>
> > This is most likely either a symptom of the brokenness of the
> > x86's ISA controller or you've disabled the parallel port driver.
> > If all the hardware you care about works, you can ignore this
> > message.

Er, you mean the correctness of the x86's ISA controller (it reports
problems if it detects them).  But this is out of date.  Stray irqs
are now all due to software bugs; glitches in hardware interrupts are
now mishandled as follows:
case 1: no ithread for irq7/15
    Then the glitches are detected and silently ignored.  Even counting
    of them is broken.  The detection is a relatively new feature in -current,
    but the mishandling became worse with the detection.  The correct
    handling is to broadcast interrupts for hardware glitches to all
    interrupt handlers (except ones like clkintr() that can't handle
    interrupts which are not for them).
case 2: ithread for irq7/15 with no handlers
    Then the glitches are not detected and are bogusly reported as stray
    interrupts.  But most such reports are probably due to software bugs
    causing normal interrupts.  The existence of this case is a software
    bug.  At least before the relatively recent interrupt handling changes,
    the lpt driver caused normal interrupts that are reported as stray
    ones, as a result of the following 3 bugs:
       (a) lpt tears down and sets up its interrupt handler (in this order
           IIRC) for every write.
       (b) lpt doesn't wait for previous interrupts to arrive before tearing
           down the handler.
       (c) Step (a) is potentially very costly, since it should cause
           the ithread to go away if it has no other handlers, which
           is the usual case for lpt (I think the ppbus level should
           hang on to the ithread, but it apparently doesn't).  However,
           the ithread stays around and its interrupt remains unmasked.
           Sometimes an interrupt for (b) arrives in the window between
           setup and teardown in (a).  Such interrupts are reported as
           "stray".
case 3: ithread for irq7/15 with at least one handler
    Then the glitches are not detected and stray interrupts are sent to
    the handler(s).  The handlers should ignore them.  Most handlers have
    no problems ignoring interrupts that are not for them, since they
    need to do this anyway for shared interrupts.
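
As a rough illustration only, the three cases can be modelled as a small
classifier plus a handler that ignores interrupts that are not for it.
This is a sketch, not the kernel code: irq_classify(), struct irq_line,
struct fake_dev and fake_dev_intr() are all hypothetical names, and the
irq_pending flag merely stands in for whatever status bit a real device
would be asked about.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the three cases; none of these names are FreeBSD's. */
enum irq_outcome {
    IRQ_GLITCH_IGNORED,   /* case 1: no ithread */
    IRQ_REPORTED_STRAY,   /* case 2: ithread but no handlers */
    IRQ_SENT_TO_HANDLERS  /* case 3: ithread with at least one handler */
};

struct irq_line {
    bool   has_ithread;   /* is an ithread attached to this irq? */
    size_t nhandlers;     /* number of handlers currently set up */
};

/* Decide what happens to an interrupt arriving on the given line. */
enum irq_outcome
irq_classify(const struct irq_line *line)
{
    if (!line->has_ithread)
        return IRQ_GLITCH_IGNORED;
    if (line->nhandlers == 0)
        return IRQ_REPORTED_STRAY;
    return IRQ_SENT_TO_HANDLERS;
}

/* Hypothetical device whose handler tolerates interrupts not meant for it. */
struct fake_dev {
    bool     irq_pending; /* stands in for a device status register bit */
    unsigned serviced;    /* interrupts actually handled */
};

/* Returns true if the interrupt was ours, false if it was ignored. */
bool
fake_dev_intr(struct fake_dev *dev)
{
    if (!dev->irq_pending)
        return false;     /* not for us: ignore, as for a shared irq */
    dev->irq_pending = false;
    dev->serviced++;
    return true;
}
```

In this model the lpt race in Case 2 is just a line that goes from
nhandlers == 1 to nhandlers == 0 while has_ithread stays true: an
interrupt that arrives during that time classifies as
IRQ_REPORTED_STRAY even though the hardware did nothing wrong.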

This is for -current.  -stable is simpler and less buggy.

> Just 'for memory' there is a FAQ entry for that, but the question
> was already well answered :)
>
>   http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/\
>   troubleshoot.html#STRAY-IRQ

This was never quite right, and is very out of date for -current:

%    5.23. What does ``stray IRQ'' mean?
%
%    Stray IRQs are indications of hardware IRQ glitches, mostly from
%    hardware that removes its interrupt request in the middle of the
%    interrupt request acknowledge cycle.

Actually, in -stable they are indications of hardware IRQ glitches and
software bugs.  In -current, they mostly indicate software bugs (they
only indicate a hardware irq glitch if losing races like the one in
Case 2 coincide with a hardware irq glitch).

Also, until relatively recently in -current, there was a race setting
up interrupts (on i386's but not on alphas at least) which caused a
normal interrupt that was present at interrupt setup time to be at
least recorded as a stray one (IIRC, it was correctly sent to the
handler but misrecorded because the race was only in setting up the
interrupt name and counter).  Old ISA devices with tri-state line
drivers tend to always cause such an interrupt.  Thus there was almost
always a stray irq6.

%
%    One has three options for dealing with this:
%      * Live with the warnings. All except the first 5 per irq are
%        suppressed anyway.

This is still correct :-).

%      * Break the warnings by changing 5 to 0 in isa_strayintr() so that
%        all the warnings are suppressed.

In -current, there is no such function as isa_strayintr().  Until
relatively recently, it existed but was only used in the unusual case
that there is no ithread (previous version of Case 1).  I already knew
too much about this bug suite, but learned more investigating why
isa_strayintr() was almost never called :-).  Most reports of stray
interrupts came from sched_ithd().  The reports are now centralized in
intr_execute_handlers().  Some other bugs were fixed and introduced by
merging the reporting:
- "5" was spelled "MAX_STRAY_LOG" in sched_ithd().  That is now the only
  spelling.
- there were separate sets of counters for the 2 reporting routines.  You
  had to change "5" in both places.
- there were races incrementing the separate counters in the SMP case in
  sched_ithd().
- sched_ithd() used printf() but everything else uses log().  log() is
  better, but it is even less safe to call in (effectively) fast
  interrupt handler context than is printf().  -stable may have this
  bug in a different form -- "stray" interrupts may be missing interrupt
  masking.  The nmi handler has it since it _is_ missing interrupt
  masking.
- isa_strayintr() has better worded messages than sched_ithd().
  intr_execute_handlers() is in between.
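
For illustration, a rate-limited stray report with one set of per-irq
counters might look like the sketch below.  It is not the kernel's
intr_execute_handlers(): stray_report(), enum stray_result and NIRQ are
hypothetical, the exact point at which the final warning replaces the
ordinary message is an assumption, and the message texts are modelled
on the ones quoted at the top of this thread.  The plain counter
increment is also not SMP-safe, which is the kind of race mentioned
above.

```c
#include <stddef.h>
#include <stdio.h>

#define MAX_STRAY_LOG 5   /* the "5" referred to in the FAQ entry */
#define NIRQ          16  /* assumption: 16 ISA irqs */

enum stray_result {
    STRAY_LOGGED,         /* ordinary "stray irqN" message written */
    STRAY_FINAL_WARNING,  /* "too many" message written; logging stops */
    STRAY_SUPPRESSED,     /* nothing written */
    STRAY_BAD_IRQ         /* irq number out of range */
};

/* One set of counters, shared by every caller. */
static unsigned stray_count[NIRQ];

/* Format a report for a stray interrupt on irq into buf, if not suppressed. */
enum stray_result
stray_report(int irq, char *buf, size_t len)
{
    if (irq < 0 || irq >= NIRQ)
        return STRAY_BAD_IRQ;
    if (stray_count[irq] >= MAX_STRAY_LOG) {
        if (len > 0)
            buf[0] = '\0';            /* leave an empty message */
        return STRAY_SUPPRESSED;
    }
    stray_count[irq]++;               /* stops growing once the limit is hit */
    if (stray_count[irq] < MAX_STRAY_LOG) {
        snprintf(buf, len, "stray irq%d", irq);
        return STRAY_LOGGED;
    }
    snprintf(buf, len, "too many stray irq %d's: not logging anymore", irq);
    return STRAY_FINAL_WARNING;
}
```

With a single MAX_STRAY_LOG and a single counter array, the limit only
has to be changed in one place, and suppressing irq 7 does not affect
the count for irq 15.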

%      * Break the warnings by installing parallel port hardware that uses
%        irq 7 and the PPP driver for it (this happens on most systems),
                       lpt? ppbus?
%        and install an ide drive or other hardware that uses irq 15 and a
%        suitable driver for it.

Using irq 15 for ata1 probably happens on most systems now.  Using lpt
to eat irq7's doesn't break the warning so well now, since lpt causes
the warning (perhaps since it was new-bused, or at least since current
was i-threaded).  Also, eating the warnings in lpt depends on its
(mis)implementation details.  ppbus wants to multiplex the irq between
different drivers.  I think it does this by leaving the irq attached and
switching it around as required (actually more than required), but it
should leave the irq unattached and attach it as required.

Bruce


