Date:      Thu, 3 Oct 1996 14:37:08 +1000
From:      Bruce Evans <bde@zeta.org.au>
To:        bde@zeta.org.au, Tor.Egge@idt.ntnu.no
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: Interrupt lossage in FreeBSD-current.
Message-ID:  <199610030437.OAA32243@godzilla.zeta.org.au>

>> Perhaps it should run at spl < splsoftclock.  (Loss of ordinary clock
>> interrupts is worse than loss of RTC interrupts, so perhaps it should
>...
>If you lose one RTC interrupt, you lose all RTC interrupts
>thereafter, since the interrupt handler must enable further
>interrupts.

That's if you miss calling the RTC interrupt handler.  AFAIK, the RTC
IRQ line always stays high until the IRQ is serviced.  This is what
happens here.  Last night, one of my systems was sitting at the debugger
prompt with interrupts disabled, and RTC interrupts still worked when
I left the debugger.  I have a counter in the RTC interrupt handler so
I can be sure that it wasn't called.

>If you lose one clock interrupt, the clock is one tick wrong. This error
>can later be corrected by use of xntpd or timed.

You'll lose 30 clock interrupts for disabling interrupts for 0.3 seconds.
I lost 3600000 interrupts for disabling interrupts for 10 hours :-).
For debugging, the lost time should be recovered by reading the RTC or by
polling the 8254 in the debugger i/o routines.
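
Those figures assume a 100 Hz clock tick, which is the rate implied by
30 ticks in 0.3 seconds.  A rough sketch of the arithmetic follows; the
helper name lost_ticks() is made up for illustration and is not kernel
code:

```c
#include <assert.h>

/*
 * Hypothetical helper: number of clock ticks missed while interrupts
 * are disabled for `ms' milliseconds with a clock running at `hz'
 * ticks per second.  long long avoids overflow for multi-hour spans.
 */
long long
lost_ticks(long long ms, long long hz)
{
	assert(ms >= 0 && hz > 0);
	return (ms * hz / 1000);
}
```

With hz = 100, lost_ticks(300, 100) gives the 30 ticks for 0.3 seconds,
and 10 hours (36000000 ms) gives the 3600000 ticks quoted above.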

>> It would be more correct to use `ipending' instead of `imen'.  `ipending'
>> gives pending interrupts that the system already knows about.  `imen' is
>>...
>The bit flag in ipending is only set when the interrupt is blocked by
>cpl, and it is always cleared before the interrupt handler is called.
>Thus ipending is not usable with regards to hardware interrupts.

Neither is very usable.  If ipending is clear but imen is set, then the
interrupt handler must be active.  It may be about to exit, in which case
you want to restart it, or it may be in a loop, in which case you don't
want to do anything, since it may handle the interrupt and then be confused
by being called again.  However, it may be that the IRR bit in the ICU can
never be set while the handler is active (because the ICU mask bit stops
it from working).
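
The cases can be written down as a small model.  This is only a sketch
of the reasoning, not the kernel's code: the parameters stand in for the
kernel's `ipending' and `imen' words, the enum and function names are
invented, and it assumes the IRQ has a handler attached (otherwise a set
bit in `imen' just means the IRQ is unused):

```c
/* Illustrative model only; names other than ipending/imen are made up. */
enum irq_state {
	IRQ_IDLE,		/* neither pending nor masked */
	IRQ_PENDING,		/* blocked by cpl and recorded in ipending */
	IRQ_HANDLER_ACTIVE	/* masked in the ICU but not pending */
};

enum irq_state
irq_classify(unsigned ipending, unsigned imen, int irq)
{
	unsigned bit = 1u << irq;

	if (ipending & bit)
		return (IRQ_PENDING);
	if (imen & bit)
		return (IRQ_HANDLER_ACTIVE);
	return (IRQ_IDLE);
}
```

Note that IRQ_HANDLER_ACTIVE cannot tell "about to exit" from "still
looping", which is why neither variable is very usable for deciding
whether to restart the handler.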

>But something was causing an RTC interrupt to be lost. I've only
>experienced it while profiling a program (while the RTC interrupt rate
>is 1024 Hz), thus I can only assume that the RTC does not like a
>latency longer than the interval between two RTC interrupts.

Please check this.  Disable interrupts for a second or two and see if
the RTC stops.

>> Do you really need to use a fastintr handler?  The fastness of a fastintr
>>...
>No. I don't need to use a fastintr handler, and I've now reverted to
>using slow interrupts (maximum rate: 61500 interrupts/s, i.e. 16 us/interrupt).

OK.

>> >I cannot immediately see any reasons not to reenable the ICUs before
>> >calling the interrupt handler from the fast interrupt vector code in
>> >...
>> Yes, this makes no difference.  Also, the ICUs get reenabled immediately
>> if the AUTO_EOI_* options are used.  The problems start with temporarily
>> ...

>That depends on the device in question. Nesting
>should be no problem as long as interrupts are disabled again
>before telling the device that it can generate further interrupts.

There might be some minor problems.  intr_nesting_level is not adjusted
for fastintr handlers.  If the device needs to be masked in the ICU,
it would be better to cooperate with the usual masking.  This could
be implemented easily as a notsofastintr handler - same as a normal
intr handler except it doesn't enable interrupts before calling the
handler.

>I have now reverted to using slow interrupts. What I do in addition is:
>  
>   1. loop through intr_mptr[], blocking the interrupt for the device
>      during any hardware interrupt.
>   2. loop through the imasks array, blocking the interrupt for the
>      device during any software interrupt.

OK.  Perhaps there should be a special device class for this.  The
interrupt mask low_imask would be OR'ed into all the other masks.
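
Something like the following, as a sketch only.  Only the name low_imask
comes from the suggestion above; the function name and the idea of
passing the masks as a plain array are assumptions made to keep the
example self-contained:

```c
#include <stddef.h>

/*
 * Hypothetical helper: OR the bits of low_imask into every mask in
 * `masks', so that a device in the proposed low class is blocked
 * whenever any other class is blocked.
 */
void
apply_low_imask(unsigned *masks, size_t nmasks, unsigned low_imask)
{
	size_t i;

	for (i = 0; i < nmasks; i++)
		masks[i] |= low_imask;
}
```

This would have to be redone whenever a device is added to or removed
from the class, in the same way as the existing masks are updated.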

>   3. In the interrupt handler for the device, check ipending for
>      a pending SWI_CLOCK, and if so, perform the restart of the
>      device in the timeout routine instead of in the interrupt
>      handler.

There should be a macro for this.  (The setxxx() and schedxxx() macros
in spl.h are supposed to hide implementation details.)
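
A sketch of what such a macro might look like.  Everything here is
hypothetical: the EX_ names, the bit position 30, and the helper
function are placeholders for illustration and are not taken from
spl.h:

```c
/* Placeholder bit position; the real value lives in spl.h. */
#define	EX_SWI_CLOCK		30
#define	EX_SWI_CLOCK_MASK	(1u << EX_SWI_CLOCK)

/* Hypothetical macro: true if a softclock interrupt is pending. */
#define	ex_softclock_pending(ipend) \
	(((ipend) & EX_SWI_CLOCK_MASK) != 0)

enum restart_where { RESTART_IN_HANDLER, RESTART_IN_TIMEOUT };

/*
 * Step 3 above: if softclock is pending, leave the device restart to
 * the timeout routine; otherwise restart it in the interrupt handler.
 */
enum restart_where
choose_restart(unsigned ipend)
{
	return (ex_softclock_pending(ipend) ?
	    RESTART_IN_TIMEOUT : RESTART_IN_HANDLER);
}
```

The driver would then test the macro instead of looking at the bits in
ipending itself.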

>   4. In hardclock(), softclock() is no longer called directly, 
>      since splsoftclock() does not block the device. 

It is already never called directly in FreeBSD (see <machine/cpu.h>).
There are nesting problems, e.g., in hardclock(), the clock bit is set
in the ICU, so clock interrupts would be masked in softclock() if it
were called directly (unless you unset the bit in the ICU and fix
the problems that this would cause...); softclock processing may take a
long time, so some future hardclock calls may (rarely) be missed and
some future clock interrupts at 16 kHz for pcaudio will usually be missed.

>1. is done to avoid starvation of other hardware interrupt handlers.

>2., 3. and 4. is done to avoid starvation of the timeout() handling. (e.g.
>avoid ncr dead? messages).

Someone should fix the ncr driver.  I deleted the (np->latetime>4)
section so that it doesn't get confused by ddb masking some
ncr interrupts but not clock interrupts.  Its error handling for
the non-error screws up the SCSI bus.

I wouldn't worry about fixing timeout handling.  It is normal on slow
machines to miss a couple of timeouts.  Device drivers should be prepared
for this.  Maybe not for 0.3 seconds - 10 hours though.  Some devices
may not be prepared for that.

Bruce


