Date: Thu, 3 Oct 1996 14:37:08 +1000
From: Bruce Evans <bde@zeta.org.au>
To: bde@zeta.org.au, Tor.Egge@idt.ntnu.no
Cc: freebsd-hackers@freebsd.org
Subject: Re: Interrupt lossage in FreeBSD-current.
Message-ID: <199610030437.OAA32243@godzilla.zeta.org.au>

>> Perhaps it should run at spl < splsoftclock. (Loss of ordinary clock
>> interrupts is worse than loss of RTC interrupts, so perhaps it should
>...
>If you lose one RTC interrupt, you lose all RTC interrupts
>thereafter, since the interrupt handler must enable further
>interrupts.

That's if you miss calling the RTC interrupt handler. AFAIK, the RTC IRQ
line always stays high until the IRQ is serviced. This is what happens
here. Last night, one of my systems was sitting at the debugger prompt
with interrupts disabled, and RTC interrupts still worked when I left the
debugger. I have a counter in the RTC interrupt handler so I can be sure
that it wasn't called.

>If you lose one clock interrupt, the clock is one tick wrong. This error
>can later be corrected by use of xntpd or timed.

You'll lose 30 clock interrupts for disabling interrupts for 0.3 seconds.
I lost 3600000 interrupts for disabling interrupts for 10 hours :-). For
debugging, the lost time should be recovered by reading the RTC or by
polling the 8254 in the debugger i/o routines.

>> It would be more correct to use `ipending' instead of `imen'. `ipending'
>> gives pending interrupts that the system already knows about. `imen' is
>>...
>The bit flag in ipending is only set when the interrupt is blocked by
>cpl, and it is always cleared before the interrupt handler is called.
>Thus ipending is not usable with regards to hardware interrupts.

Neither is very usable. If ipending is clear but imen is set, then the
interrupt handler must be active. It may be about to exit, in which case
you want to restart it, or it may be in a loop, in which case you don't
want to do anything, since it may handle the interrupt and then be
confused by being called again. However, it may be that the IRR bit in
the ICU can never be set while the handler is active (because the ICU
mask bit stops it from working).

>But something was causing an RTC interrupt to be lost.
>I've only experienced it while profiling a program (while the RTC
>interrupt rate is 1024 Hz), thus I can only assume that the RTC does not
>like a latency longer than the interval between two RTC interrupts.

Please check this. Disable interrupts for a second or two and see if the
RTC stops.

>> Do you really need to use a fastintr handler? The fastness of a fastintr
>>...
>No. I don't need to use a fastintr handler, and I've now reverted to
>using slow interrupts (maximum rate: 61500 interrupts/s, i.e. 16 us/interrupt.

OK.

>> >I cannot immediately see any reasons not to reenable the ICUs before
>> >calling the interrupt handler from the fast interrupt vector code in
>> >...
>> Yes, this makes no difference. Also, the ICUs get reenabled immediately
>> if the AUTO_EOI_* options are used. The problems start with temporarily
>> ...
>That depends on the device in question. Nesting
>should be no problem as long as interrupts are disabled again
>before telling the device that it can generate further interrupts.

There might be some minor problems. intr_nesting_level is not adjusted
for fastintr handlers. If the device needs to be masked in the ICU, it
would be better to cooperate with the usual masking. This could be
implemented easily as a notsofastintr handler - same as a normal intr
handler except it doesn't enable interrupts before calling the handler.

>I have now reverted to using slow interrupts. What I do in addition is:
>
> 1. loop through intr_mptr[], blocking the interrupt for the device
>    during any hardware interrupt.
> 2. loop through the imasks array, blocking the interrupt for the
>    device during any software interrupt.

OK. Perhaps there should be a special device class for this. The
interrupt mask low_imask would be OR'ed into all the other masks.

> 3. In the interrupt handler for the device, check ipending for
>    a pending SWI_CLOCK, and if any so, perform the restart of the
>    device in the timeout routine instead of in the interupt
>    handler.

There should be a macro for this. (The setxxx() and schedxxx() macros in
spl.h are supposed to hide implementation details.)

> 4. In hardclock(), softclock() is no longer called directly,
>    since splsoftclock() does not block the device.

It is already never called directly in FreeBSD (see <machine/cpu.h>).
There are nesting problems, e.g., in hardclock(), the clock bit is set in
the ICU, so clock interrupts would be masked in softclock() if it were
called directly (unless you unset the bit in the ICU and fix the problems
that this would cause...); softclock processing may take a long time, so
some future hardclock calls may (rarely) be missed and some future clock
interrupts at 16 kHz for pcaudio will usually be missed.

>1. is done to avoid starvation of other hardware interrupt handlers.
>2., 3. and 4. is done to avoid starvation of the timeout() handling. (e.g.
>avoid ncr dead? messages).

Someone should fix the ncr driver. I deleted the (np->latetime>4) section
so that it doesn't get confused by ddb masking some ncr interrupts but
not clock interrupts. Its error handling for the non-error screws up the
SCSI bus.

I wouldn't worry about fixing timeout handling. It is normal on slow
machines to miss a couple of timeouts. Device drivers should be prepared
for this. Maybe not for 0.3 seconds - 10 hours though. Some devices may
not be prepared for that.

Bruce
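
Three of the points in the message can be modelled in a few lines of
user-space C: OR'ing a low_imask into every other mask, testing ipending
for a pending SWI_CLOCK, and the lost-tick arithmetic. This is only a
sketch and not kernel code. NMASKS, SWI_CLOCK_BIT and all three function
names are hypothetical, and the 100 Hz clock rate is an assumption inferred
from the "30 interrupts in 0.3 seconds" figure.

```c
#include <assert.h>

/*
 * Illustrative model only; none of this is FreeBSD kernel code.
 * NMASKS, SWI_CLOCK_BIT and every function name are hypothetical.
 */
#define NMASKS         4	/* hypothetical number of interrupt classes */
#define SWI_CLOCK_BIT  30	/* hypothetical bit position for SWI_CLOCK */
#define SWI_CLOCK_MASK (1u << SWI_CLOCK_BIT)

/* OR the low-priority device's mask into every class mask. */
void
or_low_imask(unsigned masks[], int n, unsigned low_imask)
{
	int i;

	for (i = 0; i < n; i++)
		masks[i] |= low_imask;
}

/* Return 1 if the clock software interrupt bit is set in ipending. */
int
swi_clock_pending(unsigned ipending)
{
	return ((ipending & SWI_CLOCK_MASK) != 0);
}

/* Clock ticks missed while interrupts stay disabled for ms milliseconds. */
long long
lost_ticks(long long ms, int hz)
{
	assert(ms >= 0 && hz > 0);
	return (ms * hz / 1000);
}
```

With hz = 100, lost_ticks(300, 100) gives 30 and lost_ticks(36000000, 100)
gives 3600000, which matches the figures quoted for 0.3 seconds and 10 hours.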