Date: Tue, 19 Oct 2010 13:52:29 +1100 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Alexander Motin <mav@freebsd.org> Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, John Baldwin <jhb@freebsd.org>, Marius Strobl <marius@alchemy.franken.de> Subject: Re: svn commit: r213985 - head/sys/sparc64/sparc64 Message-ID: <20101019122827.T1195@besplex.bde.org> In-Reply-To: <4CBCBF25.8010902@FreeBSD.org> References: <201010171646.o9HGks2U038501@svn.freebsd.org> <4CBCADDD.5070109@FreeBSD.org> <20101018205224.GO1416@alchemy.franken.de> <201010181705.24879.jhb@freebsd.org> <20101018213055.GP1416@alchemy.franken.de> <4CBCBF25.8010902@FreeBSD.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 19 Oct 2010, Alexander Motin wrote: > Marius Strobl wrote: >> On Mon, Oct 18, 2010 at 05:05:24PM -0400, John Baldwin wrote: >>> On Monday, October 18, 2010 4:52:24 pm Marius Strobl wrote: >>>> AFAICT this is not true; intr_event_handle() in sys/kern/kern_intr.c >>>> is what enters a critical section and f.e. on amd64 I don't see where >>>> anywhere in the path from ISR_VEC() to intr_execute_handlers() >>>> calling intr_event_handle() a critical section would be entered, >>>> which also means that in intr_execute_handlers() td_intr_nesting_level >>>> is incremented outside of a critical section. >>> Not all of the clock interrupts use intr_event_handle(). The local APIC >>> timer uses its own interrupt entry point on x86 for example and uses an >>> explicit critical section as a result. I suspect the sparc64 tick interrupt >>> is closer to the local APIC timer case and doesn't use intr_event_handle(). >> >> Correct; but still you can't say that the MD interrupt code enters a >> critical section in general, neither is incrementing td_intr_nesting_level >> in intr_execute_handlers() protected by a critical section. >> >>> The fact that some clock interrupts do use intr_event_handle() (e.g. the >>> atrtc driver on x86 now) does indicate that the low-level interrupt code >>> probably does not belong in the time events code but in the caller. >> >> Well, I agree that entering a critical section in the time events >> code would mean entering a nested critical section unnecessarily in >> case the clock driver uses a regular "fast" interrupt handler and >> that should be avoided. Still I don't think the event time front-end >> actually should need to worry about wrapping the callback in a >> critical section. I think it belongs in the caller, and the front end shouldn't do any more than assert it. > Interrupt frame, required for hard-/stat-/profclock() operation is > stored in curthread. So critical section is effectively mandatory there > now. Correct td_intr_nesting_level value is also important for proper > interrupt threads scheduling - one more reason to have critical section > there. It is indeed strange that td_intr_nesting_level in > intr_event_handle() is not covered by critical section, but probably it > should. td_intr_nesting_level is garbage, as is the style of the use of it in sched_ule.c (non-boolean used in boolean context). I introduced it (as a global named intr_nesting_level) to count interrupt nesting with spls. It was an i386 implemention detail for limiting interrupt nesting to about 4, but a few places abused it to detect things like M_WAITOK malloc()s in interrupt context. There was 1 non-abusive MI reference to it, via the MD CLKF_INTR(framep) macro, and the most of the abuses wouldn't have been abuses if there had been a MD macro for them. See RELENG_4 for all this. (spls allow interrupt nesting to either the number of bits in the mask or the number of high-level spls, but the kernel stack is not large enough for this, so there was an even more arbitary limit of about 4.) Now with ithreads for interrupts, interrupts can't nest (an interrupt can interrupt an ithread, but that is no more nesting than an interrupt interrupting a user thread). td_intr_nesting_level is 0 in ithreads, so it is almost useless for its original abuse for detecting things like M_WAITOK malloc()s in interrupt context, but it is still abused for that :-(. td_intr_nesting level is != 0 mainly in low-level interrupt handling code and fast interrupt handlers, and neither of these should be so broken as to call malloc(), so the check in malloc() is useless; the correct check (for TDP_ITHREAD) is missing. Similar checks in uipc_mbuf.c were just removed, and similar but more-bogus checks in vinum went away with vinum. This leaves 3 possibly-useful uses of td_intr_nesting_level in -current: - {amd64,i386}/trap.c: td_intr_nesting_level is checked to be zero to makes certain traps more fatal when it is nonzero. Other arches don't bother with such a test. amd64 and i386 don't bother with testing the more likely error of a trap that should be fatal in an ithread. This can be handled in the same way as in malloc() -- don't bother testing for the very unlikely case, especially if you don't bother testing for the unlikely case. - kern_clock.c: this tests for td_intr_nesting_level >= 2. That is almost exactly the test that used to be hidden in the MD CLKF_INTR(). I used to maintain patches with an assertion that this never happens with ithreads. It never happens unless there is a bug in the low-level interrupt handling code or in the management of td_intr_nesting_level. On entry to an interrupt, td_intr_nesting level is incremented to 1, and another thread should be switched to without allowing any interrupt that will increment it again; then on switch back, interrupts should remain masked so that it is not incremented from 1 to 2 just before it is decremented to 0. I would have removed td_intr_nesting_level from my version, except my version actually needs it, for this use in hardclock() only :-(. In my version, the low-level interrupt masking is partially in software, so clock interrupts can interrupt the low-level interrupt handling. I just remembered that sparce64 also has or had some interrupt masking in software; perhaps it needs a similar treatment. - sched_ule.c: releatively recently, this scheduler started using td_intr_nesting_level. More recently td_intr_nesting_level was incremented in clock code to keep this scheduler happy, so something is clearly needed. But sched_4bsd.c doesn't need it, and the increment being outside of the critical section in intr_execute_handlers() shows how uncritical this variable is. Perhaps sched_ule.c can use td_critnest instead. You have to be carefully with not calling critical_enter() too much, so that the scheduler cannot tell the real level. Levels near mi_switch() have to have levels of precisely 0, 1 or 2, so that the level drops to precisely 0 when the switch is complete, and this is related to deciding whether to schedule. and 2 other nonsense abuses: - dev/sound/isa/sb16.c: abuses td_intr_nesting_level to avoid doing a diagnostic interrupt if (td_intr_nesting_level != 0). I think this has no effect, as for malloc() except it is easier to inspect the code to see that it cannot be reached in fast interrupt handler context, so the check is null. - vm_page.c: exactly the same as in malloc(), with a check for unlikely fast interrupt handler context but none for ordinary ithread context. Most arch-specific code that manages td_intr_nesting_level does it in a critical section. powerpc also uses atomic ops. I only grepped for td_intr_nesting_level, and would have missed any different spelling of it in asm code where most of the original i386 code for it was. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20101019122827.T1195>