Date: Fri, 18 Jan 2013 11:56:27 -0500 From: Ryan Stone <rysto32@gmail.com> To: FreeBSD Current <freebsd-current@freebsd.org> Subject: ULE can leak TDQ_LOCK() if statclock() called outside of critical_enter() Message-ID: <CAFMmRNz7AQMRr2%2B-59702k_mr=3=DsFP9c7ejPEBC7d=4w0bTA@mail.gmail.com>
next in thread | raw e-mail | index | archive | help
I have been experiencing occasional deadlocks on FreeBSD 8.2 systems using the ULE scheduler. The root cause in every case has been that ULE's TDQ_LOCK for cpu 0 is owned by a thread that is not running. I have been investigating the issue, and I believe that I see the issue. The problem occurs if the interrupt that drives statclock does not call critical_enter() upon calling into statclock(). The lapic timer does use critical_enter(), so default configurations would not see this. I have local patches to use the RTC to drive statclock, and from a quick reading of the eventtimer code in -CURRENT the same thing is possible there. The RTC code does not call statclock within a critical section. So here's the bug: 1) A thread with interrupts enabled, running on CPU 0, with td_owepreempt=1 and td_critnest=1 calls critical_exit(): void critical_exit(void) { // ... if (td->td_critnest == 1) { td->td_critnest = 0; if (td->td_owepreempt && !kdb_active) { // Irrelevant bits snipped 2) td_critnest is set to 0, and then the RTC interrupt fires. 3) rtcintr calls into statclock (8.2) or statclock_cnt(head) with td_critnest still 0 (on head it goes through the eventtimer code, but it ends up in statclock eventually). 4) statclock takes the thread_lock on curthread, then calls sched_clock(). sched_clock calls sched_balance(); static void sched_balance(void) { // snip... tdq = TDQ_SELF(); TDQ_UNLOCK(tdq); sched_balance_group(cpu_top); TDQ_LOCK(tdq); } TDQ_UNLOCK does a spinlock_exit which does a critical_exit. td_critnest will be decremented back to 0 and td_owepreempt is still 1, so this triggers a preemption. Note that this TDQ_UNLOCK is (intentionally) unlocking the thread_lock done by statclock. 5) thread migrates to any other CPU, call it CPU n. tdq is now stale. TDQ_LOCK takes the lock for CPU 0 (but really it's intending to re-take the thread_lock, although a thread_lock() here would be equally incorrect -- sched_balance's caller is going to be mucking around with the TDQ when sched_balance returns). 6) The thread returns to statclock. statclock does a thread_unlock(). The td_lock is TDQ_LOCK(n), which we don't hold. We mangle the stat of TDQ_LOCK(n) and leave TDQ_LOCK(0) held. The simplest solution would be to do a critical_enter() in sched_balance, although that would be superfluous in the normal case where the lapic timer is driving statclock. I'm not sure if there's other code in the eventtimers infrastructure that's assuming that a preemption or migration is impossible while handling an event. A quick look at kern_clocksource.c turns up worrying comments like "Handle all events for specified time on this CPU" and uses of curcpu, so there may well be other issues lurking here. It looks to me that the safest thing to do would be to push the critical_enter() into the eventtimer code or even all the way back to the interrupt handlers (mirroring what the lapic code already does).
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAFMmRNz7AQMRr2%2B-59702k_mr=3=DsFP9c7ejPEBC7d=4w0bTA>