Date: Tue, 24 May 2011 13:36:42 +0000 (UTC) From: John Baldwin <jhb@FreeBSD.org> To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: svn commit: r222254 - head/sys/kern Message-ID: <201105241336.p4ODagSQ070533@svn.freebsd.org>
next in thread | raw e-mail | index | archive | help
Author: jhb Date: Tue May 24 13:36:41 2011 New Revision: 222254 URL: http://svn.freebsd.org/changeset/base/222254 Log: Fix an issue with critical sections and SMP rendezvous handlers. Specifically, a critical_exit() call that drops the nesting level to zero has a brief window where the pending preemption flag is set and the nesting level is set to zero. This is done purposefully to avoid races where a preemption scheduled by an interrupt could be lost otherwise (see revision 144777). However, this does mean that if an interrupt fires during this window and enters and exits a critical section, it may preempt from the interrupt context. This is generally fine as the interrupt code is careful to arrange critical sections so that they are not exited until it is safe to preempt (e.g. interrupts EOI'd and masked if necessary). However, the SMP rendezvous IPI handler does not quite follow this rule, and in general a rendezvous can never be preempted. Rendezvous handlers are also not permitted to schedule threads to execute, so they will not typically trigger preemptions. SMP rendezvous handlers may use spinlocks (carefully) such as the rm_cleanIPI() handler used in rmlocks, but using a spinlock also enters and exits a critical section. If the interrupted top-half code is in the brief window of critical_exit() where the nesting level is zero but a preemption is pending, then releasing the spinlock can trigger a preemption. Because we know that SMP rendezvous handlers can never schedule a thread, we know that a critical_exit() in an SMP rendezvous handler will only preempt in this edge case. We also know that the top-half thread will happily handle the deferred preemption once the SMP rendezvous has completed, so the preemption will not be lost. This makes it safe to employ a workaround where we use a nested critical section in the SMP rendezvous code itself around rendezvous action routines to prevent any preemptions during an SMP rendezvous. The workaround intentionally avoids checking for a deferred preemption when leaving the critical section on the assumption that if there is a pending preemption it will be handled by the interrupted top-half code. Submitted by: mlaier (variation specific to rm_cleanIPI()) Obtained from: Isilon MFC after: 1 week Modified: head/sys/kern/subr_smp.c Modified: head/sys/kern/subr_smp.c ============================================================================== --- head/sys/kern/subr_smp.c Tue May 24 13:22:40 2011 (r222253) +++ head/sys/kern/subr_smp.c Tue May 24 13:36:41 2011 (r222254) @@ -311,11 +311,15 @@ restart_cpus(cpumask_t map) void smp_rendezvous_action(void) { + struct thread *td; void *local_func_arg; void (*local_setup_func)(void*); void (*local_action_func)(void*); void (*local_teardown_func)(void*); int generation; +#ifdef INVARIANTS + int owepreempt; +#endif /* Ensure we have up-to-date values. */ atomic_add_acq_int(&smp_rv_waiters[0], 1); @@ -330,6 +334,34 @@ smp_rendezvous_action(void) generation = smp_rv_generation; /* + * Use a nested critical section to prevent any preemptions + * from occurring during a rendezvous action routine. + * Specifically, if a rendezvous handler is invoked via an IPI + * and the interrupted thread was in the critical_exit() + * function after setting td_critnest to 0 but before + * performing a deferred preemption, this routine can be + * invoked with td_critnest set to 0 and td_owepreempt true. + * In that case, a critical_exit() during the rendezvous + * action would trigger a preemption which is not permitted in + * a rendezvous action. To fix this, wrap all of the + * rendezvous action handlers in a critical section. We + * cannot use a regular critical section however as having + * critical_exit() preempt from this routine would also be + * problematic (the preemption must not occur before the IPI + * has been acknowleged via an EOI). Instead, we + * intentionally ignore td_owepreempt when leaving the + * critical setion. This should be harmless because we do not + * permit rendezvous action routines to schedule threads, and + * thus td_owepreempt should never transition from 0 to 1 + * during this routine. + */ + td = curthread; + td->td_critnest++; +#ifdef INVARIANTS + owepreempt = td->td_owepreempt; +#endif + + /* * If requested, run a setup function before the main action * function. Ensure all CPUs have completed the setup * function before moving on to the action function. @@ -362,14 +394,18 @@ smp_rendezvous_action(void) */ MPASS(generation == smp_rv_generation); atomic_add_int(&smp_rv_waiters[2], 1); - if (local_teardown_func == smp_no_rendevous_barrier) - return; - while (smp_rv_waiters[2] < smp_rv_ncpus && - generation == smp_rv_generation) - cpu_spinwait(); + if (local_teardown_func != smp_no_rendevous_barrier) { + while (smp_rv_waiters[2] < smp_rv_ncpus && + generation == smp_rv_generation) + cpu_spinwait(); + + if (local_teardown_func != NULL) + local_teardown_func(local_func_arg); + } - if (local_teardown_func != NULL) - local_teardown_func(local_func_arg); + td->td_critnest--; + KASSERT(owepreempt == td->td_owepreempt, + ("rendezvous action changed td_owepreempt")); } void
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201105241336.p4ODagSQ070533>