From owner-svn-src-stable-8@FreeBSD.ORG Fri Jun 10 19:03:17 2011 Return-Path: Delivered-To: svn-src-stable-8@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E0658106564A; Fri, 10 Jun 2011 19:03:17 +0000 (UTC) (envelope-from jhb@FreeBSD.org) Received: from svn.freebsd.org (svn.freebsd.org [IPv6:2001:4f8:fff6::2c]) by mx1.freebsd.org (Postfix) with ESMTP id CF8CE8FC14; Fri, 10 Jun 2011 19:03:17 +0000 (UTC) Received: from svn.freebsd.org (localhost [127.0.0.1]) by svn.freebsd.org (8.14.4/8.14.4) with ESMTP id p5AJ3HOL039172; Fri, 10 Jun 2011 19:03:17 GMT (envelope-from jhb@svn.freebsd.org) Received: (from jhb@localhost) by svn.freebsd.org (8.14.4/8.14.4/Submit) id p5AJ3HWt039170; Fri, 10 Jun 2011 19:03:17 GMT (envelope-from jhb@svn.freebsd.org) Message-Id: <201106101903.p5AJ3HWt039170@svn.freebsd.org> From: John Baldwin Date: Fri, 10 Jun 2011 19:03:17 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-8@freebsd.org X-SVN-Group: stable-8 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cc: Subject: svn commit: r222942 - stable/8/sys/kern X-BeenThere: svn-src-stable-8@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SVN commit messages for only the 8-stable src tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 10 Jun 2011 19:03:18 -0000 Author: jhb Date: Fri Jun 10 19:03:17 2011 New Revision: 222942 URL: http://svn.freebsd.org/changeset/base/222942 Log: MFC 222254: Fix an issue with critical sections and SMP rendezvous handlers. Specifically, a critical_exit() call that drops the nesting level to zero has a brief window where the pending preemption flag is set and the nesting level is set to zero. This is done purposefully to avoid races where a preemption scheduled by an interrupt could be lost otherwise (see revision 144777). However, this does mean that if an interrupt fires during this window and enters and exits a critical section, it may preempt from the interrupt context. This is generally fine as the interrupt code is careful to arrange critical sections so that they are not exited until it is safe to preempt (e.g. interrupts EOI'd and masked if necessary). However, the SMP rendezvous IPI handler does not quite follow this rule, and in general a rendezvous can never be preempted. Rendezvous handlers are also not permitted to schedule threads to execute, so they will not typically trigger preemptions. SMP rendezvous handlers may use spinlocks (carefully) such as the rm_cleanIPI() handler used in rmlocks, but using a spinlock also enters and exits a critical section. If the interrupted top-half code is in the brief window of critical_exit() where the nesting level is zero but a preemption is pending, then releasing the spinlock can trigger a preemption. Because we know that SMP rendezvous handlers can never schedule a thread, we know that a critical_exit() in an SMP rendezvous handler will only preempt in this edge case. We also know that the top-half thread will happily handle the deferred preemption once the SMP rendezvous has completed, so the preemption will not be lost. This makes it safe to employ a workaround where we use a nested critical section in the SMP rendezvous code itself around rendezvous action routines to prevent any preemptions during an SMP rendezvous. The workaround intentionally avoids checking for a deferred preemption when leaving the critical section on the assumption that if there is a pending preemption it will be handled by the interrupted top-half code. Modified: stable/8/sys/kern/subr_smp.c Directory Properties: stable/8/sys/ (props changed) stable/8/sys/amd64/include/xen/ (props changed) stable/8/sys/cddl/contrib/opensolaris/ (props changed) stable/8/sys/contrib/dev/acpica/ (props changed) stable/8/sys/contrib/pf/ (props changed) Modified: stable/8/sys/kern/subr_smp.c ============================================================================== --- stable/8/sys/kern/subr_smp.c Fri Jun 10 18:58:48 2011 (r222941) +++ stable/8/sys/kern/subr_smp.c Fri Jun 10 19:03:17 2011 (r222942) @@ -340,11 +340,15 @@ restart_cpus(cpumask_t map) void smp_rendezvous_action(void) { + struct thread *td; void *local_func_arg; void (*local_setup_func)(void*); void (*local_action_func)(void*); void (*local_teardown_func)(void*); int generation; +#ifdef INVARIANTS + int owepreempt; +#endif /* Ensure we have up-to-date values. */ atomic_add_acq_int(&smp_rv_waiters[0], 1); @@ -359,6 +363,34 @@ smp_rendezvous_action(void) generation = smp_rv_generation; /* + * Use a nested critical section to prevent any preemptions + * from occurring during a rendezvous action routine. + * Specifically, if a rendezvous handler is invoked via an IPI + * and the interrupted thread was in the critical_exit() + * function after setting td_critnest to 0 but before + * performing a deferred preemption, this routine can be + * invoked with td_critnest set to 0 and td_owepreempt true. + * In that case, a critical_exit() during the rendezvous + * action would trigger a preemption which is not permitted in + * a rendezvous action. To fix this, wrap all of the + * rendezvous action handlers in a critical section. We + * cannot use a regular critical section however as having + * critical_exit() preempt from this routine would also be + * problematic (the preemption must not occur before the IPI + * has been acknowleged via an EOI). Instead, we + * intentionally ignore td_owepreempt when leaving the + * critical setion. This should be harmless because we do not + * permit rendezvous action routines to schedule threads, and + * thus td_owepreempt should never transition from 0 to 1 + * during this routine. + */ + td = curthread; + td->td_critnest++; +#ifdef INVARIANTS + owepreempt = td->td_owepreempt; +#endif + + /* * If requested, run a setup function before the main action * function. Ensure all CPUs have completed the setup * function before moving on to the action function. @@ -391,14 +423,18 @@ smp_rendezvous_action(void) */ MPASS(generation == smp_rv_generation); atomic_add_int(&smp_rv_waiters[2], 1); - if (local_teardown_func == smp_no_rendevous_barrier) - return; - while (smp_rv_waiters[2] < smp_rv_ncpus && - generation == smp_rv_generation) - cpu_spinwait(); + if (local_teardown_func != smp_no_rendevous_barrier) { + while (smp_rv_waiters[2] < smp_rv_ncpus && + generation == smp_rv_generation) + cpu_spinwait(); + + if (local_teardown_func != NULL) + local_teardown_func(local_func_arg); + } - if (local_teardown_func != NULL) - local_teardown_func(local_func_arg); + td->td_critnest--; + KASSERT(owepreempt == td->td_owepreempt, + ("rendezvous action changed td_owepreempt")); } void