From owner-freebsd-current@FreeBSD.ORG Wed Sep 1 13:50:02 2010 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A7BD610656A7; Wed, 1 Sep 2010 13:50:02 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 778528FC20; Wed, 1 Sep 2010 13:50:02 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 1682146B09; Wed, 1 Sep 2010 09:50:02 -0400 (EDT) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 08B3A8A04E; Wed, 1 Sep 2010 09:50:01 -0400 (EDT) From: John Baldwin To: freebsd-current@freebsd.org Date: Wed, 1 Sep 2010 09:49:59 -0400 User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20100819; KDE/4.4.5; amd64; ; ) References: In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201009010950.00422.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Wed, 01 Sep 2010 09:50:01 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.6 required=4.2 tests=AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: mdf@freebsd.org, Jeff Roberson Subject: Re: sched_pin() bug in SCHED_ULE X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Sep 2010 13:50:02 -0000 On Tuesday, August 31, 2010 2:53:12 pm mdf@freebsd.org wrote: > On Tue, Aug 31, 2010 at 10:16 AM, wrote: > > I recorded the stack any time ts->ts_cpu was set and when a thread was > > migrated by sched_switch() I printed out the recorded info. Here's > > what I found: > > > > > > XXX bug 67957: moving 0xffffff003ff9b800 from 3 to 1 > > [1]: pin 0 state 4 move 3 -> 1 done by 0xffffff000cc44000: > > #0 0xffffffff802b36b4 at bug67957+0x84 > > #1 0xffffffff802b5dd4 at sched_affinity+0xd4 > > #2 0xffffffff8024a707 at cpuset_setthread+0x137 > > #3 0xffffffff8024aeae at cpuset_setaffinity+0x21e > > #4 0xffffffff804a82df at freebsd32_cpuset_setaffinity+0x4f > > #5 0xffffffff80295f49 at isi_syscall+0x99 > > #6 0xffffffff804a630e at ia32_syscall+0x1ce > > #7 0xffffffff8046dc60 at Xint0x80_syscall+0x60 > > [0]: pin 0 state 2 move 0 -> 3 done by 0xffffff000cc44000: > > #0 0xffffffff802b36b4 at bug67957+0x84 > > #1 0xffffffff802b4ad8 at sched_add+0xe8 > > #2 0xffffffff8029b96a at create_thread+0x34a > > #3 0xffffffff8029badc at kern_thr_new+0x8c > > #4 0xffffffff804a8912 at freebsd32_thr_new+0x122 > > #5 0xffffffff80295f49 at isi_syscall+0x99 > > #6 0xffffffff804a630e at ia32_syscall+0x1ce > > #7 0xffffffff8046dc60 at Xint0x80_syscall+0x60 > > > > So one thread in the process called cpuset_setaffinity(2), and another > > thread in the process was forcibly migrated by the IPI while returning > > from a syscall, while it had td_pinned set. > > > > Given this path, it seems reasonable to me to skip the migrate if we > > notice THREAD_CAN_MIGRATE is false. > > > > Opinions? My debug code is below. I'll try to write a short testcase > > that exhibits this bug. > > Just a few more thoughts on this. The check in sched_affinity for > THREAD_CAN_MIGRATE is racy. Since witness uses sched_pin, it's not > simple to take the THREAD lock around an increment of td_pinned. So > I'm looking for suggestions on the best way to fix this issue. My > thoughts: > > 1) add a check in sched_switch() for THREAD_CAN_MIGRATE > 2) have WITNESS not use sched_pin, and take the THREAD lock when > modifying td_pinned > 3) have the IPI_PREEMPT handler notice the thread is pinned (and not > trying to bind) and postpone the mi_switch, just like it postpones > when a thread is in a critical section. > > Except for the potential complexity of implementation, I think (2) is > the best solution. I don't like 2) only because you'd really need to fix all the other places that use sched_pin() to grab the thread lock. That would be a bit expensive. I think the problem is code that checks THREAD_CAN_MIGRATE() for running threads that aren't curthread. The sched_affinity() bug is probably my fault FWIW. I think something along the lines of 1) is the best approach. Maybe something like the patch below. It moves the CPU assignment out of sched_affinity() and moves it into sched_switch() instead so that it takes effect on the next context switch for this thread. That is the point at which the CPU binding actually takes effect anyway since sched_affinity() doesn't force an immediate context switch. This patch is relative to 7. The patch for 9 is nearly identical (just a fixup for ipi_cpu() vs ipi_selected()). Index: sched_ule.c =================================================================== --- sched_ule.c (revision 212065) +++ sched_ule.c (working copy) @@ -1885,6 +1885,8 @@ srqflag = (flags & SW_PREEMPT) ? SRQ_OURSELF|SRQ_YIELDING|SRQ_PREEMPTED : SRQ_OURSELF|SRQ_YIELDING; + if (THREAD_CAN_MIGRATE(td) && !THREAD_CAN_SCHED(td, ts->tscpu)) + ts->ts_cpu = sched_pickcpu(td, 0); if (ts->ts_cpu == cpuid) tdq_add(tdq, td, srqflag); else @@ -2536,7 +2538,6 @@ { #ifdef SMP struct td_sched *ts; - int cpu; THREAD_LOCK_ASSERT(td, MA_OWNED); ts = td->td_sched; @@ -2550,17 +2551,13 @@ if (!TD_IS_RUNNING(td)) return; td->td_flags |= TDF_NEEDRESCHED; - if (!THREAD_CAN_MIGRATE(td)) - return; /* - * Assign the new cpu and force a switch before returning to - * userspace. If the target thread is not running locally send - * an ipi to force the issue. + * Force a switch before returning to userspace. If the + * target thread is not running locally send an ipi to force + * the issue. */ - cpu = ts->ts_cpu; - ts->ts_cpu = sched_pickcpu(td, 0); - if (cpu != PCPU_GET(cpuid)) - ipi_selected(1 << cpu, IPI_PREEMPT); + if (td != curthread) + ipi_selected(1 << ts->ts_cpu, IPI_PREEMPT); #endif } -- John Baldwin