From owner-freebsd-hackers@freebsd.org Wed Jan 27 18:14:32 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 08B50A70A51 for ; Wed, 27 Jan 2016 18:14:32 +0000 (UTC) (envelope-from cognet@ci0.org) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id EA9301589 for ; Wed, 27 Jan 2016 18:14:31 +0000 (UTC) (envelope-from cognet@ci0.org) Received: by mailman.ysv.freebsd.org (Postfix) id E7B1CA70A50; Wed, 27 Jan 2016 18:14:31 +0000 (UTC) Delivered-To: hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id CD537A70A4F; Wed, 27 Jan 2016 18:14:31 +0000 (UTC) (envelope-from cognet@ci0.org) Received: from kanar.ci0.org (kanar.ci0.org [IPv6:2001:bc8:35e6::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 81A261587; Wed, 27 Jan 2016 18:14:31 +0000 (UTC) (envelope-from cognet@ci0.org) Received: from kanar.ci0.org (pluxor@localhost [127.0.0.1]) by kanar.ci0.org (8.14.9/8.14.8) with ESMTP id u0RIEQS2048903; Wed, 27 Jan 2016 19:14:26 +0100 (CET) (envelope-from cognet@ci0.org) Received: (from doginou@localhost) by kanar.ci0.org (8.14.9/8.14.8/Submit) id u0RIEQmb048902; Wed, 27 Jan 2016 19:14:26 +0100 (CET) (envelope-from cognet@ci0.org) X-Authentication-Warning: kanar.ci0.org: doginou set sender to cognet@ci0.org using -f Date: Wed, 27 Jan 2016 19:14:26 +0100 From: Olivier Houchard To: John Baldwin Cc: Wojciech Macek , hackers@freebsd.org, freebsd-arm@freebsd.org, arm64-dev Subject: Re: SCHED_ULE race condition, fix proposal Message-ID: <20160127181426.GA48838@ci0.org> References: <2587742.rOiGAYXjN1@ralph.baldwin.cx> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2587742.rOiGAYXjN1@ralph.baldwin.cx> User-Agent: Mutt/1.5.23 (2014-03-12) X-Mailman-Approved-At: Wed, 27 Jan 2016 18:25:52 +0000 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jan 2016 18:14:32 -0000 Hi, I may be reading the arm64 code wrong, but shouldn't : ldr x0, =_C_LABEL(blocked_lock) ldr x2, [x0] be just: ldr x2, =_C_LABEL(blocked_lock) Regards, Olivier On Wed, Jan 27, 2016 at 09:51:12AM -0800, John Baldwin wrote: > On Wednesday, January 27, 2016 06:18:16 PM Wojciech Macek wrote: > > Hello, > > > > I've encountered a very nasty race condition during debugging armv8 HWPMC. > > It seems that ULE scheduler can execute the same thread on two different > > CPUs at the same time... > > > > Here is the scenario. > > The PMC driver must execute some of the code on the CPU0. To ensure that, a > > process migration is triggered as following: > > > > > > thread_lock(curthread); > > sched_bind(curthread, cpu); > > thread_unlock(curthread); > > > > KASSERT(curthread->td_oncpu == cpu, > > ("[pmc,%d] CPU not bound [cpu=%d, curr=%d]", __LINE__, > > cpu, curthread->td_oncpu)); > > > > > > That causes the context switch and (finally) execution of sched_switch() > > function. The code correctly detects migration and calls > > sched_switch_migrate. That function is supposed to add current thread to > > the runqueue of another CPU ("tdn" variable). So it does: > > > > tdq_lock_pair(tdn, tdq); > > tdq_add(tdn, td, flags); > > tdq_notify(tdn, td); > > TDQ_UNLOCK(tdn); > > spinlock_exit(); > > > > > > But that sometimes is causing a crash, because the other CPU is staring to > > process mi_switch as soon as the IPI arrives (via tdq_notify) and the > > runqueue lock is released. The problem is, that the thread does not contain > > valid register set, because its context was not yet stored - that happens > > later in machine dependent cpu_switch function. In another words, the > > sched_switch run on the CPU we want the thread to migrate onto restores > > thread context before it was actually stored on another core - that causes > > setting regs/pc/lt to some junk data and crash. > > > > > > I'd like to discuss a possible solution for this. I think it would be > > reasonable to extend cpu_switch to be capable of releasing a lock as the > > last thing it does after storing everything into the PCB. We could then > > remove the "TDQ_UNLOCK(tdn);" from the sched_switch_migrate and be sure > > that in the situation of migration nobody is allowed to touch the target > > runqueue until the migrating process finishes storing its context. > > > > But first I'd like to discuss some possible alternatives and maybe find > > another solution, because any change in this area will impact all supported > > architectures. > > This belongs on hackers, not developers@. > > cpu_switch() already does what you describe though in a slightly different > way. The thread_lock() of a thread being switched out is set to blocked_lock. > cpu_switch() on the new CPU will always spin until cpu_switch updates > thread_lock of the old thread to point to the proper runq lock after saving > its state in the pcb. arm64 does this here: > > /* > * Release the old thread. This doesn't need to be a store-release > * as the above dsb instruction will provide release semantics. > */ > str x2, [x0, #TD_LOCK] > #if defined(SCHED_ULE) && defined(SMP) > /* Read the value in blocked_lock */ > ldr x0, =_C_LABEL(blocked_lock) > ldr x2, [x0] > 1: > ldar x3, [x1, #TD_LOCK] > cmp x3, x2 > b.eq 1b > #endif > > Note the thread_lock_block() call just above the block you noted from > sched_switch_migrate() to see where td_lock is set to &blocked_lock. > > If the comment about 'dsb' above is wrong that might explain why you see > stale state in the PCB after seeing the new value of td_lock. > > -- > John Baldwin