From owner-freebsd-hackers@freebsd.org  Wed Jan 27 18:14:32 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 08B50A70A51
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Wed, 27 Jan 2016 18:14:32 +0000 (UTC) (envelope-from cognet@ci0.org)
Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org
 [IPv6:2001:1900:2254:206a::50:5])
 by mx1.freebsd.org (Postfix) with ESMTP id EA9301589
 for <freebsd-hackers@freebsd.org>; Wed, 27 Jan 2016 18:14:31 +0000 (UTC)
 (envelope-from cognet@ci0.org)
Received: by mailman.ysv.freebsd.org (Postfix)
 id E7B1CA70A50; Wed, 27 Jan 2016 18:14:31 +0000 (UTC)
Delivered-To: hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id CD537A70A4F;
 Wed, 27 Jan 2016 18:14:31 +0000 (UTC) (envelope-from cognet@ci0.org)
Received: from kanar.ci0.org (kanar.ci0.org [IPv6:2001:bc8:35e6::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 81A261587;
 Wed, 27 Jan 2016 18:14:31 +0000 (UTC) (envelope-from cognet@ci0.org)
Received: from kanar.ci0.org (pluxor@localhost [127.0.0.1])
 by kanar.ci0.org (8.14.9/8.14.8) with ESMTP id u0RIEQS2048903;
 Wed, 27 Jan 2016 19:14:26 +0100 (CET) (envelope-from cognet@ci0.org)
Received: (from doginou@localhost)
 by kanar.ci0.org (8.14.9/8.14.8/Submit) id u0RIEQmb048902;
 Wed, 27 Jan 2016 19:14:26 +0100 (CET) (envelope-from cognet@ci0.org)
X-Authentication-Warning: kanar.ci0.org: doginou set sender to cognet@ci0.org
 using -f
Date: Wed, 27 Jan 2016 19:14:26 +0100
From: Olivier Houchard <cognet@ci0.org>
To: John Baldwin <jhb@freebsd.org>
Cc: Wojciech Macek <wma@semihalf.com>, hackers@freebsd.org,
 freebsd-arm@freebsd.org, arm64-dev <arm64-dev@semihalf.com>
Subject: Re: SCHED_ULE race condition, fix proposal
Message-ID: <20160127181426.GA48838@ci0.org>
References: <CANsEV8e2QbW1Y83eC-d3GczWu4Lu91jDK14Xa1FkL=Y2s+RBMQ@mail.gmail.com>
 <2587742.rOiGAYXjN1@ralph.baldwin.cx>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <2587742.rOiGAYXjN1@ralph.baldwin.cx>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Mailman-Approved-At: Wed, 27 Jan 2016 18:25:52 +0000
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 27 Jan 2016 18:14:32 -0000

Hi,

I may be reading the arm64 code wrong, but shouldn't :
        ldr     x0, =_C_LABEL(blocked_lock)
	        ldr     x2, [x0]

be just:
	ldr     x2, =_C_LABEL(blocked_lock)

Regards,

Olivier

On Wed, Jan 27, 2016 at 09:51:12AM -0800, John Baldwin wrote:
> On Wednesday, January 27, 2016 06:18:16 PM Wojciech Macek wrote:
> > Hello,
> > 
> > I've encountered a very nasty race condition during debugging armv8 HWPMC.
> > It seems that ULE scheduler can execute the same thread on two different
> > CPUs at the same time...
> > 
> > Here is the scenario.
> > The PMC driver must execute some of the code on the CPU0. To ensure that, a
> > process migration is triggered as following:
> > 
> > 
> > thread_lock(curthread);
> > sched_bind(curthread, cpu);
> > thread_unlock(curthread);
> > 
> > KASSERT(curthread->td_oncpu == cpu,
> >  ("[pmc,%d] CPU not bound [cpu=%d, curr=%d]", __LINE__,
> >  cpu, curthread->td_oncpu));
> > 
> > 
> > That causes the context switch and (finally) execution of sched_switch()
> > function. The code correctly detects migration and calls
> > sched_switch_migrate. That function is supposed to add current thread to
> > the runqueue of another CPU ("tdn" variable). So it does:
> > 
> > tdq_lock_pair(tdn, tdq);
> > tdq_add(tdn, td, flags);
> > tdq_notify(tdn, td);
> > TDQ_UNLOCK(tdn);
> > spinlock_exit();
> > 
> > 
> > But that sometimes is causing a crash, because the other CPU is staring to
> > process mi_switch as soon as the IPI arrives (via tdq_notify) and the
> > runqueue lock is released. The problem is, that the thread does not contain
> > valid register set, because its context was not yet stored - that happens
> > later in machine dependent cpu_switch function. In another words, the
> > sched_switch run on the CPU we want the thread to migrate onto restores
> > thread context before it was actually stored on another core - that causes
> > setting regs/pc/lt to some junk data and crash.
> > 
> > 
> > I'd like to discuss a possible solution for this. I think it would be
> > reasonable to extend cpu_switch to be capable of releasing a lock as the
> > last thing it does after storing everything into the PCB. We could then
> > remove the "TDQ_UNLOCK(tdn);" from the sched_switch_migrate and be sure
> > that in the situation of migration nobody is allowed to touch the target
> > runqueue until the migrating process finishes storing its context.
> > 
> > But first I'd like to discuss some possible alternatives and maybe find
> > another solution, because any change in this area will impact all supported
> > architectures.
> 
> This belongs on hackers, not developers@.
> 
> cpu_switch() already does what you describe though in a slightly different
> way.  The thread_lock() of a thread being switched out is set to blocked_lock.
> cpu_switch() on the new CPU will always spin until cpu_switch updates
> thread_lock of the old thread to point to the proper runq lock after saving
> its state in the pcb.  arm64 does this here:
> 
>         /*
>          * Release the old thread. This doesn't need to be a store-release
>          * as the above dsb instruction will provide release semantics.
>          */
>         str     x2, [x0, #TD_LOCK]
> #if defined(SCHED_ULE) && defined(SMP)
>         /* Read the value in blocked_lock */
>         ldr     x0, =_C_LABEL(blocked_lock)
>         ldr     x2, [x0]
> 1:
>         ldar    x3, [x1, #TD_LOCK]
>         cmp     x3, x2
>         b.eq    1b
> #endif
> 
> Note the thread_lock_block() call just above the block you noted from
> sched_switch_migrate() to see where td_lock is set to &blocked_lock.
> 
> If the comment about 'dsb' above is wrong that might explain why you see
> stale state in the PCB after seeing the new value of td_lock.
> 
> -- 
> John Baldwin