From owner-freebsd-hackers@freebsd.org  Wed Jan 27 19:06:18 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3AB22A6ED68
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Wed, 27 Jan 2016 19:06:18 +0000 (UTC)
 (envelope-from wma@semihalf.com)
Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org
 [IPv6:2001:1900:2254:206a::50:5])
 by mx1.freebsd.org (Postfix) with ESMTP id 1ADCE1112
 for <freebsd-hackers@freebsd.org>; Wed, 27 Jan 2016 19:06:18 +0000 (UTC)
 (envelope-from wma@semihalf.com)
Received: by mailman.ysv.freebsd.org (Postfix)
 id 17DC9A6ED65; Wed, 27 Jan 2016 19:06:18 +0000 (UTC)
Delivered-To: hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id F1AB8A6ED63
 for <hackers@mailman.ysv.freebsd.org>; Wed, 27 Jan 2016 19:06:17 +0000 (UTC)
 (envelope-from wma@semihalf.com)
Received: from mail-ig0-x236.google.com (mail-ig0-x236.google.com
 [IPv6:2607:f8b0:4001:c05::236])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id C53731110
 for <hackers@freebsd.org>; Wed, 27 Jan 2016 19:06:17 +0000 (UTC)
 (envelope-from wma@semihalf.com)
Received: by mail-ig0-x236.google.com with SMTP id mw1so18648940igb.1
 for <hackers@freebsd.org>; Wed, 27 Jan 2016 11:06:17 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=semihalf-com.20150623.gappssmtp.com; s=20150623;
 h=mime-version:in-reply-to:references:from:date:message-id:subject:to
 :cc:content-type;
 bh=mZb7Cg9biXswox7H7/qDgpQh9w6eLXJpd6+BsxavL8U=;
 b=zQjQgneo+qDINVbqBikQCPvD776CLR1LONJyfsHQbeb+qWnO0FOqEPMMADXBGKZer2
 DxrkMzgeXnZ3veCzlmAAmeCBunUsux/VWMt/AdgDJ30NwQuXqxnUWsf4pP4wLZVLOwIq
 WF9TzXakGhCWo1W/K2jxjWQ8AdcsBdAiL4tqYOUJCaus7OuMXNKpIfUPd+y+QZ0LJxvV
 x6WbF0RUwLaFOmZquOVyCIlCJz01HbNDDCM3euxCaV9acdL4SaTo1TmzAwViSbA83wOp
 CnS7m/A11D6g2tZAU9sAKBSH9lLvsh/UFmfQfnzCEYAh5gV+jboDl0Ncp/Ue1QIHAGeT
 66Tg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:from:date
 :message-id:subject:to:cc:content-type;
 bh=mZb7Cg9biXswox7H7/qDgpQh9w6eLXJpd6+BsxavL8U=;
 b=gnts/Q2uOkC6d0A/1PEUflh3jXivFVz3ThAVkcPSfk4L4Y5Jzgogc3c4wPi4flxLJz
 0TP5vwFTcxWPAK5PlfcEYmH4ocELon0lR1nnKBqv+3fCu9eNefZOrxXZ1jACKi/i2BK4
 aPfroLA4OysmjrCYPEV49R207TuySHph9pmi8AWy5m1yQzlRKCm5gG/wOwDSwfjchmYl
 dzaWDnigZa65rNymxOO6KmKHhB/ltHA+s5DOSQuAaK2mD6uhjOCFfQK7t60Su0O0Ri8m
 AlF1fhGh143pghmUhxN2ohxDYFC4AZKk6OQHTTTZGrgtV1JJgcY46VDxZ9k5PxN+D6TU
 4P8A==
X-Gm-Message-State: AG10YOS51xbWRB75zSPCE1CLbW2SO+/0hg7c3Zmi4jG+FYiBN42ZlecBVJcgAG/PKyWvuuZq4NBU29rKipcd6Q==
X-Received: by 10.50.136.136 with SMTP id qa8mr30316937igb.39.1453921577007;
 Wed, 27 Jan 2016 11:06:17 -0800 (PST)
MIME-Version: 1.0
Received: by 10.107.154.19 with HTTP; Wed, 27 Jan 2016 11:05:57 -0800 (PST)
In-Reply-To: <20160127181426.GA48838@ci0.org>
References: <CANsEV8e2QbW1Y83eC-d3GczWu4Lu91jDK14Xa1FkL=Y2s+RBMQ@mail.gmail.com>
 <2587742.rOiGAYXjN1@ralph.baldwin.cx> <20160127181426.GA48838@ci0.org>
From: Wojciech Macek <wma@semihalf.com>
Date: Wed, 27 Jan 2016 20:05:57 +0100
Message-ID: <CANsEV8djDi9eCtDby-gJt88yLYtdrYMEY5-gjg3axEUEchwybA@mail.gmail.com>
Subject: Re: SCHED_ULE race condition, fix proposal
To: Olivier Houchard <cognet@ci0.org>
Cc: John Baldwin <jhb@freebsd.org>, hackers@freebsd.org,
 freebsd-arm@freebsd.org, arm64-dev <arm64-dev@semihalf.com>
X-Mailman-Approved-At: Wed, 27 Jan 2016 19:31:38 +0000
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.20
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 27 Jan 2016 19:06:18 -0000

Holly... Good catch :) Indeed, blocked_lock is not a pointer.
I'll check it tommorow at work, but it looks promissing. And the only
reason it didn't cause any data abort is that the mutex structure cointains
a pointer to a string (name) at the offset 0.

Thanks,
Wojtek

2016-01-27 19:14 GMT+01:00 Olivier Houchard <cognet@ci0.org>:

> Hi,
>
> I may be reading the arm64 code wrong, but shouldn't :
>         ldr     x0, =_C_LABEL(blocked_lock)
>                 ldr     x2, [x0]
>
> be just:
>         ldr     x2, =_C_LABEL(blocked_lock)
>
> Regards,
>
> Olivier
>
> On Wed, Jan 27, 2016 at 09:51:12AM -0800, John Baldwin wrote:
> > On Wednesday, January 27, 2016 06:18:16 PM Wojciech Macek wrote:
> > > Hello,
> > >
> > > I've encountered a very nasty race condition during debugging armv8
> HWPMC.
> > > It seems that ULE scheduler can execute the same thread on two
> different
> > > CPUs at the same time...
> > >
> > > Here is the scenario.
> > > The PMC driver must execute some of the code on the CPU0. To ensure
> that, a
> > > process migration is triggered as following:
> > >
> > >
> > > thread_lock(curthread);
> > > sched_bind(curthread, cpu);
> > > thread_unlock(curthread);
> > >
> > > KASSERT(curthread->td_oncpu == cpu,
> > >  ("[pmc,%d] CPU not bound [cpu=%d, curr=%d]", __LINE__,
> > >  cpu, curthread->td_oncpu));
> > >
> > >
> > > That causes the context switch and (finally) execution of
> sched_switch()
> > > function. The code correctly detects migration and calls
> > > sched_switch_migrate. That function is supposed to add current thread
> to
> > > the runqueue of another CPU ("tdn" variable). So it does:
> > >
> > > tdq_lock_pair(tdn, tdq);
> > > tdq_add(tdn, td, flags);
> > > tdq_notify(tdn, td);
> > > TDQ_UNLOCK(tdn);
> > > spinlock_exit();
> > >
> > >
> > > But that sometimes is causing a crash, because the other CPU is
> staring to
> > > process mi_switch as soon as the IPI arrives (via tdq_notify) and the
> > > runqueue lock is released. The problem is, that the thread does not
> contain
> > > valid register set, because its context was not yet stored - that
> happens
> > > later in machine dependent cpu_switch function. In another words, the
> > > sched_switch run on the CPU we want the thread to migrate onto restores
> > > thread context before it was actually stored on another core - that
> causes
> > > setting regs/pc/lt to some junk data and crash.
> > >
> > >
> > > I'd like to discuss a possible solution for this. I think it would be
> > > reasonable to extend cpu_switch to be capable of releasing a lock as
> the
> > > last thing it does after storing everything into the PCB. We could then
> > > remove the "TDQ_UNLOCK(tdn);" from the sched_switch_migrate and be sure
> > > that in the situation of migration nobody is allowed to touch the
> target
> > > runqueue until the migrating process finishes storing its context.
> > >
> > > But first I'd like to discuss some possible alternatives and maybe find
> > > another solution, because any change in this area will impact all
> supported
> > > architectures.
> >
> > This belongs on hackers, not developers@.
> >
> > cpu_switch() already does what you describe though in a slightly
> different
> > way.  The thread_lock() of a thread being switched out is set to
> blocked_lock.
> > cpu_switch() on the new CPU will always spin until cpu_switch updates
> > thread_lock of the old thread to point to the proper runq lock after
> saving
> > its state in the pcb.  arm64 does this here:
> >
> >         /*
> >          * Release the old thread. This doesn't need to be a
> store-release
> >          * as the above dsb instruction will provide release semantics.
> >          */
> >         str     x2, [x0, #TD_LOCK]
> > #if defined(SCHED_ULE) && defined(SMP)
> >         /* Read the value in blocked_lock */
> >         ldr     x0, =_C_LABEL(blocked_lock)
> >         ldr     x2, [x0]
> > 1:
> >         ldar    x3, [x1, #TD_LOCK]
> >         cmp     x3, x2
> >         b.eq    1b
> > #endif
> >
> > Note the thread_lock_block() call just above the block you noted from
> > sched_switch_migrate() to see where td_lock is set to &blocked_lock.
> >
> > If the comment about 'dsb' above is wrong that might explain why you see
> > stale state in the PCB after seeing the new value of td_lock.
> >
> > --
> > John Baldwin
>