From: Kubilay Kocak
Reply-To: koobs@FreeBSD.org
To: Wojciech Macek, developers@freebsd.org, freebsd-arm@freebsd.org
Cc: Olivier Houchard, arm64-dev
Subject: Re: SCHED_ULE race condition, fix proposal
Date: Thu, 28 Jan 2016 13:08:16 +1100
Message-ID: <56A97810.8090303@FreeBSD.org>
List-Id: "Porting FreeBSD to ARM processors."

On 28/01/2016 4:18 AM, Wojciech Macek wrote:
> Hello,
>
> I've encountered a very nasty race condition while debugging armv8
> HWPMC. It seems that the ULE scheduler can execute the same thread on two
> different CPUs at the same time...
>
> Here is the scenario.
> The PMC driver must execute some of the code on CPU0.
> To ensure that, a process migration is triggered as follows:
>
>
> thread_lock(curthread);
> sched_bind(curthread, cpu);
> thread_unlock(curthread);
>
> KASSERT(curthread->td_oncpu == cpu,
>     ("[pmc,%d] CPU not bound [cpu=%d, curr=%d]", __LINE__,
>     cpu, curthread->td_oncpu));
>
>
> That causes a context switch and (finally) execution of the sched_switch()
> function. The code correctly detects the migration and calls
> sched_switch_migrate. That function is supposed to add the current thread to
> the runqueue of another CPU (the "tdn" variable). So it does:
>
> tdq_lock_pair(tdn, tdq);
> tdq_add(tdn, td, flags);
> tdq_notify(tdn, td);
> TDQ_UNLOCK(tdn);
> spinlock_exit();
>
>
> But that sometimes causes a crash, because the other CPU starts
> to process mi_switch as soon as the IPI arrives (via tdq_notify) and the
> runqueue lock is released. The problem is that the thread does not yet
> contain a valid register set, because its context has not been stored:
> that happens later, in the machine-dependent cpu_switch function. In other
> words, the sched_switch run on the CPU we want the thread to migrate
> onto restores the thread context before it was actually stored on the other
> core. That sets regs/pc/lt to junk data and causes a crash.
>
>
> I'd like to discuss a possible solution for this. I think it would be
> reasonable to extend cpu_switch so that it can release a lock as the
> last thing it does, after storing everything into the PCB. We could then
> remove the "TDQ_UNLOCK(tdn);" from sched_switch_migrate and be sure
> that, during a migration, nobody is allowed to touch the target
> runqueue until the migrating thread finishes storing its context.
>
> But first I'd like to discuss some possible alternatives and maybe find
> another solution, because any change in this area will impact all
> supported architectures.
>
>
> Regards,
> Wojtek
> wma@freebsd.org

Can you create an issue in Bugzilla to track this?
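
For whoever picks up the report, the ordering problem described above can
be modelled in a few lines of ordinary C. This is only a sketch under
stated assumptions: it is a single-threaded model of one interleaving, not
kernel code, and every name in it (model_thread, model_tdq,
target_cpu_poll, migrate_unlock_early, migrate_unlock_late, PC_JUNK,
PC_VALID) is hypothetical. The comments map each step to the call it
stands in for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for kernel objects; not FreeBSD code. */
#define PC_JUNK  UINT64_C(0xdeadbeef) /* stale PCB contents before the save */
#define PC_VALID UINT64_C(0x1000)     /* value the source CPU saves */

struct model_thread {
	uint64_t saved_pc;           /* stands in for the saved register set */
};

struct model_tdq {
	bool locked;                 /* stands in for the runqueue lock */
	struct model_thread *queued; /* one-slot runqueue */
};

/*
 * Target CPU reacting to the IPI: if the runqueue is unlocked and not
 * empty, dequeue the thread and "restore" its context. Returns the
 * restored pc, or 0 if nothing could be picked up yet.
 */
uint64_t
target_cpu_poll(struct model_tdq *tdn)
{
	struct model_thread *td;

	if (tdn->locked || tdn->queued == NULL)
		return (0);
	td = tdn->queued;
	tdn->queued = NULL;
	return (td->saved_pc);
}

/* Ordering from the report: unlock first, save the context afterwards. */
uint64_t
migrate_unlock_early(void)
{
	struct model_thread td = { .saved_pc = PC_JUNK };
	struct model_tdq tdn = { .locked = false, .queued = NULL };
	uint64_t seen;

	tdn.locked = true;            /* tdq_lock_pair() */
	tdn.queued = &td;             /* tdq_add() + tdq_notify() */
	tdn.locked = false;           /* TDQ_UNLOCK(tdn) */
	seen = target_cpu_poll(&tdn); /* IPI handled before the save */
	td.saved_pc = PC_VALID;       /* cpu_switch() stores context: too late */
	return (seen);
}

/* Proposed ordering: save the context, then drop the lock as the last step. */
uint64_t
migrate_unlock_late(void)
{
	struct model_thread td = { .saved_pc = PC_JUNK };
	struct model_tdq tdn = { .locked = false, .queued = NULL };
	uint64_t seen;

	tdn.locked = true;            /* tdq_lock_pair() */
	tdn.queued = &td;             /* tdq_add() + tdq_notify() */
	seen = target_cpu_poll(&tdn); /* IPI arrives, but the queue is locked */
	assert(seen == 0);            /* so the target CPU saw nothing */
	td.saved_pc = PC_VALID;       /* cpu_switch() stores the context */
	tdn.locked = false;           /* unlock only after the save */
	return (target_cpu_poll(&tdn));
}
```

In this model migrate_unlock_early() hands the target CPU the stale
PC_JUNK value, while migrate_unlock_late() hands it PC_VALID. A real fix
would have to make the same guarantee inside cpu_switch on every supported
architecture, which the model does not attempt to show.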