From owner-freebsd-mips@FreeBSD.ORG  Thu Sep 29 15:03:04 2011
Return-Path: <owner-freebsd-mips@FreeBSD.ORG>
Delivered-To: freebsd-mips@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id CB407106566B
	for <freebsd-mips@freebsd.org>; Thu, 29 Sep 2011 15:03:04 +0000 (UTC)
	(envelope-from adrian.chadd@gmail.com)
Received: from mail-yi0-f54.google.com (mail-yi0-f54.google.com
	[209.85.218.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 8467B8FC08
	for <freebsd-mips@freebsd.org>; Thu, 29 Sep 2011 15:03:04 +0000 (UTC)
Received: by yia13 with SMTP id 13so788411yia.13
	for <freebsd-mips@freebsd.org>; Thu, 29 Sep 2011 08:03:03 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=xIcYl5JAdkIBPBqgF3mFc+TGiqIjikAN+NBo0bCyVzk=;
	b=ggWsVE09W3OWr2+9duitIogEaE+9BJg1EqFbrT/78SXQyXSG+qD0DSQToNGVvh9nX9
	J51DQDNx5A/FE/tUaqVlosFuC/0VSux5Jn5/ONgvUGIPuVouMKcFxTDbL97cw/HtPPW2
	l/2mdvVNEJu1paJHlZQmKqLIgyy+8TY3+qmOs=
MIME-Version: 1.0
Received: by 10.236.79.72 with SMTP id h48mr66740150yhe.4.1317308583281; Thu,
	29 Sep 2011 08:03:03 -0700 (PDT)
Sender: adrian.chadd@gmail.com
Received: by 10.236.111.42 with HTTP; Thu, 29 Sep 2011 08:03:03 -0700 (PDT)
In-Reply-To: <AC6674AB7BC78549BB231821ABF7A9AEB8086A4A84@EMBX01-WF.jnpr.net>
References: <CAJ-Vmo=qONOffCTgusWtbwuo43zKYyXDqqu5YEaL-MDQSbt-mQ@mail.gmail.com>
	<CAJ-Vmo=i6-3PNTPbP5xCftNU0w1OmMhZSysgaSRzDqgwLU6prQ@mail.gmail.com>
	<CA+7sy7DpEEhZ7WGoT-p9FCgvGBAeBHnyGVXmcUtHs+Tt6tsTng@mail.gmail.com>
	<AC6674AB7BC78549BB231821ABF7A9AEB8086A4A84@EMBX01-WF.jnpr.net>
Date: Thu, 29 Sep 2011 23:03:03 +0800
X-Google-Sender-Auth: hYc5BhL4xpP6dTT5WcP1CZsNDGw
Message-ID: <CAJ-Vmok9X2XFbq=ZOee+Xzuf6-=RYze+roM19HSQZYabm95nEw@mail.gmail.com>
From: Adrian Chadd <adrian@freebsd.org>
To: Andrew Duane <aduane@juniper.net>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: "freebsd-mips@freebsd.org" <freebsd-mips@freebsd.org>
Subject: Re: eventtimer issue on mips: temporary workaround
X-BeenThere: freebsd-mips@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Porting FreeBSD to MIPS <freebsd-mips.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-mips>,
	<mailto:freebsd-mips-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-mips>
List-Post: <mailto:freebsd-mips@freebsd.org>
List-Help: <mailto:freebsd-mips-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-mips>,
	<mailto:freebsd-mips-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 29 Sep 2011 15:03:04 -0000

On 29 September 2011 22:12, Andrew Duane <aduane@juniper.net> wrote:
> I've been running JC's patch on my Octeon blade for a while, and it does
> fix some serious problems. It's been very stable.
>
> As to why "wait" wouldn't just return if an interrupt is asserted, I don't
> know. The MIPS manual is (deliberately?) vague on these details, but any
> software engineer can explain why that's needed to write correct software.
> Maybe the silicon engineers didn't listen?

That's not the problem. I'm sorry if this isn't news to you - I've had
to explain what's going on to a few people, so I figure I may as well
just brain dump it one last time. :)

The problem is this:

* in the past, cpu_idle() would just execute the wait instruction.
Interrupts were left enabled and no critical section was held.
* any interrupt, whether the clock (say, 1000 Hz) or some other device
interrupt, would break the CPU out of that wait.
* if an interrupt happened -just- before wait, its handler would run
first. That handler may add something to the run queue, e.g. schedule
a netisr, or a fast interrupt handler may schedule a taskqueue.
* .. then wait would still be executed. There's now a task on the
run queue, but since you're in wait, you won't break out until the
next interrupt occurs.
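To put a number on "until the next interrupt", here is a minimal sketch under loudly stated assumptions: the only other wakeup source is a periodic clock (1000 Hz, i.e. a 1000 us period), handler run time is ignored, and `naive_idle_latency_us()` is a hypothetical helper, not FreeBSD code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the old cpu_idle(): assumes the only other
 * wakeup source is a periodic clock interrupt every tick_us
 * microseconds, and ignores how long handlers take to run.
 * Returns how long a task scheduled by an interrupt at irq_us sits on
 * the run queue before the CPU leaves wait. */
static unsigned naive_idle_latency_us(unsigned irq_us, unsigned tick_us,
                                      bool irq_before_wait)
{
    assert(tick_us > 0);
    if (!irq_before_wait)
        return 0;   /* the interrupt itself broke the wait */

    /* The handler ran just before wait: nothing re-checks the run
       queue, so the task waits for the next clock tick. */
    unsigned next_tick = (irq_us / tick_us + 1) * tick_us;
    return next_tick - irq_us;
}
```

For example, with a 1000 Hz clock an interrupt at 250 us that lands just before wait leaves its task queued for 750 us; the same interrupt arriving while the CPU is already in wait costs nothing extra.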

So the i386/amd64 code does this:

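intr_disable();
if (sched_runnable()) {
    intr_enable();
} else {
    /* atomic enable interrupts and wait: on x86, "sti" only takes
       effect after the following instruction, so no interrupt can
       be taken between sti and hlt */
    __asm __volatile("sti; hlt");
}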

So this way:

* if an interrupt arrives _just before_ the wait call in the idle
thread, it is deferred because of intr_disable(). Then the atomic
sti;hlt re-enables interrupts and enters halt, and the pending
interrupt immediately breaks that halt.
* if no interrupt arrives _just before_ the wait call, sti;hlt enters
halt and the next interrupt wakes it.
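The two cases above can be checked with a small deterministic model. This is a sketch under stated assumptions, not kernel code: `struct cpu_state`, `raise_interrupt()` and `idle_atomic_strands_work()` are hypothetical, an interrupt is injected at a chosen point instead of arriving asynchronously, and every handler schedules one task.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical CPU state; not a FreeBSD structure. */
struct cpu_state {
    bool intr_enabled;
    bool intr_pending;   /* interrupt raised while masked */
    bool halted;         /* inside hlt/wait */
    int  runq;           /* runnable tasks */
};

static void take_interrupt(struct cpu_state *c)
{
    c->intr_pending = false;
    c->runq++;           /* handler schedules a task */
    c->halted = false;   /* any interrupt breaks hlt */
}

static void raise_interrupt(struct cpu_state *c)
{
    if (c->intr_enabled)
        take_interrupt(c);
    else
        c->intr_pending = true;   /* deferred until interrupts are enabled */
}

/* i386/amd64-style idle: disable, check, then atomic "sti; hlt".
 * irq_while_masked injects an interrupt between intr_disable() and the
 * sched_runnable() check. Returns true if the CPU is left halted with
 * work queued. */
static bool idle_atomic_strands_work(bool irq_while_masked)
{
    struct cpu_state c = { true, false, false, 0 };

    c.intr_enabled = false;          /* intr_disable() */
    if (irq_while_masked)
        raise_interrupt(&c);         /* held pending */
    if (c.runq > 0) {                /* sched_runnable() */
        c.intr_enabled = true;       /* intr_enable(), no halt */
        return false;
    }
    /* "sti; hlt": the CPU is already halted when the pending interrupt
       is taken, so taking it wakes the halt. */
    c.halted = true;
    c.intr_enabled = true;
    if (c.intr_pending)
        take_interrupt(&c);
    return c.halted && c.runq > 0;
}
```

In both cases the function reports no stranded work: the deferred interrupt is taken only once the CPU is halted, which is exactly what wakes it.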

The question here is what happens if there's no atomic "enable
interrupts and wait". An interrupt could come in between the "enable
interrupts" and the "wait" instruction. That flips over to the
interrupt handler (netisr, fast, etc.), which may schedule a task on
the run queue. Then it flips back to the idle task, and the idle task
runs halt/wait/etc. That scheduled task doesn't get run until the next
interrupt breaks the wait instruction.

If there's just a halt/wait call, with none of the above interrupt
disable/enable, you still have the race: an interrupt may come in
between the last point where the scheduler does its thing (i.e.,
schedules something other than the idle task) and the wait/halt
instruction.

With preemption, my understanding is that the idle task can be
preempted. So as long as the interrupt occurs before the wait/halt
instruction, the scheduler could preempt the low-priority idle task
with the higher-priority "any other" task, and (I -think-) the race
condition becomes less of a problem.
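A minimal sketch of that reasoning, assuming a preemptive kernel, no critical section held by the idle thread, and an interrupt that schedules a higher-priority task. `idle_strands_work()` is a hypothetical helper and the model is deliberately simplified, in line with the hedge above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; "preemption" stands for a kernel built with
 * preemption enabled and no critical section held by the idle thread.
 * Returns true if the CPU ends up in wait with a task still queued. */
static bool idle_strands_work(bool preemption, bool irq_before_wait)
{
    int runq = 0;

    if (irq_before_wait) {
        runq++;   /* handler schedules a higher-priority task */
        if (preemption) {
            /* On return from the interrupt the scheduler switches from
               the low-priority idle thread to the new task, so idle
               never reaches wait while that task is queued. */
            runq--;   /* the task is now running, not queued */
            return false;
        }
    }
    /* wait/halt */
    assert(runq >= 0);
    return runq > 0;
}
```

Without preemption the interrupt just before wait strands the task; with preemption the idle thread is switched out before it ever reaches wait.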

Now this is made worse by the eventtimer changes, because there's now
a critical section held across the wait call. So for i386/amd64:

* interrupt occurs first, changes the runqueue, and if you don't have
preemption enabled it will just push something onto the runqueue.
* then in cpu_idle(), it calls critical_enter()
* .. then it does the eventtimer stuff
* then it does the above intr disable / sched_runnable() dance, finds
something on the runqueue, and skips running wait
* .. then it eventually calls critical_exit(), which can be a point
where preemption occurs.
* .. and if no preemption, it exits cpu_idle() and runs the next task
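The deferral that a critical section causes can be modeled as below. This is a sketch, loosely mirroring FreeBSD's per-thread `td_critnest` / `td_owepreempt` fields, but the struct and the `*_m` helpers are hypothetical simplifications rather than the kernel's implementation. `irq_inside` selects whether the interrupt lands before `cpu_idle()` (the first scenario) or inside the critical section (the second one, which follows).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical idle-thread state, loosely mirroring td_critnest and
 * td_owepreempt; a simplification, not kernel code. */
struct idle_thread {
    int  critnest;     /* >0: inside a critical section */
    bool owepreempt;   /* preemption requested while critnest > 0 */
    bool preempted;    /* idle thread switched out */
    bool halted;       /* reached hlt */
    int  runq;         /* runnable tasks */
};

static void critical_enter_m(struct idle_thread *t)
{
    t->critnest++;
}

static void critical_exit_m(struct idle_thread *t)
{
    assert(t->critnest > 0);
    if (--t->critnest == 0 && t->owepreempt) {
        t->owepreempt = false;
        t->preempted = true;   /* the deferred preemption happens here */
    }
}

/* Interrupt handler that schedules a task. With preemption enabled it
 * wants to switch away from idle, but must defer inside a critical
 * section. */
static void interrupt_m(struct idle_thread *t, bool preemption)
{
    t->runq++;
    if (!preemption)
        return;
    if (t->critnest > 0)
        t->owepreempt = true;
    else
        t->preempted = true;
}

/* i386/amd64 cpu_idle() after the eventtimer change. */
static struct idle_thread x86_idle(bool preemption, bool irq_inside)
{
    struct idle_thread t = { 0, false, false, false, 0 };

    if (!irq_inside) {
        interrupt_m(&t, preemption);
        if (t.preempted)
            return t;                /* never entered cpu_idle() */
    }
    critical_enter_m(&t);
    if (irq_inside)
        interrupt_m(&t, preemption);
    /* ...eventtimer work... */
    t.halted = (t.runq == 0);        /* intr disable + sched_runnable():
                                        hlt only if nothing is queued */
    critical_exit_m(&t);             /* possible preemption point */
    return t;
}
```

In every combination the CPU never halts with work queued, because the `sched_runnable()` check sees the task; when preemption is owed inside the critical section, it is delivered at `critical_exit_m()`.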

All nice and fine. But then, this can happen:

* in cpu_idle(), it calls critical_enter()
* .. then you get an interrupt, which pushes something onto the runqueue
* then it runs the eventtimer stuff
* then it does the intr disable / sched_runnable() dance, finds
something on the runqueue, and skips running wait.
* .. calls critical_exit(), where preemption can occur.
* .. and if no preemption, it exits cpu_idle() and runs the next task

Again, all nice and fine. But then with mips, this happens:

* in cpu_idle(), it calls critical_enter()
* .. interrupt - runqueue changes
* .. then it calls wait, which means that your runqueue won't be
handled until the next interrupt comes along and breaks the wait call
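The MIPS path can be modeled the same way. Assumptions, stated plainly: `mips_idle_strands_work()` is a hypothetical helper, `critnest`/`owepreempt` only loosely mirror FreeBSD's per-thread fields, and the model has wait follow `critical_enter()` with no run-queue check, as described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of mips cpu_idle() after the eventtimer change:
 * critical_enter(), eventtimer work, then an unconditional wait with
 * no sched_runnable() check. Returns true if the CPU enters wait with
 * a task already queued. */
static bool mips_idle_strands_work(bool preemption, bool irq_inside)
{
    int  critnest = 0;          /* like td_critnest */
    bool owepreempt = false;    /* like td_owepreempt */
    int  runq = 0;

    critnest++;                         /* critical_enter() */
    if (irq_inside) {
        runq++;                         /* handler schedules a task */
        if (preemption && critnest > 0)
            owepreempt = true;          /* can't preempt idle yet */
    }
    /* wait: nothing looks at runq first */
    bool stranded = runq > 0;

    /* critical_exit() only runs after the *next* interrupt breaks the
       wait, so an owed preemption is delayed until then as well. */
    critnest--;
    assert(critnest == 0);
    assert(!owepreempt || stranded);
    return stranded;
}
```

Unlike the i386/amd64 model, preemption does not help here: the preemption is owed but deferred by the critical section, and the task is stranded either way.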

So if you add a non-atomic "enable interrupts and wait":

* cpu_idle() - calls critical_enter()
* disable interrupts for schedule check
* interrupt occurs - but they're disabled, so nothing happens yet
* nothing on runqueue, so it goes to call "enable interrupts, wait"
* interrupts are enabled, the pending interrupt fires and pushes something onto the runqueue
* then wait call happens - and won't wake up until the next interrupt.
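The sequence above can be checked against each point where a single interrupt might arrive. This is a sketch with hypothetical names (`enum irq_point`, `nonatomic_idle_strands_work()`), assuming a non-atomic "enable interrupts" followed by a separate "wait", and a handler that always schedules one task.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical points where the single interrupt may arrive. */
enum irq_point {
    IRQ_NONE,
    IRQ_WHILE_MASKED,    /* after "disable interrupts", before the check */
    IRQ_AFTER_ENABLE     /* between "enable interrupts" and "wait" */
};

struct cpu_state {
    bool intr_enabled;
    bool intr_pending;
    int  runq;
};

static void take_interrupt(struct cpu_state *c)
{
    c->intr_pending = false;
    c->runq++;                      /* handler schedules a task */
}

static void raise_interrupt(struct cpu_state *c)
{
    if (c->intr_enabled)
        take_interrupt(c);
    else
        c->intr_pending = true;
}

/* cpu_idle() with a non-atomic "enable interrupts, then wait".
 * Returns true if the CPU reaches wait with a task queued. */
static bool nonatomic_idle_strands_work(enum irq_point p)
{
    struct cpu_state c = { true, false, 0 };

    /* critical_enter() would go here: it defers preemption but does
       not mask interrupts, so it does not change this sequence. */
    c.intr_enabled = false;         /* disable for the schedule check */
    if (p == IRQ_WHILE_MASKED)
        raise_interrupt(&c);        /* held pending */
    if (c.runq > 0)                 /* sched_runnable() */
        return false;               /* would skip wait */
    c.intr_enabled = true;          /* "enable interrupts"... */
    if (c.intr_pending)
        take_interrupt(&c);         /* pending interrupt fires here */
    if (p == IRQ_AFTER_ENABLE)
        raise_interrupt(&c);
    return c.runq > 0;              /* ...then "wait" */
}
```

Both the interrupt that arrives while masked and the one that slips in after the enable leave a task queued when wait executes; only the no-interrupt case is safe, which is why the enable and the wait have to be a single atomic step.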


Does this all make sense?


Adrian