From owner-freebsd-current@FreeBSD.ORG  Mon Sep 13 18:08:58 2004
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id D124116A4CF; Mon, 13 Sep 2004 18:08:58 +0000 (GMT)
Received: from mail.vicor-nb.com (bigwoop.vicor-nb.com [208.206.78.2])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 907DD43D1D; Mon, 13 Sep 2004 18:08:58 +0000 (GMT)
	(envelope-from julian@elischer.org)
Received: from elischer.org (julian.vicor-nb.com [208.206.78.97])
	by mail.vicor-nb.com (Postfix) with ESMTP
	id 377877A3D2; Mon, 13 Sep 2004 11:08:58 -0700 (PDT)
Message-ID: <4145E23A.9040108@elischer.org>
Date: Mon, 13 Sep 2004 11:08:58 -0700
From: Julian Elischer <julian@elischer.org>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030516
X-Accept-Language: en, hu
MIME-Version: 1.0
To: John Baldwin <jhb@FreeBSD.org>
References: <4143EF29.2080404@elischer.org>
	<200409131331.03881.jhb@FreeBSD.org>
In-Reply-To: <200409131331.03881.jhb@FreeBSD.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
cc: current@FreeBSD.org
cc: Peter Wemm <peter@FreeBSD.org>
Subject: Re: [Patch] panics/hangs with preemption and threads.
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 13 Sep 2004 18:08:59 -0000


John Baldwin wrote:

>On Sunday 12 September 2004 02:39 am, Julian Elischer wrote:
>  
>
>>Guys I think I found a (the?) major cause for the corruptions of the
>>ksegrp/thread runqueue for threaded processes when Premption is turned on..
>>
>>When a thread is scheduled in setrunqueue() the firt thing that is done
>>is that it is put in the correct place in the ksegrp's run queue,.
>>then if it is in the top N spots (where N is the defined concurrency
>>and is usually <= NCPU) it is passed down to the system scheduler
>>using sched_add().
>>Sched_add can call maybe_preempt() which can decide to switch out the
>>current thread and switch to the new one immediatly.
>>The trouble with that is that we have already put the new one on the
>>ksegrp's run queue! When that thread is next put on the run queue using
>>setrunqueue() it is already there, and we end up with an infinitly looping
>>run queue. Any code that follows that list will never end. and the system
>>will freeze.
>>
>>Here is a patch that solves it but I'm not happy about it..
>>John, you wrote the preemption code..
>>do you have any ideas about how to do this cleaner?
>>
>>One possibility is to make sched_add return a value that indicates if the
>>thread was handled immediatly.  that would allow setrunqueue to only set it
>>into the ksegrp's run queue if it was not already handled.
>>
>>Other suggestions welcome.
>>    
>>
>
>I think it's probably a good idea to do the preemption check before putting 
>the thread on the kse group.  However, that might break ULE and some things 
>it does (ULE pins interrupt threads but does it in sched_add, perhaps that is 
>a hack and the pinning should be done in ithread_schedule instead).  Changing 
>sched_add() to return a boolean similar to maybe_preempt() is probably ok as 
>an alternative then.  Also, there's really no need for an additional 
>SRQ_NOPREEMPT flag, that just duplicates critical_enter()/critical_exit().  
>The same is probably true of SRQ_YIELDING and SRQ_MYSELF (preemption already 
>doesn't preempt to curthread since the priorities are equal).  The place that 
>uses SRQ_YIELDING can just add a critical section around the call to 
>setrunqueue().  Note that when a preemption is deferred due to a nested 
>critical section, the preemption doesn't actually occur until the outermost 
>critical section is exited, so if you do this:
>
>	mtx_lock_spin(&sched_lock);
>	blah blah;
>	if (foo) {
>		critical_enter();
>		setrunqueue(td2);
>		critical_exit();
>		mi_switch(NULL, SW_VOL);
>	}
>	mtx_unlock_spin(&sched_lock);
>
>That won't actually preempt.
>

The flags were not only for this problem but also for  another scheduer 
I was playign with and
I thought they might be useful in trying to find/fix this problem.
The critical nest thing is ok, but I had a lot of debug code in at one 
stage and I wanted to know
more about where I had come from. so the flags I had already gave me that..
I don't see the harm in having more information but I realised 
afterwards that there was already some of this info available..

for MYSELF, curthead == newtd and for  INTR teh process/thread is marked 
as an interrupt thread, which leaves only "Yielding"
which I do think is useful info to know. but the critnest solves teh 
same problem in a more specific manner.


>
>  
>