From: John Baldwin
To: Julian Elischer
Cc: current@FreeBSD.org, Peter Wemm
Date: Mon, 13 Sep 2004 13:31:03 -0400
Subject: Re: [Patch] panics/hangs with preemption and threads.
Message-Id: <200409131331.03881.jhb@FreeBSD.org>
In-Reply-To: <4143EF29.2080404@elischer.org>
List-Id: Discussions about the use of FreeBSD-current

On Sunday 12 September 2004 02:39 am, Julian Elischer wrote:
> Guys, I think I found a (the?) major cause of the corruption of the
> ksegrp/thread run queue for threaded processes when preemption is turned on.
>
> When a thread is scheduled in setrunqueue(), the first thing that is done
> is that it is put in the correct place in the ksegrp's run queue.
> Then, if it is in the top N spots (where N is the defined concurrency
> and is usually <= NCPU), it is passed down to the system scheduler
> using sched_add().
> sched_add() can call maybe_preempt(), which can decide to switch out the
> current thread and switch to the new one immediately.
> The trouble with that is that we have already put the new one on the
> ksegrp's run queue! When that thread is next put on the run queue using
> setrunqueue(), it is already there, and we end up with an infinitely looping
> run queue. Any code that follows that list will never end, and the system
> will freeze.
>
> Here is a patch that solves it, but I'm not happy about it.
> John, you wrote the preemption code.
> Do you have any ideas about how to do this more cleanly?
>
> One possibility is to make sched_add() return a value that indicates whether
> the thread was handled immediately. That would allow setrunqueue() to put it
> into the ksegrp's run queue only if it was not already handled.
>
> Other suggestions welcome.

I think it's probably a good idea to do the preemption check before putting
the thread on the kse group. However, that might break ULE and some things it
does (ULE pins interrupt threads but does it in sched_add(); perhaps that is a
hack and the pinning should be done in ithread_schedule() instead). Changing
sched_add() to return a boolean similar to maybe_preempt() is probably OK as
an alternative then.

Also, there's really no need for an additional SRQ_NOPREEMPT flag; that just
duplicates critical_enter()/critical_exit(). The same is probably true of
SRQ_YIELDING and SRQ_MYSELF (preemption already doesn't preempt to curthread
since the priorities are equal). The place that uses SRQ_YIELDING can just
add a critical section around the call to setrunqueue().
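
To make the "sched_add() returns a boolean" alternative concrete, here is a
rough user-space sketch of the idea, not kernel code. Every name in it
(model_thread, model_group, model_curthread, model_sched_add,
model_setrunqueue, model_queue_length) is a hypothetical stand-in invented
for illustration, and the priority-sorted singly linked list is an assumed
simplification of the ksegrp run queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are NOT the real kernel types. */
struct model_thread {
	int prio;			/* lower value = higher priority */
	bool on_group_queue;		/* true while linked on the group queue */
	struct model_thread *next;
};

struct model_group {
	struct model_thread *head;	/* singly linked, sorted by prio */
};

/* Stand-in for curthread. */
struct model_thread *model_curthread;

/*
 * Stand-in for sched_add(): returns true if the new thread preempted
 * the current one and so must not be queued by the caller.  A real
 * kernel would also have to requeue the preempted thread; the model
 * leaves that out.
 */
bool
model_sched_add(struct model_thread *td)
{
	if (model_curthread != NULL && td->prio < model_curthread->prio) {
		model_curthread = td;
		return (true);
	}
	return (false);
}

/* Stand-in for setrunqueue(): queue the thread only if it was not run. */
void
model_setrunqueue(struct model_group *g, struct model_thread *td)
{
	struct model_thread **pp;

	/* A double insertion is exactly the bug under discussion. */
	assert(!td->on_group_queue);

	if (model_sched_add(td))
		return;

	/* Walk to the first entry with a strictly worse priority. */
	for (pp = &g->head; *pp != NULL && (*pp)->prio <= td->prio;
	    pp = &(*pp)->next)
		;
	td->next = *pp;
	*pp = td;
	td->on_group_queue = true;
}

/* Count the entries on the group queue. */
int
model_queue_length(const struct model_group *g)
{
	const struct model_thread *t;
	int n = 0;

	for (t = g->head; t != NULL; t = t->next)
		n++;
	return (n);
}
```

In this model, a thread that preempts the current one never touches the group
queue, so a later model_setrunqueue() on it cannot find it already linked.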

Note that when a preemption is deferred due to a nested critical section, the
preemption doesn't actually occur until the outermost critical section is
exited, so if you do this:

	mtx_lock_spin(&sched_lock);
	blah blah;
	if (foo) {
		critical_enter();
		setrunqueue(td2);
		critical_exit();
		mi_switch(NULL, SW_VOL);
	}
	mtx_unlock_spin(&sched_lock);

That won't actually preempt.

-- 
John Baldwin <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org