From owner-freebsd-arch@FreeBSD.ORG Thu Sep 30 17:35:36 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 40F8516A4CE for ; Thu, 30 Sep 2004 17:35:36 +0000 (GMT) Received: from mail5.speakeasy.net (mail5.speakeasy.net [216.254.0.205]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0A0A443D49 for ; Thu, 30 Sep 2004 17:35:34 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: (qmail 26464 invoked from network); 30 Sep 2004 17:35:33 -0000 Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) encrypted SMTP for ; 30 Sep 2004 17:35:32 -0000 Received: from [10.50.40.210] (gw1.twc.weather.com [216.133.140.1]) (authenticated bits=0) by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i8UHZREi019616; Thu, 30 Sep 2004 13:35:27 -0400 (EDT) (envelope-from jhb@FreeBSD.org) From: John Baldwin To: freebsd-arch@FreeBSD.org Date: Thu, 30 Sep 2004 10:17:54 -0400 User-Agent: KMail/1.6.2 References: <1095468747.31297.241.camel@palm.tree.com> <200409291652.29990.jhb@FreeBSD.org> <1096496057.3733.2163.camel@palm.tree.com> In-Reply-To: <1096496057.3733.2163.camel@palm.tree.com> MIME-Version: 1.0 Content-Disposition: inline Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <200409301017.54350.jhb@FreeBSD.org> X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx cc: Peter Holm cc: "freebsd-arch@freebsd.org" cc: Julian Elischer cc: Stephan Uphoff Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Sep 2004 17:35:36 -0000 On Wednesday 29 September 2004 06:14 pm, Stephan Uphoff wrote: > On Wed, 2004-09-29 at 16:52, John Baldwin wrote: > > > > OK - here is a crude patch to fix some problems with mutex priority > > > > inheritance. My theory is that the clock thread gets stuck waiting on > > > > GIANT. > > > > > > > > During release/acquisition of a contested sleep mutex there are a few > > > > windows where a task can be preempted when actions (waking up blocked > > > > threads, ownership of the mutex, ..) need to be atomic as far as > > > > scheduling is concerned. Otherwise priority inheritance may fail. The > > > > patch uses critical_enter/critical_exit to protect these regions > > > > against preemption. > > > > > > > > It would be great if could run this in addition to the other patches. > > > > turnstile_claim() doesn't make any threads runnable and thus can't > > preempt. The other place is supposed to preempt, and it should be ok to > > do so. Note that since the turnstile chain lock is held, that includes a > > nested critical section and any preemption will be deferred until the > > turnstile lock is released via turnstile_release which happens in the > > middle of > > turnstile_unpend() after it has finished building a list of all the > > threads to be made runnable so that the turnstile object can be re-used > > safely. I don't think this patch will make much of a difference (if > > any). Can you provide a description of a case where you think the > > priority inheritance can fail if turnstile_unpend() doesn't run in a > > nested critical section? > > This is a bit of a mind bender. > I hope you have some aspirins close by ;-) > > Thread A holds a mutex x contested by Thread B and has priority pri(A). > Thread B holds a mutex y. > There is a thread C with priority pri(C) with pri(C) < pri(A). > > Thread A is in the process of releasing x. > It removes thread B from the turnstile and holds a pointer to B in a > private list. > Thread A sets the owner of the turnstile to NULL and releases all spin > locks. ( mtx_unlock_spin(&tc->tc_lock); line 148) > This means interrupts are now enabled. > > An interrupt occurs (or is already pending) and the interrupt handler > puts the associated interrupt thread I on the run queue. > This causes a preemption from A to I. > The interrupt thread I tries to acquire mutex y owned by B and blocks. > I donates its priority to B - but inheritance stops at B. > The next thread with the best priority is C and the cpu switches to C. > However B needs A to run to make it to the run-queue. > > If y is GIANT and I is the clock thread C could run forever in userspace > without being interrupted. Fair enough. The right place to fix this is in turnstile_unpend() though I think. I have had these patches that try to "clump" setrunqueue's before preempting lying around (but not thoroughly tested yet) that might fix this as well but in the turnstile code itself: --- //depot/projects/smpng/sys/kern/kern_thread.c 2004/09/22 15:31:15 +++ //depot/user/jhb/preemption/kern/kern_thread.c 2004/09/22 16:59:47 @@ -954,6 +954,7 @@ p->p_suspcount++; TD_SET_SUSPENDED(td); TAILQ_INSERT_TAIL(&p->p_suspended, td, td_runq); +#if 0 /* * Hack: If we are suspending but are on the sleep queue * then we are in msleep or the cv equivalent. We @@ -962,6 +963,7 @@ */ if (TD_ON_SLEEPQ(td)) TD_SET_SLEEPING(td); +#endif } void @@ -988,9 +990,11 @@ mtx_assert(&sched_lock, MA_OWNED); PROC_LOCK_ASSERT(p, MA_OWNED); if (!P_SHOULDSTOP(p)) { + critical_enter(); while ((td = TAILQ_FIRST(&p->p_suspended))) { thread_unsuspend_one(td); } + critical_exit(); } else if ((P_SHOULDSTOP(p) == P_STOPPED_SINGLE) && (p->p_numthreads == p->p_suspcount)) { /* @@ -1025,9 +1029,11 @@ * to continue however as this is a bad place to stop. */ if ((p->p_numthreads != 1) && (!P_SHOULDSTOP(p))) { - while (( td = TAILQ_FIRST(&p->p_suspended))) { + critical_enter(); + while ((td = TAILQ_FIRST(&p->p_suspended))) { thread_unsuspend_one(td); } + critical_exit(); } mtx_unlock_spin(&sched_lock); } --- //depot/projects/smpng/sys/kern/subr_sleepqueue.c 2004/08/20 17:10:02 +++ //depot/user/jhb/preemption/kern/subr_sleepqueue.c 2004/09/10 21:36:10 @@ -400,9 +400,10 @@ * just return. */ if (td->td_sleepqueue != NULL) { - MPASS(!TD_ON_SLEEPQ(td)); mtx_unlock_spin(&sc->sc_lock); mtx_lock_spin(&sched_lock); + MPASS(!TD_ON_SLEEPQ(td)); + MPASS(!TD_IS_SLEEPING(td)); return; } @@ -709,11 +710,13 @@ sleepq_release(wchan); /* Resume all the threads on the temporary list. */ + critical_enter(); while (!TAILQ_EMPTY(&list)) { td = TAILQ_FIRST(&list); TAILQ_REMOVE(&list, td, td_slpq); sleepq_resume_thread(td, pri); } + critical_exit(); } /* --- //depot/projects/smpng/sys/kern/subr_turnstile.c 2004/09/03 14:14:21 +++ //depot/user/jhb/preemption/kern/subr_turnstile.c 2004/09/10 21:36:10 @@ -727,6 +726,7 @@ * in turnstile_wait(). Set a flag to force it to try to acquire * the lock again instead of blocking. */ + critical_enter(); while (!TAILQ_EMPTY(&pending_threads)) { td = TAILQ_FIRST(&pending_threads); TAILQ_REMOVE(&pending_threads, td, td_lockq); @@ -742,6 +742,7 @@ MPASS(TD_IS_RUNNING(td) || TD_ON_RUNQ(td)); } } + critical_exit(); mtx_unlock_spin(&sched_lock); } --- //depot/projects/smpng/sys/vm/vm_glue.c 2004/09/22 15:31:15 +++ //depot/user/jhb/preemption/vm/vm_glue.c 2004/09/22 16:59:47 @@ -753,6 +753,7 @@ vm_thread_swapin(td); PROC_LOCK(p); + critical_enter(); mtx_lock_spin(&sched_lock); p->p_sflag &= ~PS_SWAPPINGIN; p->p_sflag |= PS_INMEM; @@ -767,6 +768,7 @@ /* Allow other threads to swap p out now. */ --p->p_lock; + critical_exit(); } #endif /* NO_SWAPPING */ } I.e., you could just move the critical_enter() in subr_turnstile.c earlier so it is before the mtx_unlock_spin() of the turnstile chain lock. -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org