From owner-cvs-all@FreeBSD.ORG Mon Jun 18 17:31:59 2007
From: John Baldwin
To: Bruce Evans
Cc: src-committers@freebsd.org, Kip Macy, cvs-all@freebsd.org,
	Attilio Rao, cvs-src@freebsd.org, Kostik Belousov, Jeff Roberson
Date: Mon, 18 Jun 2007 13:20:26 -0400
Subject: Re: cvs commit: src/sys/kern kern_mutex.c
Message-Id: <200706181320.27338.jhb@freebsd.org>
In-Reply-To: <20070617165540.V22900@besplex.bde.org>
References: <200706051420.l55EKEih018925@repoman.freebsd.org>
	<200706151709.59898.jhb@freebsd.org>
	<20070617165540.V22900@besplex.bde.org>

On Sunday 17 June 2007 03:40:12 am Bruce Evans wrote:
> On Fri, 15 Jun 2007, John Baldwin wrote:
>
> > On Friday 15 June 2007 04:46:30 pm Bruce Evans wrote:
> >> On Mon, 11 Jun 2007, John Baldwin wrote:
>
> >>> As to why preemption doesn't work for SMP, a thread only knows to
> >>> preempt if it makes a higher priority thread runnable.  This happens
> >>> in mtx_unlock when we wake up a thread waiting on the lock, in
> >>> wakeup, or when an interrupt thread is scheduled (the interrupted
> >>> thread "sees" the ithread being scheduled).  If another thread on
> >>> another CPU makes a thread runnable, the thread on the first CPU has
> >>> no idea unless the second CPU explicitly sends a message (i.e. IPI)
> >>> to the first CPU asking it to yield instead.
> >>
> >> I believe SCHED_ULE does the IPI.
> >
> > If you add 'options IPI_PREEMPTION' I think the IPI is enabled in 4BSD.
>
> I added this about 6 months ago, but it didn't help then and still
> doesn't, at least for SCHED_4BSD and relative to voluntarily yielding in
> the PREEMPTION case.

Talk to ups@ about that then, as he did the IPI_PREEMPTION stuff.  (There
is a rough sketch of the local vs. cross-CPU case at the end of this mail.)

> > You probably need more details (KTR is good) to see exactly when
> > threads are becoming runnable (and on which CPUs) and when the kernel
> > is preempting, to see what is going on and where the context switches
> > come from.  KTR_SCHED + schedgraph.py may prove useful.
> Reading the code is easier :-).  I noticed the following problems:
> - the flag is neither set nor checked in the !PREEMPTION case, except
>   that for SCHED_ULE it is set.  Most settings and checkings of it are
>   under the PREEMPTION ifdef.  I think this is wrong -- preemption to
>   ithreads should occur even without PREEMPTION (so there would be 3
>   levels of PREEMPTION -- none (as given by !PREEMPTION now), preemption
>   to ithreads only (not available now?), and whatever FULL_PREEMPTION
>   gives).  The exception is that sched_add() for SCHED_ULE calls
>   sched_preempt() in the non-yielding case, and setting the flag in
>   sched_preempt() isn't under the PREEMPTION ifdef.  But this is moot
>   since SCHED_ULE requires PREEMPTION.

The idea of PREEMPTION is that all preemption is inside the scheduler and
there aren't explicit preemptions like in the mutex code or ithread code,
as the kernel can make a decision when a thread is scheduled.  Thus, there
is no ithread preemption w/o PREEMPTION.

> - the condition for being an idle priority thread is wrong.  It is
>   affected by much the same translation problems as pri_level in
>   userland.  An idle thread may have a borrowed priority, so it is
>   impossible to classify idle threads according to their current
>   priority alone.  This may allow setting of the preemption flag to
>   never be done even in the PREEMPTION case, as follows:
>   = idle priority thread is running with a borrowed non-idle priority
>   - and enters a critical section
>   - maybe_preempt() is called but of course doesn't preempt because of
>     the critical section
>   - and also doesn't set the flag due to the misclassified priority.
>     This apparently happens often for pagezero, due to it holding a
>     mutex most of the time that it is running.  (Without your proposed
>     change, it isn't in a critical section, but I suspect
>     maybe_preempt() doesn't preempt it for similar reasons.)

Err, it uses the real priority of the thread, so if the page zero thread
has inherited a priority it gets treated as a non-idle thread until it
releases the mutex and resumes its idle thread priority.  When it releases
the lock it will lower its priority and then preempt to the thread waiting
for the lock.  The scheduler will then run through any higher priority
tasks before it gets back to the page zero thread.

> - when the thread leaves the critical section, preemption doesn't occur
>   because the flag is not set, and preemption isn't reconsidered because
>   critical_exit() doesn't do that.
> - when the priority is unborrowed, preemption should be reconsidered.  I
>   don't know if it is.

See turnstile_unpend() and turnstile_disown().  (The second sketch at the
end of this mail shows the flag handshake between maybe_preempt() and
critical_exit().)

> >> Maybe preemption should be inhibited a bit when any mutex is held.
> >
> > That would make mutexes spinlocks that block interrupts.  Would sort
> > of defeat the point of having mutexes that aren't spinlocks.
>
> I mean that preemption should only be inhibited, not completely blocked.
> Something like delaying preemption for a couple of microseconds would be
> good, but would be hard to implement since the obvious implementation,
> of scheduling an interrupt after a couple of microseconds to cause
> reconsideration of the preemption decision, would be too expensive.  For
> well-behaved threads like pagezero, we can be sure that a suitable
> preemption reconsideration point like critical_exit() is called within a
> couple of microseconds.  That is essentially how the voluntary yielding
> in pagezero works now.

Hmm, that could be beneficial.  Similar to how adaptive spinning optimizes
for short holds.
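For the SMP point near the top of this mail, here is a rough userland
sketch of the decision I am describing.  It is purely illustrative -- the
names (struct cpu, make_runnable(), ipi_preempt()) are made up for the
example and do not match the real scheduler code; only the shape of the
local vs. remote case matters:

/*
 * Illustrative model only -- not the actual SCHED_4BSD/SCHED_ULE code.
 * Lower numbers mean higher priority, as in the kernel; everything else
 * here is invented for the example.
 */
#include <stdio.h>

struct cpu {
	int	id;
	int	curthread_pri;	/* priority of the thread running there */
};

/* Stand-in for an IPI asking a remote CPU to reschedule. */
static void
ipi_preempt(struct cpu *target)
{

	printf("IPI to CPU %d: a higher priority thread is runnable\n",
	    target->id);
}

/*
 * Called on 'self' when it makes a thread with priority 'new_pri'
 * runnable and the scheduler picks 'chosen' to run it.
 */
static void
make_runnable(struct cpu *self, struct cpu *chosen, int new_pri)
{

	if (new_pri >= chosen->curthread_pri)
		return;			/* nothing better to run */
	if (chosen == self) {
		/* Local case: this CPU can just switch (maybe_preempt()). */
		printf("CPU %d preempts itself\n", self->id);
	} else {
		/*
		 * Remote case: 'chosen' has no idea a better thread exists
		 * unless it is poked.  Without something like
		 * IPI_PREEMPTION it keeps running until its next interrupt
		 * or voluntary switch.
		 */
		ipi_preempt(chosen);
	}
}

int
main(void)
{
	struct cpu cpu0 = { 0, 100 }, cpu1 = { 1, 200 };

	make_runnable(&cpu0, &cpu0, 50);	/* preempt locally */
	make_runnable(&cpu0, &cpu1, 50);	/* needs the IPI */
	return (0);
}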
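And to make the deferred-preemption flag concrete, a toy model of the
handshake discussed above.  td_critnest and td_owepreempt roughly mirror
the kernel's names for the nesting count and the flag; the function bodies
are heavily simplified (no locking, no priorities, no per-CPU state) and
are not the real kern_switch.c code:

/*
 * Toy model of deferred preemption -- not the real kernel code.
 */
#include <stdbool.h>
#include <stdio.h>

struct thread {
	int	td_critnest;	/* critical section nesting level */
	bool	td_owepreempt;	/* preemption owed when nesting drops to 0 */
};

static struct thread thread0 = { 0, false };
static struct thread *curthread = &thread0;

static void
mi_switch(void)
{

	printf("context switch\n");
}

/* A higher priority thread was just made runnable on this CPU. */
static void
maybe_preempt(void)
{

	if (curthread->td_critnest > 0) {
		/*
		 * Can't switch now; remember that a preemption is owed.
		 * Bruce's point is that if this flag is *not* set (for
		 * instance because the current thread was misclassified by
		 * its borrowed priority), critical_exit() never reconsiders
		 * and the preemption is simply lost.
		 */
		curthread->td_owepreempt = true;
		return;
	}
	mi_switch();
}

static void
critical_enter(void)
{

	curthread->td_critnest++;
}

static void
critical_exit(void)
{

	if (--curthread->td_critnest == 0 && curthread->td_owepreempt) {
		curthread->td_owepreempt = false;
		mi_switch();	/* the deferred preemption happens here */
	}
}

int
main(void)
{

	critical_enter();
	maybe_preempt();	/* deferred: inside a critical section */
	critical_exit();	/* switch happens now */
	return (0);
}

A well-behaved thread like pagezero reaches critical_exit() quickly, which
is why deferring the switch to that point is cheap compared to scheduling
an extra interrupt.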
-- 
John Baldwin