From: John Baldwin
To: Sergey Babkin
Cc: arch@freebsd.org
Date: Tue, 14 Dec 2010 07:50:58 -0500
Subject: Re: Realtime thread priorities
Message-Id: <201012140750.58712.jhb@freebsd.org>
In-Reply-To: <4D06BC5D.E573E3F1@verizon.net>
References: <201012101050.45214.jhb@freebsd.org> <201012130927.26815.jhb@freebsd.org> <4D06BC5D.E573E3F1@verizon.net>

On Monday, December
13, 2010 7:37:49 pm Sergey Babkin wrote:
> John Baldwin wrote:
> >
> > On Sunday, December 12, 2010 3:06:20 pm Sergey Babkin wrote:
> > > John Baldwin wrote:
> > > >
> > > > The current layout breaks up the global thread priority space (0 - 255)
> > > > into a couple of bands:
> > > >
> > > >   0 -  63 : interrupt threads
> > > >  64 - 127 : kernel sleep priorities (PSOCK, etc.)
> > > > 128 - 159 : real-time user threads (rtprio)
> > > > 160 - 223 : time-sharing user threads
> > > > 224 - 255 : idle threads (idprio and kernel idle procs)
> > > >
> > > > If we decide to change the behavior I see two possible fixes:
> > > >
> > > > 1) (easy) just move the real-time priority range above the kernel sleep
> > > > priority range
> > >
> > > Would not this cause a priority inversion when an RT process
> > > enters the kernel mode?
> >
> > How so?  Note that timesharing threads are not "bumped" to a kernel sleep
> > priority when they enter the kernel either.  The kernel sleep priorities are
> > purely a way for certain sleep channels to cause a thread to be treated as
> > interactive and give it a priority boost to favor interactive threads.
> > Threads in the kernel do not automatically have higher priority than threads
> > not in the kernel.  Keep in mind that all stopped threads (threads not
> > executing) are always in the kernel when they stop.
>
> I may be a bit behind the times here. But historically the "default"
> process priority means the priority when the process was pre-empted.
> If it did a system call, the priority on wake up would be as
> specified in the sleep() kernel function (or its more modern
> analog, like a sleeplock or condition variable). This would
> let the kernel code react quickly, and then on return from
> the syscall revert to the original priority, and possibly
> get pre-empted by another process at that time.

Except we don't do an explicit check in userret() to see if we should preempt
when we drop the priority.
We effectively let the process/thread run at the higher "sleep" priority
until either 1) its quantum expires, or 2) an interrupt causes a preemption
due to some other higher priority thread being scheduled.  However, if a
higher priority thread is already on the run queue when we return to
userland, we will not preempt to it.  That is what the 2) suggestion in the
original e-mail was about.

> If the user-mode priority is higher than the kernel-mode priority,
> this would mean that once a high priority process does a system
> call (say for example, poll()), it would experience a priority
> inversion and sleep with a lower priority than specified.

That's what this part of the patch for 1) is about:

Index: kern/kern_synch.c
===================================================================
--- kern/kern_synch.c	(revision 215592)
+++ kern/kern_synch.c	(working copy)
@@ -214,7 +214,8 @@
 	 * Adjust this thread's priority, if necessary.
 	 */
 	pri = priority & PRIMASK;
-	if (pri != 0 && pri != td->td_priority) {
+	if (pri != 0 && pri != td->td_priority &&
+	    td->td_pri_class == PRI_TIMESHARE) {
 		thread_lock(td);
 		sched_prio(td, pri);
 		thread_unlock(td);

This avoids the priority inversion.  It also avoids giving a bump to an
'idprio' thread.

Note that if any thread holds a mutex or rwlock that a higher priority thread
needs, we lend the priority to the lock holder while the lock is held and we
will preempt to the higher priority thread when the lock is released.

-- 
John Baldwin
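
For anyone who wants to poke at the patched check outside the kernel, here
is a minimal userland sketch of the condition in the diff above.  It is not
kernel code: struct sk_thread, enum sk_class, SK_PRIMASK, SK_PRI_MAX and
sk_sleep_adjust() are hypothetical stand-ins invented for this example, the
mask value is an assumption, and the plain assignment merely stands in for
the thread_lock()/sched_prio()/thread_unlock() sequence.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's priority classes. */
enum sk_class { SK_REALTIME, SK_TIMESHARE, SK_IDLE };

/* Top of the global priority space quoted in this thread (0 - 255). */
#define SK_PRI_MAX	255

/* Assumed mask for extracting the priority from the sleep argument. */
#define SK_PRIMASK	0xff

/* Hypothetical, stripped-down model of a thread. */
struct sk_thread {
	int		priority;	/* lower number = more urgent */
	enum sk_class	pri_class;
};

/*
 * Mirror of the patched condition: the sleep priority is applied only
 * when it is non-zero, differs from the current priority, and the
 * thread is in the time-sharing class.  Returns the resulting priority.
 */
static int
sk_sleep_adjust(struct sk_thread *td, int priority)
{
	int pri = priority & SK_PRIMASK;

	assert(td->priority >= 0 && td->priority <= SK_PRI_MAX);
	if (pri != 0 && pri != td->priority &&
	    td->pri_class == SK_TIMESHARE)
		td->priority = pri;	/* stands in for sched_prio(td, pri) */
	return (td->priority);
}
```

With this model, a time-sharing thread at 180 that sleeps at priority 100
ends up at 100, while a real-time or idle-class thread keeps whatever
priority it had, which is the behavior the patch is after.  The check looks
only at the class, so it behaves the same wherever the real-time range sits
in the global priority space.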