From: John Baldwin <jhb@freebsd.org>
To: Sergey Babkin <babkin@verizon.net>
Date: Tue, 14 Dec 2010 07:50:58 -0500
User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20101102; KDE/4.4.5; amd64; ; )
References: <201012101050.45214.jhb@freebsd.org>
	<201012130927.26815.jhb@freebsd.org>
	<4D06BC5D.E573E3F1@verizon.net>
In-Reply-To: <4D06BC5D.E573E3F1@verizon.net>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="koi8-r"
Content-Transfer-Encoding: 7bit
Message-Id: <201012140750.58712.jhb@freebsd.org>
Cc: arch@freebsd.org
Subject: Re: Realtime thread priorities
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>

On Monday, December 13, 2010 7:37:49 pm Sergey Babkin wrote:
> John Baldwin wrote:
> > 
> > On Sunday, December 12, 2010 3:06:20 pm Sergey Babkin wrote:
> > > John Baldwin wrote:
> > > >
> > > > The current layout breaks up the global thread priority space (0 - 255)
> > > > into a couple of bands:
> > > >
> > > >   0 -  63 : interrupt threads
> > > >  64 - 127 : kernel sleep priorities (PSOCK, etc.)
> > > > 128 - 159 : real-time user threads (rtprio)
> > > > 160 - 223 : time-sharing user threads
> > > > 224 - 255 : idle threads (idprio and kernel idle procs)
> > > >
> > > > If we decide to change the behavior I see two possible fixes:
> > > >
> > > > 1) (easy) just move the real-time priority range above the kernel sleep
> > > > priority range
> > >
> > > Would not this cause a priority inversion when an RT process
> > > enters the kernel mode?
> > 
> > How so?  Note that timesharing threads are not "bumped" to a kernel sleep
> > priority when they enter the kernel either.  The kernel sleep priorities are
> > purely a way for certain sleep channels to cause a thread to be treated as
> > interactive and give it a priority boost to favor interactive threads.
> > Threads in the kernel do not automatically have higher priority than threads
> > not in the kernel.  Keep in mind that all stopped threads (threads not
> > executing) are always in the kernel when they stop.
> 
> I may be a bit behind the times here. But historically the "default"
> process priority means the priority when the process was pre-empted.
> If it did a system call, the priority on wake up would be as
> specified in the sleep() kernel function (or its more modern
> analog, like a sleeplock or condition variable). This would 
> let the kernel code react quickly, and then on return from 
> the syscall revert to the original priority, and possibly 
> get pre-empted by another process at that time.

Except we don't do an explicit check in userret() to see if we should preempt
when we drop the priority.  We effectively let the process/thread run at the
higher "sleep" priority until either 1) its quantum expires, or 2) an
interrupt causes a preemption due to some other higher priority thread being
scheduled.  However, if a higher priority thread is already on the run queue
when we return to userland, we will not preempt to it.  That is what
suggestion 2) in the original e-mail was about.
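
To make the difference concrete, here is a minimal userland sketch of the
decision at return-to-user time.  It is not kernel code: the struct, the
function name model_userret() and the "check" flag are hypothetical, and the
only fact taken from the kernel is that a numerically lower value means a
higher priority.

```c
#include <stdbool.h>

/* Hypothetical model, not kernel code.  Lower value = higher priority. */
struct model_thread {
	int cur_pri;	/* priority the thread is currently running at */
	int user_pri;	/* priority it should have back in userland */
};

/*
 * Model of returning to userland: drop any sleep-priority boost and
 * report whether the thread should give up the CPU.
 *
 * check == false models the behaviour described above: the run queue is
 * never consulted, so the thread keeps running.
 * check == true models an explicit check: yield if the best waiting
 * thread (best_runq_pri) now outranks us.
 */
static bool
model_userret(struct model_thread *td, int best_runq_pri, bool check)
{
	td->cur_pri = td->user_pri;
	if (!check)
		return (false);
	return (best_runq_pri < td->cur_pri);
}
```

With a thread that slept at 100 and has a user priority of 180, and a waiting
thread at 170, model_userret() reports "keep running" without the check and
"yield" with it.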

> If the user-mode priority is higher than the kernel-mode priority,
> this would mean that once a high priority process does a system
> call (say for example, poll()), it would experience a priority
> inversion and sleep with a lower priority than specified.

That's what this part of the patch for 1) is about:

Index: kern/kern_synch.c
===================================================================
--- kern/kern_synch.c   (revision 215592)
+++ kern/kern_synch.c   (working copy)
@@ -214,7 +214,8 @@
         * Adjust this thread's priority, if necessary.
         */
        pri = priority & PRIMASK;
-       if (pri != 0 && pri != td->td_priority) {
+       if (pri != 0 && pri != td->td_priority &&
+           td->td_pri_class == PRI_TIMESHARE) {
                thread_lock(td);
                sched_prio(td, pri);
                thread_unlock(td);

This avoids the priority inversion: only time-sharing threads take on the
sleep priority, so a real-time thread keeps its own priority while it sleeps.
It also avoids giving a bump to an 'idprio' thread.  Note that if any thread
holds a mutex or rwlock that a higher priority thread needs, we lend the
priority to the lock holder while the lock is held and we will preempt to the
higher priority thread when the lock is released.
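
The effect of the patched condition can be seen in a small userland model.
This is only a sketch under assumptions: the MODEL_* names, the struct and
model_sleep_adjust() are hypothetical stand-ins for td_pri_class,
td_priority and the code in the diff, a plain assignment stands in for
sched_prio(), and the priority values in the note below are made up.

```c
/* Hypothetical stand-ins for the scheduling classes. */
enum model_class { MODEL_REALTIME, MODEL_TIMESHARE, MODEL_IDLE };

struct model_thread {
	int priority;			/* lower value = higher priority */
	enum model_class pri_class;
};

#define MODEL_PRIMASK	0xff		/* assumed mask for the priority bits */

/*
 * Mirror of the patched test: only a time-sharing thread takes on the
 * sleep priority.  Real-time and idle threads keep what they had.
 */
static void
model_sleep_adjust(struct model_thread *td, int priority)
{
	int pri;

	pri = priority & MODEL_PRIMASK;
	if (pri != 0 && pri != td->priority &&
	    td->pri_class == MODEL_TIMESHARE)
		td->priority = pri;	/* stands in for sched_prio() */
}
```

For example, with a sleep priority of 100, a time-sharing thread at 180 moves
to 100, while a real-time thread at 50 stays at 50 (no inversion) and an idle
thread at 240 stays at 240 (no bump).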

-- 
John Baldwin