Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 10 Dec 2010 16:41:51 -0500
From:      John Baldwin <jhb@freebsd.org>
To:        Kostik Belousov <kostikbel@gmail.com>
Cc:        arch@freebsd.org
Subject:   Re: Realtime thread priorities
Message-ID:  <201012101641.51652.jhb@freebsd.org>
In-Reply-To: <20101210195716.GE33073@deviant.kiev.zoral.com.ua>
References:  <201012101050.45214.jhb@freebsd.org> <201012101133.55389.jhb@freebsd.org> <20101210195716.GE33073@deviant.kiev.zoral.com.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
On Friday, December 10, 2010 2:57:16 pm Kostik Belousov wrote:
> On Fri, Dec 10, 2010 at 11:33:55AM -0500, John Baldwin wrote:
> > On Friday, December 10, 2010 11:26:31 am Kostik Belousov wrote:
> > > On Fri, Dec 10, 2010 at 10:50:45AM -0500, John Baldwin wrote:
> > > > So I finally had a case today where I wanted to use rtprio but it doesn't seem 
> > > > very useful in its current state.  Specifically, I want to be able to tag 
> > > > certain user processes as being more important than any other user processes 
> > > > even to the point that if one of my important processes blocks on a mutex, the 
> > > > owner of that mutex should be more important than sshd being woken up from 
> > > > sbwait by new data (for example).  This doesn't work currently with rtprio due 
> > > > to the way the priorities are laid out (and I believe I probably argued for 
> > > > the current layout back when it was proposed).
> > > > 
> > > > The current layout breaks up the global thread priority space (0 - 255) into a 
> > > > couple of bands:
> > > > 
> > > >   0 -  63 : interrupt threads
> > > >  64 - 127 : kernel sleep priorities (PSOCK, etc.)
> > > > 128 - 159 : real-time user threads (rtprio)
> > > > 160 - 223 : time-sharing user threads
> > > > 224 - 255 : idle threads (idprio and kernel idle procs)
> > > > 
> > > > The problem I am running into is that when a time-sharing thread goes to sleep 
> > > > in the kernel (waiting on select, socket data, tty, etc.) it actually ends up 
> > > > in the kernel priorities range (64 - 127).  This means when it wakes up it 
> > > > will trump (and preempt) a real-time user thread even though these processes 
> > > > nominally have a priority down in the 160 - 223 range.  We do drop the kernel 
> > > > sleep priority during userret(), but we don't recheck the scheduler queues to 
> > > > see if we should preempt the thread during userret(), so it effectively runs 
> > > > with the kernel sleep priority for the rest of the quantum while it is in 
> > > > userland.
> > > > 
> > > > My first question is if this behavior is the desired behavior?  Originally I 
> > > > think I preferred the current layout because I thought a thread in the kernel 
> > > > should always have priority so it can release locks, etc.  However, priority 
> > > > propagation should actually handle the case of some very important thread 
> > > > needing a lock.  In my use case today where I actually want to use rtprio I 
> > > > think I want different behavior where the rtprio thread is more important than 
> > > > the thread waking up with PSOCK, etc.
> > > > 
> > > > If we decide to change the behavior I see two possible fixes:
> > > > 
> > > > 1) (easy) just move the real-time priority range above the kernel sleep 
> > > > priority range
> > > > 
> > > > 2) (harder) make sched_userret() check the run queue to see if it should 
> > > > preempt when dropping the kernel sleep priority.  I think bde@ has suggested 
> > > > that we should do this for correctness previously (and I've had some old, 
> > > > unfinished patches to do this in a branch in p4 for several years).
> > > 
> > > Would not doing #2 allow e.g. two threads that perform ping-pong with
> > > a single byte read/write into a socket to usurp the CPU ? The threads
> > > could try to also do some CPU-intensive calculations for some time
> > > during the quantum too.
> > > 
> > > Such threads are arguably "interactive", but I think that the gain is
> > > priority is too unfair.
> > 
> > Err, I think that what you describe is the current case and is what #2 would
> > seek to change.
> 
> Sorry, might be my language was not clear, but I said "Would not doing
> #2 allow ...", i.e. I specifically mean that we shall do #2 to avoid the
> situation I described.

Ah, yes, it does allow that.  As bde@ said though, the overhead of extra
context switches in the common case might not be worth it.

I have a possible patch for 1), but it involves fixing a few places and is
only compile tested yet (will run test it soon).  I also think that in my
case I almost always want 1) anyway (my realtime processes are always more
important than sshd, even while sshd is in the kernel):

Index: kern/kern_synch.c
===================================================================
--- kern/kern_synch.c	(revision 215592)
+++ kern/kern_synch.c	(working copy)
@@ -214,7 +214,8 @@
 	 * Adjust this thread's priority, if necessary.
 	 */
 	pri = priority & PRIMASK;
-	if (pri != 0 && pri != td->td_priority) {
+	if (pri != 0 && pri != td->td_priority &&
+	    td->td_pri_class == PRI_TIMESHARE) {
 		thread_lock(td);
 		sched_prio(td, pri);
 		thread_unlock(td);
@@ -552,7 +553,8 @@
 {
 
 	thread_lock(td);
-	sched_prio(td, PRI_MAX_TIMESHARE);
+	if (td->td_pri_class == PRI_TIMESHARE)
+		sched_prio(td, PRI_MAX_TIMESHARE);
 	mi_switch(SW_VOL, NULL);
 	thread_unlock(td);
 	td->td_retval[0] = 0;
Index: kern/subr_sleepqueue.c
===================================================================
--- kern/subr_sleepqueue.c	(revision 215592)
+++ kern/subr_sleepqueue.c	(working copy)
@@ -693,7 +720,8 @@
 
 	/* Adjust priority if requested. */
 	MPASS(pri == -1 || (pri >= PRI_MIN && pri <= PRI_MAX));
-	if (pri != -1 && td->td_priority > pri)
+	if (pri != -1 && td->td_priority > pri &&
+	    td->td_pri_class == PRI_TIMESHARE)
 		sched_prio(td, pri);
 	return (setrunnable(td));
 }
Index: sys/priority.h
===================================================================
--- sys/priority.h	(revision 215592)
+++ sys/priority.h	(working copy)
@@ -68,8 +68,8 @@
  * are insignificant.  Ranges are as follows:
  *
  * Interrupt threads:		0 - 63
- * Top half kernel threads:	64 - 127
- * Realtime user threads:	128 - 159
+ * Realtime user threads:	64 - 95
+ * Top half kernel threads:	96 - 159
  * Time sharing user threads:	160 - 223
  * Idle user threads:		224 - 255
  *
@@ -81,7 +81,7 @@
 #define	PRI_MAX			(255)		/* Lowest priority. */
 
 #define	PRI_MIN_ITHD		(PRI_MIN)
-#define	PRI_MAX_ITHD		(PRI_MIN_KERN - 1)
+#define	PRI_MAX_ITHD		(PRI_MIN_REALTIME - 1)
 
 #define	PI_REALTIME		(PRI_MIN_ITHD + 0)
 #define	PI_AV			(PRI_MIN_ITHD + 4)
@@ -94,9 +94,12 @@
 #define	PI_DULL			(PRI_MIN_ITHD + 32)
 #define	PI_SOFT			(PRI_MIN_ITHD + 36)
 
-#define	PRI_MIN_KERN		(64)
-#define	PRI_MAX_KERN		(PRI_MIN_REALTIME - 1)
+#define	PRI_MIN_REALTIME	(64)
+#define	PRI_MAX_REALTIME	(PRI_MIN_KERN - 1)
 
+#define	PRI_MIN_KERN		(96)
+#define	PRI_MAX_KERN		(PRI_MIN_TIMESHARE - 1)
+
 #define	PSWP			(PRI_MIN_KERN + 0)
 #define	PVM			(PRI_MIN_KERN + 4)
 #define	PINOD			(PRI_MIN_KERN + 8)
@@ -109,9 +112,6 @@
 #define	PLOCK			(PRI_MIN_KERN + 36)
 #define	PPAUSE			(PRI_MIN_KERN + 40)
 
-#define	PRI_MIN_REALTIME	(128)
-#define	PRI_MAX_REALTIME	(PRI_MIN_TIMESHARE - 1)
-
 #define	PRI_MIN_TIMESHARE	(160)
 #define	PRI_MAX_TIMESHARE	(PRI_MIN_IDLE - 1)
 

-- 
John Baldwin



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201012101641.51652.jhb>