From owner-freebsd-arch@FreeBSD.ORG  Mon Sep 20 20:48:51 2004
From: Stephan Uphoff <ups@tree.com>
To: John Baldwin <jhb@FreeBSD.org>
In-Reply-To: <200409201442.04525.jhb@FreeBSD.org>
References: <1095468747.31297.241.camel@palm.tree.com>
	 <200409181653.35242.jhb@FreeBSD.org>
	 <1095548914.43781.27.camel@palm.tree.com>
	 <200409201442.04525.jhb@FreeBSD.org>
Content-Type: text/plain
Message-Id: <1095713326.53798.71.camel@palm.tree.com>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.6 
Date: Mon, 20 Sep 2004 16:48:47 -0400
Content-Transfer-Encoding: 7bit
cc: Julian Elischer <julian@elischer.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@FreeBSD.org>
Subject: Re: scheduler (sched_4bsd) questions

On Mon, 2004-09-20 at 14:42, John Baldwin wrote:
> On Saturday 18 September 2004 07:08 pm, Stephan Uphoff wrote:
> > On Sat, 2004-09-18 at 16:53, John Baldwin wrote:
> > > On Saturday 18 September 2004 01:42 pm, Stephan Uphoff wrote:
> > > > On Fri, 2004-09-17 at 21:20, Julian Elischer wrote:
> > > > > Stephan Uphoff wrote:
> > > > > >If this is true, kernel threads can be preempted while holding
> > > > > >for example the root vnode lock (or other important kernel
> > > > > >resources) while not getting a chance to run until there are no more
> > > > > >user processes with better priority.
> > > > >
> > > > > This is also true, though it is a slightly more complicated thing
> > > > > than that.
> > > > > Preempting threads are usually interrupt threads and are thus usually
> > > > > short-lived.
> > > >
> > > > But interrupt threads often wake up other threads ...
> > >
> > > That are lower priority and thus won't be preempted to.  Instead, they
> > > run when the interrupt thread goes back to sleep after it finishes.
> >
> > Lower priority than the interrupt threads.
> > They can however have a priority better than the interrupted thread
> > holding the kernel resource.
> > In this case the newly awoken threads will be next to run.
> > If they are compute bound in user space or wake other threads with
> > better priorities it might take a while until the system switches back
> > to the interrupted thread.
> 
> Yes, but that is what the system is supposed to do.  If you want the 
> interrupted thread to run sooner because it holds a resource, then you need 
> to adjust its priority when it holds the resource somehow.  We do this with 
> mutexes by having a blocking thread lend its priority to the owner of the 
> mutex.

Adjusting the priority based on resource ownership would be very
difficult to implement.

Something like:
	s = raise_priority(new_priority);
	/* ... acquire and release kernel resources ... */
	restore_priority(s);
will not work, because the acquisition and release of different
resources overlap (for example, vnode lock crabbing, where the child
vnode is locked before the parent vnode is released).

The alternative would be to track resource ownership, as is done for
mutexes.
Unfortunately, in most cases this cannot be done automatically and
would require a major effort (auditing the code around every user of
lockmgr(9), sema(9), condvar(9), sx(9), msleep(9), ...). It would
probably also bloat the code.

A simpler alternative would be to require that a thread's priority be
at least PRI_MAX_KERN (or better) while it holds kernel resources.
This could be accomplished by adjusting the priority on trap entry to
the kernel, before system calls, page fault handling, ... are processed
(and by modifying uio_yield()).
While this cannot eliminate all priority inversions, it would sharply
reduce their duration.
This is why I expected to find a priority adjustment on kernel entry,
and asked for help when I could not find it.

	Stephan