From owner-freebsd-arch@FreeBSD.ORG  Wed Dec 15 16:56:55 2010
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E5E62106564A
	for <arch@freebsd.org>; Wed, 15 Dec 2010 16:56:55 +0000 (UTC)
	(envelope-from deischen@freebsd.org)
Received: from mail.netplex.net (mail.netplex.net [204.213.176.10])
	by mx1.freebsd.org (Postfix) with ESMTP id 9C1E88FC0C
	for <arch@freebsd.org>; Wed, 15 Dec 2010 16:56:55 +0000 (UTC)
Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11])
	by mail.netplex.net (8.14.4/8.14.4/NETPLEX) with ESMTP id
	oBFGk2Af022848; Wed, 15 Dec 2010 11:46:02 -0500
Date: Wed, 15 Dec 2010 11:46:02 -0500 (EST)
From: Daniel Eischen <deischen@freebsd.org>
X-X-Sender: eischen@sea.ntplx.net
To: John Baldwin <jhb@freebsd.org>
In-Reply-To: <201012150938.44217.jhb@freebsd.org>
Message-ID: <Pine.GSO.4.64.1012151115350.27084@sea.ntplx.net>
References: <201012101050.45214.jhb@freebsd.org>
	<201012140756.52926.jhb@freebsd.org>
	<4D081C7C.5040407@freebsd.org> <201012150938.44217.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: arch@freebsd.org
Subject: Re: Realtime thread priorities
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: Daniel Eischen <deischen@freebsd.org>
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 15 Dec 2010 16:56:56 -0000

On Wed, 15 Dec 2010, John Baldwin wrote:
>
> Put another way, the time-sharing thread that I don't care about (sshd, or
> some other monitoring daemon, etc.) is stealing a resource I care about
> (time, in the form of CPU cycles) from my RT process that is critical to
> getting my work done.
>
> Beyond that a few more points:
>
> - You are ignoring "tools, not policy".  You don't know what is in my binary
>  (and I can't really tell you).  Assume for a minute that I'm not completely
>  dumb and can write userland code that is safe to run at this high of a
>  priority level.  You already trust me to write code in the kernel that runs
>  at even higher priority now. :)
> - You repeatedly keep missing (ignoring?) the fact that this is _optional_.
>  Users have to intentionally decide to enable this, and there are users who
>  do _need_ this functionality.
> - You have also missed that this has always been true for idprio processes
>  (and is in fact why we restrict idprio to root), so this is not "new".
> - Finally, you also are missing that this can already happen _now_ for plain
>  old time sharing processes if the thread holding the resource doesn't ever
>  do a sleep that raises the priority.
>
> For example, if a time-sharing thread with some typical priority >=
> PRI_MIN_TIMESHARE calls write(2) on a file, it can lock the vnode lock for
> that file (if it is unlocked) and hold that lock while its priority is >=
> PRI_MIN_TIMESHARE.  If an interrupt arrives for a network packet that wakes
> up sshd for a new SSH connection, the interrupt thread will preempt the
> thread holding the vnode lock, and sshd will be executed instead of the
> thread holding the vnode lock when the ithread finishes.  If sshd needs the
> vnode lock that the original thread holds, then sshd will block until the
> original thread is rescheduled due to the random fates of time and releases
> the vnode lock.
>
> In summary, the kernel sleep priorities do _not_ serve to prevent all
> priority inversions; what they do accomplish is giving preferential treatment
> to idle, "interactive" threads.
>
> A bit more information on my use case btw:
>
> My RT processes are each assigned a _dedicated_ CPU via cpuset (we remove the
> CPU from the global cpuset and ensure no interrupts are routed to that CPU).
> The problem I have is that if my RT process blocks on a lock (e.g. a lock on a
> VM object during a page fault), then I want the RT thread to lend its RT
> priority to the thread that holds the lock over on another CPU so that the lock
> can be released as quickly as possible.  This use case is perfectly safe (the
> RT thread is not preempting other threads, instead other threads are partitioned
> off into a separate set of available CPUs).  What I need is to ensure that the
> syncer or pagedaemon or whoever holds the lock I need gets a chance to run right
> away when it holds a lock that I need.

And speaking as a developer who writes applications that require
real-time priorities, I think all of the above is a good summary.
As such a developer, I don't use real-time priorities to make
applications run faster, achieve more throughput, or get more work
done.  I use them to meet real-world deadlines.  Our applications do
not keep the CPU busy; they mostly block, waking up to handle
events, both periodic and aperiodic.  Because we know our
applications run at real-time priority, we try to make them as
efficient as possible.  If a task is more CPU-intensive, we may hand
it off to a separate, lower-priority thread or process.  What matters
is that we meet our deadlines and respond promptly to time-critical
events.

We do expect our real-time threads to run ahead of time-sharing
and other lower-priority threads, and we expect their priority to be
propagated to the holders of any contested OS locks.  In our
situation it is acceptable to starve low-priority tasks, though we do
design our applications to avoid that.

-- 
DE