Date: Wed, 15 Dec 2010 11:46:02 -0500 (EST)
From: Daniel Eischen
To: John Baldwin
Cc: arch@freebsd.org
Subject: Re: Realtime thread priorities
In-Reply-To: <201012150938.44217.jhb@freebsd.org>

On Wed, 15 Dec 2010, John Baldwin wrote:

> Put another way, the time-sharing thread that I don't care about (sshd, or
> some other monitoring daemon, etc.) is stealing a resource I care about
> (time, in the form of CPU cycles) from my RT process that is critical to
> getting my work done.
>
> Beyond that a few more points:
>
> - You are ignoring "tools, not policy".  You don't know what is in my binary
>   (and I can't really tell you).  Assume for a minute that I'm not completely
>   dumb and can write userland code that is safe to run at this high of a
>   priority level.  You already trust me to write code in the kernel that runs
>   at even higher priority now. :)
> - You repeatedly keep missing (ignoring?) the fact that this is _optional_.
>   Users have to intentionally decide to enable this, and there are users who
>   do _need_ this functionality.
> - You have also missed that this has always been true for idprio processes
>   (and is in fact why we restrict idprio to root), so this is not "new".
> - Finally, you also are missing that this can already happen _now_ for plain
>   old time sharing processes if the thread holding the resource doesn't ever
>   do a sleep that raises the priority.
>
> For example, if a time-sharing thread with some typical priority >=
> PRI_MIN_TIMESHARE calls write(2) on a file, it can lock the vnode lock for
> that file (if it is unlocked) and hold that lock while its priority is >=
> PRI_MIN_TIMESHARE.  If an interrupt arrives for a network packet that wakes
> up sshd for a new SSH connection, the interrupt thread will preempt the
> thread holding the vnode lock, and sshd will be executed instead of the
> thread holding the vnode lock when the ithread finishes.  If sshd needs the
> vnode lock that the original thread holds, then sshd will block until the
> original thread is rescheduled due to the random fates of time and releases
> the vnode lock.
>
> In summary, the kernel sleep priorities do _not_ serve to prevent all
> priority inversions, what they do accomplish is giving preferential treatment
> to idle, "interactive" threads.
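
The write(2)/sshd scenario quoted above can be sketched as a toy model.
This is not kernel code: the struct, the function names, the priority
numbers, and the third CPU-bound thread are all made up for illustration.
It only assumes the FreeBSD convention that a lower priority value is more
urgent, and it models a single level of lending (no chains of locks).

```c
#include <assert.h>
#include <stddef.h>

/* Toy thread record (hypothetical, not a kernel structure).
 * A lower prio value means a more urgent thread. */
struct toy_thread {
    const char *name;
    int prio;        /* base priority */
    int blocked_on;  /* index of the lock holder we wait for, or -1 */
};

/* Effective priority of thread i.  With lending enabled, the holder
 * runs at the best priority among the threads blocked on it. */
int effective_prio(const struct toy_thread *t, size_t n, size_t i, int lend)
{
    int best = t[i].prio;

    if (!lend)
        return best;
    for (size_t j = 0; j < n; j++) {
        if (t[j].blocked_on == (int)i && t[j].prio < best)
            best = t[j].prio;
    }
    return best;
}

/* Pick the runnable thread with the best effective priority.
 * Returns its index, or -1 if every thread is blocked. */
int pick_next(const struct toy_thread *t, size_t n, int lend)
{
    int chosen = -1;
    int chosen_prio = 0;

    assert(t != NULL);
    for (size_t i = 0; i < n; i++) {
        if (t[i].blocked_on >= 0)
            continue;               /* waiting for a lock, not runnable */
        int p = effective_prio(t, n, i, lend);
        if (chosen < 0 || p < chosen_prio) {
            chosen = (int)i;
            chosen_prio = p;
        }
    }
    return chosen;
}
```

With a writer at 180 holding the lock, sshd at 130 blocked on the writer,
and an unrelated runnable thread at 150, pick_next() without lending
chooses the unrelated thread, so sshd keeps waiting.  With lending it
chooses the writer, which can then release the lock.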
>
> A bit more information on my use case btw:
>
> My RT processes are each assigned a _dedicated_ CPU via cpuset (we remove the
> CPU from the global cpuset and ensure no interrupts are routed to that CPU).
> The problem I have is that if my RT process blocks on a lock (e.g. a lock on a
> VM object during a page fault), then I want the RT thread to lend its RT
> priority to the thread that holds the lock over on another CPU so that the lock
> can be released as quickly as possible.  This use case is perfectly safe (the
> RT thread is not preempting other threads, instead other threads are partitioned
> off into a separate set of available CPUs).  What I need is to ensure that the
> syncer or pagedaemon or whoever holds the lock I need gets a chance to run right
> away when it holds a lock that I need.

And speaking as a developer that writes applications that require
real-time priorities, all of the above is a good summary.

As such a developer, I don't use real-time priorities to make
applications run faster, have more throughput, get more work done,
or anything like that.  It is to attempt to meet real-world
deadlines.  Our applications do not busy the CPU, they block
mostly, waking up for and handling events - both periodic and
aperiodic.  We know our applications run real-time, so we try
to be as efficient as possible.  If there is something more
CPU intensive, then perhaps we'll have another lower priority
thread/process to handle that task.

The important thing is that we need to meet or respond to a
time-critical event.  We do expect that our real-time threads
will run in preference to time-sharing or other lower priority
threads, and that priority will be propagated for any contested
OS locks.  In our situation, it is acceptable to starve low
priority tasks, though we do design the applications to avoid that.

--
DE
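
For locks that live in the application rather than in the kernel, POSIX
offers a user-level counterpart to the priority lending discussed in this
thread.  The sketch below is not the kernel mechanism itself.  The helper
names init_pi_mutex() and request_fifo_priority() are made up for this
example; the pthread calls are standard POSIX.  It assumes the platform
supports the PTHREAD_PRIO_INHERIT protocol and that the caller may lack
the privilege needed for SCHED_FIFO.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Initialise a mutex whose owner inherits the priority of its most
 * urgent waiter.  Returns 0 on success or an errno value. */
int init_pi_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc;

    assert(m != NULL);
    rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Ask for SCHED_FIFO at the given priority for the calling thread.
 * Returns 0 on success or an errno value; EPERM is the usual result
 * when the process is not allowed to use real-time scheduling. */
int request_fifo_priority(int prio)
{
    struct sched_param sp;

    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = prio;
    return pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
}
```

A real-time thread would call request_fifo_priority() once at startup and
protect any data it shares with lower priority threads with a mutex from
init_pi_mutex(), so that a low priority holder is boosted while the
real-time thread waits.  Callers should check both return values rather
than assume the request was granted.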