From: John Baldwin
To: freebsd-arch@freebsd.org
Date: Tue, 28 Dec 2010 14:58:21 -0500
Subject: Re: Realtime thread priorities

On Friday, December 10, 2010 10:50:45 am John Baldwin wrote:
> So I
> finally had a case today where I wanted to use rtprio but it doesn't seem
> very useful in its current state. Specifically, I want to be able to tag
> certain user processes as being more important than any other user processes
> even to the point that if one of my important processes blocks on a mutex, the
> owner of that mutex should be more important than sshd being woken up from
> sbwait by new data (for example). This doesn't work currently with rtprio due
> to the way the priorities are laid out (and I believe I probably argued for
> the current layout back when it was proposed).
>
> The current layout breaks up the global thread priority space (0 - 255) into a
> couple of bands:
>
>   0 -  63 : interrupt threads
>  64 - 127 : kernel sleep priorities (PSOCK, etc.)
> 128 - 159 : real-time user threads (rtprio)
> 160 - 223 : time-sharing user threads
> 224 - 255 : idle threads (idprio and kernel idle procs)
>
> The problem I am running into is that when a time-sharing thread goes to sleep
> in the kernel (waiting on select, socket data, tty, etc.) it actually ends up
> in the kernel priorities range (64 - 127). This means when it wakes up it
> will trump (and preempt) a real-time user thread even though these processes
> nominally have a priority down in the 160 - 223 range. We do drop the kernel
> sleep priority during userret(), but we don't recheck the scheduler queues to
> see if we should preempt the thread during userret(), so it effectively runs
> with the kernel sleep priority for the rest of the quantum while it is in
> userland.
>
> My first question is if this behavior is the desired behavior? Originally I
> think I preferred the current layout because I thought a thread in the kernel
> should always have priority so it can release locks, etc. However, priority
> propagation should actually handle the case of some very important thread
> needing a lock.
> In my use case today where I actually want to use rtprio I
> think I want different behavior where the rtprio thread is more important than
> the thread waking up with PSOCK, etc.
>
> If we decide to change the behavior I see two possible fixes:
>
> 1) (easy) just move the real-time priority range above the kernel sleep
> priority range

I have forward-ported my original patch for 7 to 9 and fixed several other
nits I ran into along the way. The updated patch is at

http://www.freebsd.org/~jhb/patches/rtpri.patch

I think it can probably be broken up into several pieces, at least some of
which should be non-controversial. :)

This patch makes the following changes:

- Give the USB kthreads lower priority in the range of software interrupt
  threads rather than hardware interrupt threads.

- Retire some unused ithread priorities: PI_TTYHIGH, PI_TAPE, and PI_DISKLOW.
  While here, rename PI_TTYLOW to PI_TTY. Also, add a macro PI_SWI() that
  takes a SWI_* constant as an argument and returns the suitable thread
  priority.

- In sched_yield(), only drop the priority of timeshare threads to
  PRI_MAX_TIMESHARE. Non-timeshare threads retain whatever priority they
  currently have.

- Only apply a kernel sleep priority from tsleep() to timeshare threads.
  This only matters once realtime threads move to a new priority range; its
  purpose is to avoid penalizing realtime threads for sleeping.

- Explicitly set a sane initial priority (PVM) for kthreads. Right now new
  kthreads inherit whatever priority thread0 happens to have when they are
  created. Since kthreads can be created from threads other than thread0,
  this priority can be fairly random. In practice, I've seen many kthreads
  created with an initial priority that is a hardware interrupt thread
  priority due to thread0 being lent an ithread priority.

- Add some helper macros to ULE to define the ranges used for interactive and
  non-interactive timeshare threads, and fix some places that hardcoded
  assumptions about the location of the realtime priority range.

- Add a new option, ULE_INTERACTIVE_TIMESHARE (that should perhaps be on by
  default), for use in conjunction with moving realtime priorities. When this
  new option is in effect, ULE does not abuse realtime priorities for
  interactive timeshare threads. Instead, the timeshare range is split into
  two ranges, one for interactive threads and one for non-interactive
  threads. The non-interactive range is further divided into three ranges to
  add bands at the top and bottom for nice levels. Combined with the other
  changes, the net effect is that interactive threads will have the same
  priorities they have now (i.e. a band of 32 priorities in between kernel
  sleep priorities and non-interactive timeshare priorities) and that
  non-interactive threads now have a slightly larger band of priorities
  (32 priorities in the "middle" instead of 24, with additional bands of 20
  above and below for nice values).

- Never boost the priority of a thread via tsleep() if the passed-in priority
  is zero. Zero means "don't change the priority", but ULE was still giving a
  boost in certain cases. In practice I suspect this rarely, if ever,
  triggered.

- Always apply the requested sleep priority to kthreads. Certain kernel
  processes, such as pagedaemon, rely on tsleep() to lower the priority of
  the kproc so that it is treated as a background task when it is idle. The
  static_boost code in ULE would never lower the priority due to a sleep, so
  once a kproc gained a higher priority via sleeping it would never be treated
  as a background task again. This is especially problematic when a kthread
  starts off with an ithread priority, as noted above.

- Retire the PCONFIG kernel sleep priority. We do not need a new priority
  level for boot time config hooks.
  When PCONFIG was added, tsleep() did not support leaving the priority alone
  via 0, but now it does, so use that instead.

- Restore dropping the syncer kthread down to PPAUSE when it is idle.

- Drop the flowtable cleaner kthread down to PPAUSE when it is idle.

- Move the realtime priority range in between the interrupt thread and kernel
  sleep priority ranges. Currently there is a small bit of overlap between
  SWI_TQ and SWI_TQ_GIANT and 'rtprio 0'. I hope to eliminate this by retiring
  SWI_TQ_FAST once interrupt filters are in place, since SWI_TQ and
  SWI_TQ_GIANT can then move up a slot.

-- 
John Baldwin