From: John Baldwin
To: arch@freebsd.org
Date: Fri, 10 Dec 2010 10:50:45 -0500
Subject: Realtime thread priorities
Message-Id: <201012101050.45214.jhb@freebsd.org>
List-Id: Discussion related to FreeBSD architecture

So I finally had a case today where I wanted to use rtprio but it doesn't
seem very useful in its current state.
Specifically, I want to be able to tag certain user processes as being more
important than any other user processes even to the point that if one of my
important processes blocks on a mutex, the owner of that mutex should be more
important than sshd being woken up from sbwait by new data (for example).
This doesn't work currently with rtprio due to the way the priorities are
laid out (and I believe I probably argued for the current layout back when it
was proposed).

The current layout breaks up the global thread priority space (0 - 255) into
a couple of bands:

    0 -  63 : interrupt threads
   64 - 127 : kernel sleep priorities (PSOCK, etc.)
  128 - 159 : real-time user threads (rtprio)
  160 - 223 : time-sharing user threads
  224 - 255 : idle threads (idprio and kernel idle procs)

The problem I am running into is that when a time-sharing thread goes to
sleep in the kernel (waiting on select, socket data, tty, etc.) it actually
ends up in the kernel priorities range (64 - 127).  This means when it wakes
up it will trump (and preempt) a real-time user thread even though it
nominally has a priority down in the 160 - 223 range.  We do drop the kernel
sleep priority during userret(), but we don't recheck the scheduler queues to
see if we should preempt the thread during userret(), so it effectively runs
with the kernel sleep priority for the rest of the quantum while it is in
userland.

My first question is whether this behavior is the desired behavior.
Originally I think I preferred the current layout because I thought a thread
in the kernel should always have priority so it can release locks, etc.
However, priority propagation should actually handle the case of some very
important thread needing a lock.  In my use case today where I actually want
to use rtprio I think I want different behavior where the rtprio thread is
more important than the thread waking up with PSOCK, etc.
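To make the inversion concrete, here is a minimal userland sketch in C.  It
is a toy model, not kernel code.  The band boundaries are the ones listed
above; the names (BAND_*, model_thread, would_preempt, etc.) and the sample
sleep priority of 88 are assumptions made up for illustration and are not
identifiers taken from the tree.

```c
#include <assert.h>
#include <stdbool.h>

/* Band boundaries from the layout above; a lower number is more important. */
enum {
    BAND_KERN_MIN      = 64,   /* kernel sleep priorities start here */
    BAND_REALTIME_MIN  = 128,  /* rtprio threads start here */
    BAND_TIMESHARE_MIN = 160,  /* time-sharing threads start here */
    BAND_IDLE_MIN      = 224,  /* idle threads start here */
    BAND_MAX           = 255
};

enum band { BAND_ITHD, BAND_KERN, BAND_REALTIME, BAND_TIMESHARE, BAND_IDLE };

/* Assumed value for illustration: any priority in 64 - 127 behaves the same. */
#define EXAMPLE_SLEEP_PRI 88

/* Hypothetical stand-in for a thread; only the two priorities matter here. */
struct model_thread {
    int user_pri;  /* priority the thread nominally has in userland */
    int cur_pri;   /* priority the scheduler currently sees */
};

/* Map a global priority to the band it falls in. */
static enum band band_of(int pri)
{
    assert(pri >= 0 && pri <= BAND_MAX);
    if (pri < BAND_KERN_MIN)      return BAND_ITHD;
    if (pri < BAND_REALTIME_MIN)  return BAND_KERN;
    if (pri < BAND_TIMESHARE_MIN) return BAND_REALTIME;
    if (pri < BAND_IDLE_MIN)      return BAND_TIMESHARE;
    return BAND_IDLE;
}

/* A woken thread preempts the running one if its priority is numerically lower. */
static bool would_preempt(const struct model_thread *woken,
                          const struct model_thread *running)
{
    return woken->cur_pri < running->cur_pri;
}

/* Sleeping in the kernel moves the thread into the kernel sleep band. */
static void model_sleep(struct model_thread *td, int sleep_pri)
{
    td->cur_pri = sleep_pri;
}

/*
 * Returning to userland drops the sleep priority again.  Nothing here
 * re-evaluates who should be running, which mirrors the missing recheck.
 */
static void model_userret(struct model_thread *td)
{
    td->cur_pri = td->user_pri;
}
```

With a real-time thread at 140 and a time-sharing thread at 180,
would_preempt() is false before the sleep and true once model_sleep() has
moved the time-sharing thread to 88.  model_userret() then restores 180, but
by that point the time-sharing thread is already the one on the CPU.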
If we decide to change the behavior I see two possible fixes:

1) (easy) just move the real-time priority range above the kernel sleep
   priority range

2) (harder) make sched_userret() check the run queue to see if it should
   preempt when dropping the kernel sleep priority.  I think bde@ has
   suggested that we should do this for correctness previously (and I've had
   some old, unfinished patches to do this in a branch in p4 for several
   years).

-- 
John Baldwin