From: John Baldwin
To: Jeff Roberson
Cc: arch@freebsd.org
Date: Fri, 7 Mar 2008 08:42:37 -0500
Subject: Re: Getting rid of the static msleep priority boost
List-Id: Discussion related to FreeBSD architecture

On Friday 07 March 2008 07:16:30 am Jeff Roberson wrote:
> Hello,
>
> I've been studying some problems with recent scheduler improvements that
> help a lot on some workloads and hurt on others. I've tracked the problem
> down to static priority boosts handed out by msleep/cv_broadcastpri. The
> basic problem is that a user thread will be woken with a kernel priority
> thus allowing it to preempt a thread running on any processor with a
> lesser priority. The lesser priority thread may in fact hold some
> resource that the higher priority thread requires.
> Thus we context switch several times and perhaps go through priority
> propagation as well.
>
> I have verified that disabling these static priority boosts entirely fixes
> the performance problem I've run into on at least one workload. There are
> probably others that it helps and hopefully we can discover that.
>
> I'd like to know if anyone has a strong preference to keep this feature.
> It is likely that it helps in some interactive situations. I'm not sure
> how much however. I propose that we make a sysctl that disables it and
> turn it off by default. If we see complaints on current@ we can suggest
> that they toggle the sysctl to see if it alleviates problems.
>
> Based on feedback from that experiment and some testing we can then choose
> a few options:
>
> 1) Disable the static boosts entirely. Leave kernel priorities for
> kernel threads and priority propagation. Most other kernels do this.
> Would make my life in ULE much easier as well.
>
> 2) Leave the support for static boosts but remove it from all but a few
> key locations. Leaving it in the api would give some flexibility but
> might confuse developers.
>
> 3) Leave things as they are. undesirable.
>
> I'm leaning towards #2 based on the information I have presently. This is
> almost a significant change to historic BSD behavior so we might want to
> tread lightly.

One thing to note is that we actually depend on the priority boost (evilly)
to pick processes to swap out. (I think we check for <= PSOCK and don't swap
those out.)

One thing that I've wanted to happen for a while is that the sleep priority
for msleep() just be a parameter available to the scheduler, which the
scheduler can use to calculate the real internal priority, rather than just
being set directly. That is, I imagine having:

    void sched_set_sleep_prio(struct thread *td, u_char pri);
    u_char sched_get_sleep_prio(struct thread *td);

(The swap check would use the get call.)
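As a rough illustration of how the swap check could sit on top of the get call, here is a small userland sketch. It is not kernel code: struct thread is reduced to a single hypothetical field (td_sleep_pri), SKETCH_PSOCK is a placeholder value rather than the real PSOCK from the kernel headers, and thread_may_swap_out() is an invented name. Only the two accessor prototypes come from the proposal above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned char u_char;

/* Placeholder for illustration only; not the kernel's PSOCK definition. */
#define SKETCH_PSOCK 88

/* Toy stand-in for the kernel's struct thread (assumption). */
struct thread {
	u_char td_sleep_pri;	/* hypothetical: priority passed to msleep() */
};

/* Record the sleep priority; what else happens is up to the scheduler. */
void
sched_set_sleep_prio(struct thread *td, u_char pri)
{
	assert(td != NULL);
	td->td_sleep_pri = pri;
}

/* Report the recorded sleep priority. */
u_char
sched_get_sleep_prio(struct thread *td)
{
	assert(td != NULL);
	return (td->td_sleep_pri);
}

/*
 * Hypothetical swap check for a thread that is asleep.  Numerically lower
 * values are more important, so a sleep priority <= SKETCH_PSOCK keeps the
 * thread resident and anything above it is a swap-out candidate.
 */
bool
thread_may_swap_out(struct thread *td)
{
	return (sched_get_sleep_prio(td) > SKETCH_PSOCK);
}
```

The point of the sketch is only that the swap code reads the recorded sleep priority through the accessor instead of inspecting the thread's live priority, so a scheduler is free to ignore the boost without breaking swap-out selection.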
The 4BSD scheduler's implementation of sched_set_sleep_prio would look like
this:

    void
    sched_set_sleep_prio(struct thread *td, u_char pri)
    {
            td->td_sched->sleep_pri = pri;
            sched_prio(td, pri);
    }

    void
    sched_userret(..)
    {
            ...
            td->td_sched->sleep_pri = 0;    /* not in the kernel anymore */
    }

but other schedulers may just save it and recalculate the priority where the
priority calculation just considers the sleep priority as one among many
factors. If nothing else, this allows it to be a scheduler decision to ignore
it (so 4BSD could continue to do what it does now, but ULE may ignore it, or
ignore certain levels, etc.)

-- 
John Baldwin
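For contrast with the 4BSD version, a scheduler that only saves the sleep priority and treats it as one input to its own calculation might be sketched as below. Everything beyond the sched_set_sleep_prio() prototype is an assumption made for illustration: the base_pri field, the td_priority field, the "move halfway toward the hint" rule, and the sched_recalc_prio() name are all invented here, and none of this is ULE's actual calculation.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char u_char;

/* Toy per-thread scheduler state (assumption). */
struct td_sched {
	u_char sleep_pri;	/* 0 means "not sleeping in the kernel" */
	u_char base_pri;	/* hypothetical: priority from other factors */
};

/* Toy stand-in for the kernel's struct thread (assumption). */
struct thread {
	struct td_sched *td_sched;
	u_char td_priority;	/* effective priority; lower is better */
};

/* Save-only variant: remember the hint, do not touch td_priority. */
void
sched_set_sleep_prio(struct thread *td, u_char pri)
{
	assert(td != NULL && td->td_sched != NULL);
	td->td_sched->sleep_pri = pri;
}

/*
 * Hypothetical recalculation: start from the base priority and, if a
 * better (numerically lower) sleep priority is recorded, move halfway
 * toward it instead of adopting it outright.
 */
u_char
sched_recalc_prio(struct thread *td)
{
	struct td_sched *ts;
	int pri;

	assert(td != NULL && td->td_sched != NULL);
	ts = td->td_sched;
	pri = ts->base_pri;
	if (ts->sleep_pri != 0 && ts->sleep_pri < ts->base_pri)
		pri -= (ts->base_pri - ts->sleep_pri) / 2;
	td->td_priority = (u_char)pri;
	return (td->td_priority);
}
```

Setting the weight of the hint to zero in a rule like this would amount to ignoring the sleep priority entirely, which is the kind of per-scheduler choice the interface is meant to allow.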