From: John Baldwin
To: freebsd-arch@freebsd.org
Date: Fri, 7 Mar 2008 10:16:46 -0500
Subject: Re: Getting rid of the static msleep priority boost
Message-Id: <200803071016.46944.jhb@freebsd.org>
In-Reply-To: <200803070842.37248.jhb@freebsd.org>
References: <20080307020626.G920@desktop> <200803070842.37248.jhb@freebsd.org>

On Friday 07 March 2008 08:42:37 am John Baldwin wrote:
> On Friday 07 March 2008 07:16:30 am Jeff Roberson wrote:
> > Hello,
> >
> > I've been studying some problems with recent scheduler improvements that
> > help a lot on some workloads and hurt on others.  I've tracked the
> > problem down to static priority boosts handed out by
> > msleep/cv_broadcastpri.  The basic problem is that a user thread will be
> > woken with a kernel priority thus allowing it to preempt a thread running
> > on any processor with a lesser priority.
> > The lesser priority thread may in fact hold some resource that the
> > higher priority thread requires.  Thus we context switch several times
> > and perhaps go through priority propagation as well.
> >
> > I have verified that disabling these static priority boosts entirely
> > fixes the performance problem I've run into on at least one workload.
> > There are probably others that it helps and hopefully we can discover
> > that.
> >
> > I'd like to know if anyone has a strong preference to keep this feature.
> > It is likely that it helps in some interactive situations.  I'm not sure
> > how much however.  I propose that we make a sysctl that disables it and
> > turn it off by default.  If we see complaints on current@ we can suggest
> > that they toggle the sysctl to see if it alleviates problems.
> >
> > Based on feedback from that experiment and some testing we can then
> > choose a few options:
> >
> > 1)  Disable the static boosts entirely.  Leave kernel priorities for
> > kernel threads and priority propagation.  Most other kernels do this.
> > Would make my life in ULE much easier as well.
> >
> > 2)  Leave the support for static boosts but remove it from all but a few
> > key locations.  Leaving it in the api would give some flexibility but
> > might confuse developers.
> >
> > 3)  Leave things as they are.  undesirable.
> >
> > I'm leaning towards #2 based on the information I have presently.  This
> > is almost a significant change to historic BSD behavior so we might want
> > to tread lightly.
>
> One thing to note is that we actually depend on the priority boost (evilly)
> to pick processes to swap out.  (I think we check for <= PSOCK and don't
> swap those out).  One thing that I've wanted to happen for a while is that
> the sleep priority for msleep() just be a parameter available to the
> scheduler that the scheduler can use to calculate the real internal
> priority rather than just being a set.
> That is, I imagine having:
>
> void sched_set_sleep_prio(struct thread *td, u_char pri);
> u_char sched_get_sleep_prio(struct thread *td);
>
> (The swap check would use the get call).  The 4BSD scheduler's
> implementation of sched_set_sleep_prio would look like this:
>
> void
> sched_set_sleep_prio(struct thread *td, u_char pri)
> {
>
> 	td->td_sched->sleep_pri = pri;
> 	sched_prio(td, pri);
> }
>
> void
> sched_userret(..)
> {
>
> 	...
> 	td->td_sched->sleep_pri = 0;	/* not in the kernel anymore */
> }
>
> but other schedulers may just save it and recalculate the priority where
> the priority calculation just considers the sleep priority as one among
> many factors.  If nothing else, this allows it to be a scheduler decision
> to ignore it (so 4BSD could continue to do what it does now, but ULE may
> ignore it, or ignore certain levels, etc.)

One thing to clarify: I'm not opposed to replacing the PSOCK check in the
swap code with something more suitable (in fact, that would be desirable),
but it might take a good bit of work, so it is probably easier to do that as
a separate change.  I also think there is some merit in having code paths
hint to the scheduler the relative interactivity/priority of a sleep.

-- 
John Baldwin
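
The set/get interface proposed above can be modelled in userland roughly as
follows.  This is only a sketch under stated assumptions, not FreeBSD kernel
code: every demo_* name, the policy enum, and the DEMO_PSOCK and DEMO_PUSER
values are hypothetical stand-ins chosen for illustration.  It shows the one
point the proposal turns on: the sleep priority is always recorded as a hint,
and each scheduler decides whether the hint changes the real priority, while
the swap check reads the hint through the get call instead of the priority.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only (assumption); lower number = more important. */
#define DEMO_PSOCK	88	/* stand-in for the kernel's PSOCK level */
#define DEMO_PUSER	160	/* stand-in for an ordinary user priority */

/* Hypothetical miniature of the per-thread scheduler state. */
struct demo_thread {
	uint8_t priority;	/* priority the run queue would use */
	uint8_t sleep_pri;	/* last sleep hint; 0 means "none" */
};

/* Hypothetical policies: apply the hint (4BSD-like) or only record it. */
enum demo_policy { DEMO_APPLY_HINT, DEMO_IGNORE_HINT };

static void
demo_set_sleep_prio(struct demo_thread *td, uint8_t pri,
    enum demo_policy policy)
{

	assert(td != NULL);
	td->sleep_pri = pri;		/* the hint is always saved */
	if (policy == DEMO_APPLY_HINT)
		td->priority = pri;	/* hint becomes the real priority */
}

static uint8_t
demo_get_sleep_prio(const struct demo_thread *td)
{

	assert(td != NULL);
	return (td->sleep_pri);
}

/* Assumption: returning to user mode clears the hint and the boost. */
static void
demo_userret(struct demo_thread *td, uint8_t user_pri)
{

	assert(td != NULL);
	td->sleep_pri = 0;		/* not in the kernel anymore */
	td->priority = user_pri;
}

/* Swap check keyed on the hint, so it works under either policy. */
static bool
demo_swap_candidate(const struct demo_thread *td)
{
	uint8_t hint = demo_get_sleep_prio(td);

	return (!(hint != 0 && hint <= DEMO_PSOCK));
}
```

With DEMO_IGNORE_HINT the thread keeps its user priority across the sleep, yet
demo_swap_candidate() still sees the saved hint, which is why the swap check
would not need to be rewritten in the same change.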