Date: Sat, 12 Apr 2003 17:42:14 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
To: Jeff Roberson
Cc: cvs-src@FreeBSD.org, Jeff Roberson, src-committers@FreeBSD.org, cvs-all@FreeBSD.org
Subject: Re: cvs commit: src/sys/kern sched_ule.c
In-Reply-To: <20030411145419.E37530-100000@mail.chesapeake.net>
Message-ID: <20030412162452.N2065@gamplex.bde.org>

On Fri, 11 Apr 2003, Jeff Roberson wrote:

> On Fri, 11 Apr 2003, Bruce Evans wrote:
>
> > On Thu, 10 Apr 2003, Jeff Roberson wrote:
> >
> > > In this benchmark I have many processes that each run for 10ms and
> > > then sleep for 10ms.  Each of them has a different nice value.  I
> > > record many statistics, but below is the amount of cpu time given
> > > to each.
> > >
> > > nice   -20       -15       -10       -5        -1        0
> > > ULE    5.753623  5.282634  4.856530  3.450129  3.126798  2.423626
> > > 4BSD   5.737184  4.694133  4.185013  3.363824  2.711405  2.260881
> > >
> > > nice   1         5         10        15        20
> > > ULE    2.105255  0.657852  0.429098  0.411582  0.501759
> > > 4BSD   2.452399  1.506065  0.859527  0.681111  0.427333
> > >
> > > You can see that ule is quite regular until it gets to the positive
> > > nice values, where it is not as smooth as 4bsd.  I'll fix this.
> >
> > The poor dynamic range of niceness for the 4bsd scheduler in -current
> > should not be aimed for.  In the above it is about 5.74:0.43 = 13:1.
> > It should be more like 100:1 or 1000:1.
>
> I don't think this is so bad considering that they were only attempting
> to use 50% of the cpu.  That leaves a lot of idle time for other threads.

I'm not sure how much using only 50% of the CPU affects the results, but
I think it is not much if there are many processes.  If there are more
than 2 un-nice ones, then the combination of those 2 will attempt to use
100% of the CPU, and similarly for 2 nice ones.  OTOH, with only 1
un-nice process and 1 nice one, each should get the 50% that it wants.

> Although in ULE it is currently very easy to tune by adjusting your min
> and max slice.  Since each tick is 10ms on x86, ule currently does 10:1
> for processes of nice 0 and 19.  If hz was 1000 it could easily be made
> 100:1.  I'm not sure if this is really desirable though.

I think having some very large ratios is useful for processes like
setiathome and the kernel pagezero.  You really want them to get as
little CPU as possible, perhaps without using a completely different
scheduling policy like idprio or an infinite ratio, which gives much the
same thing.
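To put numbers on the 10:1 and 100:1 figures above: the smallest slice
that can be handed out is one clock tick, so the achievable ratio is
bounded by the maximum slice divided by the tick length.  A standalone
userland sketch of that arithmetic follows; the 100ms maximum slice and
the linear nice-to-slice mapping are assumptions for illustration here,
not ULE's actual constants or code.

#include <stdio.h>

#define SLICE_MAX_MS	100	/* assumed maximum slice (nice 0), in ms */

/*
 * Illustration only: interpolate linearly from the maximum slice at
 * nice 0 down to the one-tick minimum at nice 19, in units of clock
 * ticks (1000/hz ms each).
 */
static int
nice_to_slice_ticks(int nice, int hz)
{
	int max_ticks = SLICE_MAX_MS * hz / 1000;
	int min_ticks = 1;		/* one tick is the floor */

	if (nice <= 0)
		return (max_ticks);
	if (nice >= 19)
		return (min_ticks);
	return (max_ticks - nice * (max_ticks - min_ticks) / 19);
}

int
main(void)
{
	printf("hz=100:  nice 0 gets %d ticks, nice 19 gets %d tick(s)\n",
	    nice_to_slice_ticks(0, 100), nice_to_slice_ticks(19, 100));
	printf("hz=1000: nice 0 gets %d ticks, nice 19 gets %d tick(s)\n",
	    nice_to_slice_ticks(0, 1000), nice_to_slice_ticks(19, 1000));
	return (0);
}

That prints 10 ticks vs. 1 at hz = 100 and 100 vs. 1 at hz = 1000, i.e.
the ratio is capped by how finely hz divides the largest slice, which is
exactly the granularity problem below.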
Dividing up 1 second into slices doesn't work so well when there are
more than 2 processes.  With N times as many (long-lived) processes the
scheduling decisions would have to be made over N times as long a period
to get the same ratios.  Or they can be made over N times as long a
period for the same number of processes, to reduce the granularity by a
factor of 1/N.  The 4BSD scheduler in -current sort of does the former,
and my version of it sort of does both.  I'm a bit concerned about the
extra inertia this causes in my version, but I haven't noticed any
problems in practice.

> ...
> What I could do is implement cpu distribution vs nice as a curve that
> tapers off considerably once you get past 19 nice values.  This would
> allow really nice programs to slowly creep along.  The problem is that
> my scheduler can only adjust the amount of cpu time a process gets by
> adjusting its slice size.  The minimum is actually quite high.  I may
> introduce some divisor for the slice that essentially causes this
> thread to get skipped over x times before it gets to run for the
> minimum amount.  This would give more granularity than 10ms for every
> pass through the run queue.

I think you need something like that for unusual loads.  I don't see how
more than HZ processes can be scheduled properly without a multiplier,
even if they all have the same nice value.

> > I think interactivity is mostly a different problem.  When we give
> > some CPU to niced processes, it is important that we only give them a
> > tiny amount.  Another problem with the small dynamic range is that it
> > always gives a non-tiny amount.  Scheduling granularity may be a
> > problem, especially in the 4BSD scheduler - we can't give processes
> > less than the quantum (if they want it), and with a quantum of 1/10
> > second just 10 niced processes getting the minimal amount would give
> > a delay of 1 second for interactive processes.  These problems are
> > limited by old bugfeatures in BSD for at least the 4BSD scheduler
> > (interactive processes wake up at a kernel priority and normally
> > essentially keep that priority when they return to userland, since we
> > neglect to reschedule them).
>
> ULE solves this differently.  Interactivity is determined by the ratio
> of voluntary sleep time to actual runtime.  If it figures out that a
> process is interactive, it always puts it on the current queue, which
> means it doesn't have to wait for the current queue to complete and the
> next queue to be switched to.

It still seems to be missing code to force the actual switch, mainly
here:

% void
% sched_userret(struct thread *td)
% {
% 	struct ksegrp *kg;
%
% 	kg = td->td_ksegrp;
%
% 	if (td->td_priority != kg->kg_user_pri) {
% 		mtx_lock_spin(&sched_lock);
% 		td->td_priority = kg->kg_user_pri;
% 		mtx_unlock_spin(&sched_lock);
% 	}
% }

If td->td_priority is actually changed here, then we should run another
process if the change reduces it below the priority of another runnable
thread.  Every assignment to td_priority that doesn't set TDF_NEEDRESCHED
and/or consider an immediate switch is suspect.  I think technically
correct code would set td_priority to a "base" priority (== kg_user_pri
for user processes) after waking up, switch then if necessary, and only
assert that the priority is the user priority here.
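A minimal sketch of the missing check, assuming the coarse test that any
numerically worse priority warrants a reschedule (a real fix would
compare against the best runnable thread, which needs a run-queue lookup
under sched_lock):

void
sched_userret(struct thread *td)
{
	struct ksegrp *kg;
	u_char oldpri;

	kg = td->td_ksegrp;

	if (td->td_priority != kg->kg_user_pri) {
		mtx_lock_spin(&sched_lock);
		oldpri = td->td_priority;
		td->td_priority = kg->kg_user_pri;
		/*
		 * Sketch only: if the priority just got numerically
		 * worse, some other runnable thread may now deserve the
		 * CPU, so request a reschedule instead of silently
		 * keeping it.
		 */
		if (td->td_priority > oldpri)
			td->td_flags |= TDF_NEEDRESCHED;
		mtx_unlock_spin(&sched_lock);
	}
}

Setting the flag only defers the switch to the next reschedule point;
whether that is enough, or an immediate switch is wanted as well, is the
"and/or" above.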
> High negative nice values can still delay interactive processes that
> are very nice.  I think this is what the user is indicating with
> negative nice values though, isn't he?  Or do you think that nice
> should not impact interactive processes?

It's usually a user error to nice interactive processes.  I'm thinking
of some corner case where you want to control (perhaps just terminate) a
niced process.  It would be, er, nice to not have to wait long to do it.
Maybe I just want something like renice to a higher priority to work for
users.

> It's difficult for me to grasp what behavior is most desirable.  I have
> a framework in place with ULE that would allow me to achieve most
> anything.
>
> I guess my primary questions are these:
> 1)  What distribution of cpu time vs nice is desirable?

Depends on the user's requirements.  Most users don't use nice :-).

> 2)  Should nice processes starve interactive jobs?

No.

Bruce