Date: Thu, 23 Jan 2003 18:19:39 -0500 (EST)
From: Jeff Roberson
To: Bosko Milekic
Cc: arch@FreeBSD.ORG
Subject: Re: New scheduler
In-Reply-To: <20030123170611.A79549@unixdaemons.com>
Message-ID: <20030123181343.E2966-100000@mail.chesapeake.net>

On Thu, 23 Jan 2003, Bosko Milekic wrote:

>
> Hey Jeff,
>
>   First of all, let me say that I think the work you've undertaken is
>   superb. Given that a rewrite of the scheduler is not the easiest
>   thing to do in the world, undertaking the task is pretty courageous.
>

Thanks! :-)

> On Thu, Jan 23, 2003 at 01:38:44AM -0500, Jeff Roberson wrote:
> [...]
> > make -j4 buildworld on a 2 way Athlon 1800MP with one ata disk.
> >
> > new sched:
> > 1933.193u 1156.398s 56:31.33 91.1% 2628+2106k 18752+4863io 8538pf+0w
> > old sched:
> > 2153.557u 1803.705s 48:25.07 136.2% 2462+1925k 17250+4666io 7113pf+0w
> >
> > What you can see here is that the sys time and user time were greatly
> > reduced, by approx. 35% and 10% respectively.  But, since we're not
> > evenly balancing the load across both cpus the real time suffered.
> > I don't expect the speedup to be this good once both cpus are well
> > utilized due to memory bus contention.
>
> This is impressive.

Yeah, hopefully the speedup stays after the cpus are well balanced.

>
> [...]
> > You just need one file.  It's available at
> > http://www.chesapeake.net/~jroberson/sched_smp.c
> >
> > Cheers,
> > Jeff
>
>   OK, after looking over the code, I'm curious: why does everything
>   still seem to be protected by the sched_lock?  Can you not now protect
>   the per-CPU runqueues with their own spinlocks?  I'm assuming that the
>   primary reason for not switching to the finer grained model is
>   complications related to the sched_lock protecting necessarily
>   unpreemptable sections of code elsewhere in the kernel... notably
>   switching to a finer grained model would involve changes in the
>   context switching code and I think we would have to teach some MD code
>   about the per-CPU runqueues, which would make this less "pluggable" than
>   it was intended, correct?

stand -> walk -> run :-)  I didn't want to make it any more invasive than
it currently is, as that would require either desupporting the current
scheduler or using it only on UP.  Also, it's a lot of extra effort and a
lot of extra bugs.  I doubt there is much sched lock contention today.

>
>   I think that one of the main advantages of this thing is the reduction
>   of the contention on the sched lock.  If that can be achieved then
>   scheduling any thread, including interrupt threads, would already be
>   cheaper than it currently is (assuming you could go through a context
>   switch without the global sched_lock, and I don't see why with this
>   code you could not).

I'd like to reorg the mi_switch/cpu_switch path.  I'd like to pick the new
thread in mi_switch and hand it off to cpu_switch instead of calling back
into sched_choose().  This will make all of this slightly cleaner.

>
>   Finally, I have one question regarding your results.  Obviously, 35%
>   and 10% are noteworthy numbers.
>   What do you attribute the speedup to, primarily, given that this is
>   still all under a global sched_lock?
>
>   Thanks again for all your work.
>

There are a few factors.  Most notably the cpu affinity.  The caches are
thrashing so much on SMP with the old scheduler that it's actually slower
than UP in some cases.  Also, since the balancing is currently pooched the
memory bus is contended for less.  So the 35% will probably get a bit
smaller, but hopefully the real time will too.

The new scheduler is also algorithmically cheaper.  With the old scheduler,
schedcpu() would run 10 times a second and pollute your cache.  With lots
of processes this code could take a while too.

Cheers,
Jeff