From owner-cvs-all@FreeBSD.ORG Thu Jun  7 04:40:06 2007
Date: Thu, 7 Jun 2007 14:39:53 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Bruce Evans
Cc: src-committers@freebsd.org, John Baldwin, cvs-src@freebsd.org,
    cvs-all@freebsd.org, Attilio Rao, Kostik Belousov, Jeff Roberson
Subject: Re: cvs commit: src/sys/kern kern_mutex.c
In-Reply-To: <20070606154548.F3105@besplex.bde.org>
Message-ID: <20070607133524.S7002@besplex.bde.org>
References: <200706051420.l55EKEih018925@repoman.freebsd.org>
    <3bbf2fe10706050829o2d756a4cu22f98cf11c01f5e4@mail.gmail.com>
    <3bbf2fe10706050843x5aaafaafy284e339791bcfe42@mail.gmail.com>
    <200706051230.21242.jhb@freebsd.org>
    <20070606094354.E51708@delplex.bde.org>
    <20070605195839.I606@10.0.0.1>
    <20070606154548.F3105@besplex.bde.org>
List-Id: CVS commit messages for the entire tree

On Wed, 6 Jun 2007, Bruce Evans wrote:

> On Tue, 5 Jun 2007, Jeff Roberson wrote:
>> You should try with
>> kern.sched.pick_pri = 0.  I have changed this to be the default
>> recently.  This weakens the preemption and speeds up some workloads.
>
> I haven't tried a new SCHED_ULE kernel yet.

Tried now.  In my makeworld benchmark, SCHED_ULE is now only 4% slower
than SCHED_4BSD (after losing 2% in SCHED_4BSD) (down from about 7%
slower).  The difference is still from CPUs idling too much.

Best result ever (SCHED_4BSD, June 4 kernel, no PREEMPTION):
---
      827.48 real      1309.26 user       186.86 sys
1332122 voluntary context switches
1535129 involuntary context switches
pagezero time 6 seconds
---

After thread lock changes (SCHED_4BSD, no PREEMPTION):
---
      847.70 real      1309.83 user       169.39 sys
2933415 voluntary context switches
1501808 involuntary context switches
pagezero time 30 seconds.

Unlike what I wrote before, there is a scheduling bug that affects
pagezero directly.  The bug from last month involving pagezero losing
its priority of PRI_MAX_IDLE and running at priority PUSER is back.
This bug seemed to be gone in the June 4 kernel, but actually only
happens less often there.  This bug seems to cost 0.5-1.0% of real
time.
---

After thread lock changes (SCHED_4BSD, now with PREEMPTION):
---
      843.34 real      1304.00 user       168.87 sys
1651011 voluntary context switches
1630988 involuntary context switches
pagezero time 27 seconds

The problem with the extra context switches is gone (these context
switch counts are like the ones in old kernels with PREEMPTION).  This
result is affected by pagezero getting its priority clobbered.  The
best result for an old kernel with PREEMPTION was about 840 seconds,
before various optimizations reduced this to 827 seconds (-0+4
seconds).
---

Old run with SCHED_ULE (Mar 18):
---
      899.50 real      1311.00 user       187.47 sys
1566366 voluntary context switches
1959436 involuntary context switches
pagezero time 19 seconds
---

Today with SCHED_ULE:
---
      883.65 real      1290.92 user       188.21 sys
1658109 voluntary context switches
1708148 involuntary context switches
pagezero time 7 seconds.
---

In all of these, the user + sys decomposition is very inaccurate, but
the (user + sys + pagezero_time) total is fairly accurate.  It is
1500+-2 for SCHED_4BSD and 1500+-17 for SCHED_ULE (old ULE larger,
current ULE smaller).

SCHED_ULE now shows interesting behaviour for non-parallel kernel
builds on a 2-way SMP machine.  It is now slightly faster than
SCHED_4BSD for this, but still much slower for parallel kernel builds.
This might be because it likes to leave 1 CPU idle while waiting to
find a better CPU to run on, and this is actually an optimization when
there is >= 1 CPU to spare:

RELENG_4 kernel build on nfs, non-parallel make.
Best ever with SCHED_ULE (~June 4 kernel):
       62.55 real        55.30 user         3.65 sys
Current with SCHED_ULE:
       62.18 real        54.91 user         3.51 sys

RELENG_4 kernel build on nfs, make -j4.
Best ever for SCHED_ULE (~June 4 kernel):
       32.00 real        56.98 user         3.90 sys
Current with SCHED_ULE:
       33.11 real        56.01 user         4.12 sys

ULE has been about 1 second slower for this since at least last
November.  It presumably reduces user+sys time by running pagezero
more.

The slowdown is much larger for a build on ffs:

Non-parallel results not shown (little difference from above).

RELENG_4 kernel build on ffs, make -j4.
Best ever for SCHED_ULE (~June 4 kernel):
       29.94 real        56.03 user         3.12 sys
Current with SCHED_ULE:
       32.63 real        55.13 user         3.53 sys

Now 9% of the real time (= 18% of the cycles on one CPU = almost the
sys overhead) is apparently wasted by leaving one CPU idle.  This
benchmark is of course dominated by many instances of 2 gcc hogs which
should be scheduled to run in parallel with no idle cycles.

(In all these kernel benchmarks, everything except disk writes is
cached before starting.  In other makeworld benchmarks, everything is
cached before starting on the nfs server, while on the client nothing
is cached.)

I don't have context switch counts or pagezero times for the kernel
builds.

stathz is 100 = hz.  Maybe SCHED_ULE doesn't like this.
hz = 100 is about 1% faster than hz = 1000 for the makeworld benchmark.

Bruce
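P.S.  The accounting above can be rechecked mechanically from the
quoted numbers alone.  The following sketch just redoes that
arithmetic (the run labels and variable names are mine, not part of
the benchmark output):

```python
# Recompute user + sys + pagezero for each quoted makeworld run.
# The three numbers per run are copied verbatim from the results above.
runs = {
    "4BSD, June 4, no PREEMPTION":      (1309.26, 186.86,  6),
    "4BSD, thread lock, no PREEMPTION": (1309.83, 169.39, 30),
    "4BSD, thread lock, PREEMPTION":    (1304.00, 168.87, 27),
    "ULE, Mar 18":                      (1311.00, 187.47, 19),
    "ULE, today":                       (1290.92, 188.21,  7),
}
for name, (user, sys_time, pagezero) in runs.items():
    print(f"{name}: user+sys+pagezero = {user + sys_time + pagezero:.2f}")

# The ffs make -j4 slowdown: extra real time relative to the best run,
# and the same waste expressed as a fraction of one CPU's cycles on
# the 2-way machine (a fraction of total real time counts double
# against a single CPU's cycles).
best_real, cur_real = 29.94, 32.63
waste = (cur_real - best_real) / best_real
print(f"wasted: {waste:.1%} of real time, {2 * waste:.1%} of one CPU")
```

This confirms that the totals cluster around 1500 seconds and that the
9% / 18% figures for the ffs -j4 case are consistent with each other
on a 2-CPU machine.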