Date: Wed, 24 Oct 2007 17:54:31 +1000 (EST)
From: Bruce Evans
To: Kris Kennaway
Cc: josh.carroll@gmail.com, remy.nonnenmacher@activnetworks.com,
    freebsd-performance@freebsd.org
Subject: Re: ULE vs. 4BSD in RELENG_7
Message-ID: <20071024171915.E84143@delplex.bde.org>
In-Reply-To: <471E343C.2040509@FreeBSD.org>

On Tue, 23 Oct 2007, Kris Kennaway wrote:

> Josh Carroll wrote:
>> Anyway, in summary, ULE is about 5-6 % slower than 4BSD for two
>> workloads that I am sensitive to: building world with -j X, and ffmpeg
>> -threads X. Other benchmarks seem to indicate relatively equal
>> performance between the two. MySQL, on the other hand, is
>> significantly faster in ULE.

5-6% is a lot. ULE has some tuning for makeworld in -current, which for
me reduced it to less than 1% slower than 4BSD (down from 5-10% slower),
for the case of makeworld -j4 over nfs on a 2-CPU system with the sources
pre-cached on the server and objects on a local file system, and
extensive local tuning of makeworld, nfs and network drivers. I think the
tuning in ULE was mainly for a 2-CPU system, because makeworld seemed to
be very bad under ULE only with 2 CPUs. Apparently, it is also very bad
with more CPUs. There are sysctls to modify the ULE tuning.

>> I'm trying to understand why ffmpeg and buildworld are slower in ULE
>> than 4BSD, since it seems to me that ULE was supposed to be the better
>> scaling scheduler.

Makeworld is slower because any scheduling is bad for it. More context
switches take longer and cost more by reducing affinity.

>> Does anyone have any additional performance tests I can run that might
>> help indicate where the deficiency is in the ULE scheduler? MySQL
>> performance is excellent, so I'm wondering if it was tuned to that
>> particular workload?

I think it was.

> One major difference is that your workload is 100% user. Also you were
> reporting ULE had more idle time, which looks like a bug since I would expect
> it be basically 0% idle on such a workload.

No, at least buildworld, while being mainly user-CPU-bound by the gcc
hog, does some disk accesses and a significant number of syscalls. I have
to work very hard to reduce its idle time to about 5% for UP on local
disks and to 11% for 2-way SMP over nfs.

More idle time for ULE at least used to be a feature. ULE sometimes wants
to avoid switching to another thread immediately, in the hope of finding
a thread with better affinity than the currently runnable ones. It waited
far too long (in its idle threads) for makeworld with 2 CPUs. Waiting has
a better chance of being best if there are many CPUs.

Bruce
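
The trade-off between idling for affinity and switching immediately can be put in rough numbers. What follows is only a toy cost model, not ULE's actual logic: the function names, the fixed migration penalty and the microsecond values are all assumptions made up for illustration.

```python
# Toy cost model for an idle CPU choosing between two options.
# All names and numbers are hypothetical; this is not how ULE decides.

def cost_steal_now(migration_penalty_us):
    """Run a thread from another CPU at once and pay the cache-refill cost."""
    return migration_penalty_us


def cost_wait_for_affine(arrival_us, max_wait_us, migration_penalty_us):
    """Idle until a thread with local affinity shows up, or give up.

    If the affine thread becomes runnable within max_wait_us, the only
    cost is the idle time. Otherwise the CPU has idled for max_wait_us
    and still has to pay the migration penalty.
    """
    if arrival_us <= max_wait_us:
        return arrival_us
    return max_wait_us + migration_penalty_us


if __name__ == "__main__":
    penalty = 50    # assumed cost of losing cache affinity, in microseconds
    max_wait = 200  # assumed upper bound on idle-waiting, in microseconds
    for arrival in (20, 1000):
        print(arrival, cost_steal_now(penalty),
              cost_wait_for_affine(arrival, max_wait, penalty))
```

With these assumed values, waiting is cheaper when the affine thread turns up after 20 us (20 against 50), and much more expensive when it does not turn up within the bound (250 against 50). The second case corresponds to waiting far too long in the idle threads.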