From: Andriy Gapon
Date: Mon, 11 Jul 2011 18:07:04 +0300
To: Ivan Voras
Cc: freebsd-current@FreeBSD.org, Steve Kargl
Subject: Re: Heavy I/O blocks FreeBSD box for several seconds
Message-ID: <4E1B1198.6090308@FreeBSD.org>
List-Id: Discussions about the use of FreeBSD-current

on 11/07/2011 17:41 Ivan Voras said the following:
> On 07/07/2011 22:08, Steve Kargl wrote:
>
>> 4BSD kernel gives for N = Ncpu + 1.
>>
>> 34 processes: 6 running, 28 sleeping
>>
>>   PID USERNAME THR PRI NICE  SIZE   RES STATE C  TIME     CPU COMMAND
>>  1417 kargl      1  71    0  370M  294M RUN   0  1:30  79.39% sasmp
>>  1416 kargl      1  71    0  370M  294M RUN   0  1:30  79.20% sasmp
>>  1418 kargl      1  71    0  370M  294M CPU2  0  1:29  78.81% sasmp
>>  1420 kargl      1  71    0  370M  294M CPU1  2  1:30  78.27% sasmp
>>  1419 kargl      1  70    0  370M  294M CPU3  0  1:30  77.59% sasmp
>
>> ULE kernel gives for N = Ncpu + 1.
>>
>> 34 processes: 6 running, 28 sleeping
>>
>>   PID USERNAME THR PRI NICE  SIZE   RES STATE C  TIME     CPU COMMAND
>>  1318 kargl      1 103    0  370M  294M CPU0  0  1:31 100.00% sasmp
>>  1319 kargl      1 103    0  370M  294M RUN   1  1:29 100.00% sasmp
>>  1322 kargl      1  99    0  370M  294M CPU2  2  1:03  87.26% sasmp
>>  1320 kargl      1  91    0  370M  294M RUN   3  1:07  60.79% sasmp
>>  1321 kargl      1  89    0  370M  294M CPU3  3  1:06  55.18% sasmp
>
> I can confirm this. Look at the priorities column for the two cases. For some
> reason (CPU affinity?) the loads get asymmetrical on ULE.

Yeah, but what problem is demonstrated here?
Are we confident that an uneven workload is inherently bad?
E.g.:
79.39 + ... + 77.59 < 5 * 80 = 400
100.00 + ... + 55.18 ~~ 402
which is more than theoretically possible :-)
So it would _appear_ that with ULE we get more work out of the available CPUs.

But it's not clear which of the processes are slaves and which is the master.
It's also not clear why the master takes so much CPU (on par with the slaves) -
from my reading of its description (by Steve) it should be doing only light
periodic work. If it does have to do CPU-heavy work, then I'd imagine that it
should spawn only Ncpu - 1 slaves.

Finally, I agree with Vlad that a "logically idle" thread should not use lots
of CPU (or even 100%). The scheduler doesn't know which thread uses 100% for
useful work and which does so while simply spinning.

Also, if with ULE we get less jumping around between CPUs than with 4BSD, that
would mean fewer cache misses and more useful work done.

Still not convinced that there is a problem with ULE here.
I'd start with the app.

P.S. Not saying that this is the case here, but I've seen the following
scenario in real life. People add only nominal support for some platform in
their software - when they don't know how to properly implement some feature,
they just drop that feature or implement it very suboptimally. Then other
people use the bad performance of that software on that platform as an
indication that there is something wrong with the platform, not the software.

-- 
Andriy Gapon
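
For anyone who wants to check the CPU totals compared in the message, here is
a minimal sketch that sums the CPU column of the two quoted top(1) listings.
The figures are copied from the quoted output. The CPU count of 4 is an
assumption inferred from N = Ncpu + 1 = 5 processes and the CPU0..CPU3 states.
The names BSD4, ULE, NCPU, CEILING, total_cpu and report are made up for this
illustration.

```python
# CPU percentages copied from the two quoted top(1) listings.
BSD4 = [79.39, 79.20, 78.81, 78.27, 77.59]
ULE = [100.00, 100.00, 87.26, 60.79, 55.18]

# Assumption: 4 CPUs, inferred from N = Ncpu + 1 = 5 processes
# and the CPU0..CPU3 states in the listings.
NCPU = 4
CEILING = NCPU * 100.0  # at most 100% per CPU


def total_cpu(percentages):
    """Sum the per-process CPU percentages."""
    return sum(percentages)


def report(name, percentages):
    """Format one scheduler's total against the theoretical ceiling."""
    total = total_cpu(percentages)
    return f"{name}: total {total:.2f}% of a {CEILING:.0f}% ceiling"


if __name__ == "__main__":
    print(report("4BSD", BSD4))
    print(report("ULE", ULE))
```

The exact sums come to about 393.3% for 4BSD and 403.2% for ULE, in line with
the rough figures in the message. The ULE total sitting slightly above the
400% ceiling presumably reflects top's percentages being estimates rather than
exact accounting.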