Date:      Mon, 11 Jul 2011 09:16:54 -0700
From:      Steve Kargl <sgk@troutmask.apl.washington.edu>
To:        Andriy Gapon <avg@FreeBSD.org>
Cc:        freebsd-current@FreeBSD.org, Ivan Voras <ivoras@FreeBSD.org>
Subject:   Re: Heavy I/O blocks FreeBSD box for several seconds
Message-ID:  <20110711161654.GA97361@troutmask.apl.washington.edu>
In-Reply-To: <4E1B1198.6090308@FreeBSD.org>
References:  <20110706170132.GA68775@troutmask.apl.washington.edu> <5080.1309971941@critter.freebsd.dk> <20110706180001.GA69157@troutmask.apl.washington.edu> <4E14A54A.4050106@freebsd.org> <4E155FF9.5090905@FreeBSD.org> <20110707151440.GA75537@troutmask.apl.washington.edu> <4E160C2F.8020001@FreeBSD.org> <20110707200845.GA77049@troutmask.apl.washington.edu> <ivf221$oo2$1@dough.gmane.org> <4E1B1198.6090308@FreeBSD.org>

On Mon, Jul 11, 2011 at 06:07:04PM +0300, Andriy Gapon wrote:
> on 11/07/2011 17:41 Ivan Voras said the following:
> > On 07/07/2011 22:08, Steve Kargl wrote:
> > 
> >> 4BSD kernel gives for N = Ncpu + 1.
> >>
> >> 34 processes:  6 running, 28 sleeping
> >>
> >>    PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME    CPU COMMAND
> >>   1417 kargl       1  71    0   370M   294M RUN     0   1:30 79.39% sasmp
> >>   1416 kargl       1  71    0   370M   294M RUN     0   1:30 79.20% sasmp
> >>   1418 kargl       1  71    0   370M   294M CPU2    0   1:29 78.81% sasmp
> >>   1420 kargl       1  71    0   370M   294M CPU1    2   1:30 78.27% sasmp
> >>   1419 kargl       1  70    0   370M   294M CPU3    0   1:30 77.59% sasmp
> > 
> >> ULE kernel gives for N = Ncpu + 1.
> >>
> >> 34 processes:  6 running, 28 sleeping
> >>
> >>    PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME    CPU COMMAND
> >>   1318 kargl       1 103    0   370M   294M CPU0    0   1:31 100.00% sasmp
> >>   1319 kargl       1 103    0   370M   294M RUN     1   1:29 100.00% sasmp
> >>   1322 kargl       1  99    0   370M   294M CPU2    2   1:03 87.26% sasmp
> >>   1320 kargl       1  91    0   370M   294M RUN     3   1:07 60.79% sasmp
> >>   1321 kargl       1  89    0   370M   294M CPU3    3   1:06 55.18% sasmp
> > 
> > I can confirm this. Look at the priorities column for the two cases. For some
> > reason (CPU affinity?) the loads get asymmetrical on ULE.
> 
> Yeah, but what problem is demonstrated here?

That ULE cannot balance numerically intensive work, leading
to poor performance.

> Are we confident that non-even workload is inherently bad?
> E.g.:
> 79.39 + .. + 77.59 < 5 * 80 = 400
> 100.00 + ... + 55.18 ~~ 402 which is more than theoretically possible :-)
> So it would _appear_ that with ULE we get more work out of available CPUs.
> 
> But it's not clear which of the processes are slaves and which is master.
> It's also not clear why the master takes so much CPU (on par with the
> slaves) - from my reading of its description (by Steve) it should be doing
> only light periodic work.

These are all slave processes.  The master process was on a different
node in the cluster.  Each process is doing the exact same computation
with only a small change in a coordinate from (x,y,z) to (x,y+n*dy,z)
with n = 1, 2, 3, 4.  The small change does not cause a different
code path, so all should complete in nearly identical times.

> If it does have to do CPU-heavy work, then I'd imagine that it should
> spawn only Ncpus - 1 slaves.

And if you have M users on the system?  Also note, you can get the
exact same loading problem by launching Ncpu+1 completely independent
cpu-bound processes.  Ncpu-1 processes will be bound to specific cpus
and 2 processes will ping-pong on one cpu.  This ping-ponging will
simply kill performance.
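
For anyone who wants to try the Ncpu+1 case described above without
sasmp, here is a minimal sketch of a reproducer.  It is not the program
used for the top(1) output in this thread; the names burn() and
run_oversubscribed(), the arithmetic loop, and the iteration count are
all made up for illustration, and it assumes a Unix-like system where
the "fork" start method is available.  Each worker does an identical
amount of CPU-bound work and reports its own wall-clock time.

```python
import multiprocessing as mp
import os
import time


def burn(iterations, queue):
    """Do a fixed amount of integer arithmetic, then report wall-clock seconds."""
    start = time.perf_counter()
    acc = 0
    for i in range(iterations):
        acc = (acc + i * i) % 1000003
    queue.put((os.getpid(), time.perf_counter() - start))


def run_oversubscribed(iterations=20_000_000, extra=1):
    """Launch Ncpu + extra identical workers and return their wall times, sorted."""
    ctx = mp.get_context("fork")  # assumption: Unix-like host
    nproc = (os.cpu_count() or 1) + extra
    queue = ctx.Queue()
    workers = [ctx.Process(target=burn, args=(iterations, queue))
               for _ in range(nproc)]
    for w in workers:
        w.start()
    # Drain the queue before joining so no worker blocks on a full pipe.
    results = [queue.get() for _ in workers]
    for w in workers:
        w.join()
    return sorted(elapsed for _pid, elapsed in results)
```

Calling run_oversubscribed() under each scheduler and comparing the
smallest and largest returned times gives a rough measure of how evenly
the identical workers were served.  Raise the iteration count until a
run lasts long enough for top(1) to show steady-state numbers.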

> Also, if with ULE we get less jumping around between CPUs than with
> 4BSD, that would mean less cache misses and more useful work done.

Well, yes, fewer cache misses for the pinned processes; but no, that
does not mean more useful work gets done.
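
A back-of-the-envelope model of why that matters when every job has to
finish before the result is usable.  This is an idealised sketch, not a
description of how ULE or 4BSD are implemented: it assumes Ncpu+1
identical jobs, zero migration and cache cost, and two hypothetical
policies, one that shares all CPUs evenly and one that pins Ncpu-1 jobs
and lets the remaining two share a CPU.  The function names and the
migrate_when_idle switch are invented for this example.

```python
def finish_times_fair(ncpu, work=1.0):
    """Ncpu+1 identical jobs share all CPUs evenly: all finish together."""
    njobs = ncpu + 1
    return [work * njobs / ncpu] * njobs


def finish_times_pinned(ncpu, work=1.0, migrate_when_idle=True):
    """Ncpu-1 jobs each own a CPU; the other two share the last CPU."""
    alone = [work] * (ncpu - 1)
    if migrate_when_idle and ncpu >= 2:
        # At time `work` each sharer is half done and a CPU has gone idle,
        # so both run at full speed for the remaining half.
        shared = [1.5 * work] * 2
    else:
        # The two sharers stay on one CPU for their whole run.
        shared = [2.0 * work] * 2
    return alone + shared


if __name__ == "__main__":
    for label, times in [
        ("fair", finish_times_fair(4)),
        ("pinned, migrate", finish_times_pinned(4)),
        ("pinned, no migrate", finish_times_pinned(4, migrate_when_idle=False)),
    ]:
        print(f"{label:20s} last job done at {max(times):.2f} T")
```

With 4 CPUs and 5 jobs the last job finishes at 1.25 T under even
sharing, and at 1.50 T or 2.00 T under the pinned policy depending on
whether the two sharers ever move to an idle CPU.  For a master waiting
on all of its slaves, the last finisher is the number that counts.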

> Still not convinced that there is a problem with ULE here.

It's ULE.  See the last 3 years of my posts on the topic.

> I'd start with the app.

I'd switch to 4BSD ;-).  

-- 
Steve


