Date:      Wed, 15 Feb 2012 21:47:05 +0200
From:      Alexander Motin <mav@FreeBSD.org>
To:        Jeff Roberson <jroberson@jroberson.net>
Cc:        freebsd-hackers@FreeBSD.org, Florian Smeets <flo@FreeBSD.org>, Andriy Gapon <avg@FreeBSD.org>
Subject:   Re: [RFT][patch] Scheduling for HTT and not only
Message-ID:  <4F3C0BB9.6050101@FreeBSD.org>
In-Reply-To: <4F3990EA.1080002@FreeBSD.org>
References:  <4F2F7B7F.40508@FreeBSD.org> <4F366E8F.9060207@FreeBSD.org> <4F367965.6000602@FreeBSD.org> <4F396B24.5090602@FreeBSD.org> <alpine.BSF.2.00.1202131012270.2020@desktop> <4F3978BC.6090608@FreeBSD.org> <alpine.BSF.2.00.1202131108460.2020@desktop> <4F3990EA.1080002@FreeBSD.org>

On 02/14/12 00:38, Alexander Motin wrote:
> I see no much point in committing them sequentially, as they are quite
> orthogonal. I need to make one decision. I am going on small vacation
> next week. It will give time for thoughts to settle. May be I indeed
> just clean previous patch a bit and commit it when I get back. I've
> spent too much time trying to make these things formal and so far
> results are not bad, but also not so brilliant as I would like. May be
> it is indeed time to step back and try some more simple solution.

I've decided to stop the cache black magic and focus on things that
really exist in this world: SMT and CPU load. I've dropped most of the
cache-related logic from the patch and made the rest stricter and more
predictable:
http://people.freebsd.org/~mav/sched.htt34.patch

This patch adds a check that skips the fast previous-CPU selection if
that CPU's SMT neighbor is in use; the previous patches only checked
whether SMT was present at all.
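
A minimal sketch of such a check, not code from the patch. It assumes
the fast path is taken only when the previous CPU itself is idle, and it
uses hypothetical flat arrays (load[] and sibling[]) in place of the
scheduler's real per-CPU queues and topology tree:

```c
#include <stddef.h>

/*
 * Hypothetical model, not the ULE data structures:
 *   load[i]    - number of runnable threads on CPU i
 *   sibling[i] - index of the SMT neighbor of CPU i, or -1 without SMT
 *
 * Returns 1 if the thread may go straight back to its previous CPU,
 * 0 if the full CPU search has to run instead.
 */
static int
can_reuse_prev_cpu(const int *load, const int *sibling, size_t prev)
{
	/* Assumption: the fast path requires the previous CPU to be idle. */
	if (load[prev] != 0)
		return (0);
	/* The check described above: a busy SMT neighbor disables it. */
	if (sibling[prev] >= 0 && load[sibling[prev]] != 0)
		return (0);
	return (1);
}
```

With sibling[] set to -1 everywhere the function degenerates to a plain
"is the previous CPU idle" test, which is the no-SMT case.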

I've taken the affinity/preference algorithm from the first patch and
improved it. It makes pickcpu() prefer the previous core or its
neighbors in case of equal load. It is simple enough to keep, but
should still give cache hits.
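
The tie-break can be sketched roughly as follows. This is an
illustration, not the pickcpu() code from the patch: load[] and
core_of[] are hypothetical arrays, and "neighbor" is simplified to "a
CPU on the same physical core as the previous one":

```c
#include <stddef.h>

/*
 * Hypothetical affinity rank, lower is better:
 *   0 - the previous CPU itself
 *   1 - a neighbor (same physical core in this simplified model)
 *   2 - any other CPU
 */
static int
affinity_rank(const int *core_of, size_t cpu, size_t prev)
{
	if (cpu == prev)
		return (0);
	if (core_of[cpu] == core_of[prev])
		return (1);
	return (2);
}

/*
 * Pick the least loaded CPU; affinity matters only when loads are equal.
 */
static size_t
pick_with_affinity(const int *load, const int *core_of, size_t ncpus,
    size_t prev)
{
	size_t best = 0;

	for (size_t i = 1; i < ncpus; i++) {
		if (load[i] < load[best] ||
		    (load[i] == load[best] &&
		    affinity_rank(core_of, i, prev) <
		    affinity_rank(core_of, best, prev)))
			best = i;
	}
	return (best);
}
```

Load always wins over affinity here, so a thread is never kept on a
busier CPU just because its cache may still be warm.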

I've changed the general algorithm of topology tree processing. First I
look for an idle core on the same last-level cache as before, with
affinity to the previous core or its neighbors on higher-level caches.
The original code could put an additional thread on an already busy
core while the next socket was completely idle. Now, if there is no idle
core on this cache, all other CPUs are checked.
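
A rough sketch of the two-pass search, under simplifying assumptions
that are not part of the patch: the topology tree is flattened into a
hypothetical llc[] array holding a last-level cache id per core, and the
second pass simply takes the least loaded core anywhere:

```c
#include <stddef.h>

/*
 * Hypothetical model: load[i] is the load of core i, llc[i] is the id
 * of the last-level cache that core i belongs to.
 */
static size_t
pick_cpu_two_pass(const int *load, const int *llc, size_t ncores,
    size_t prev)
{
	size_t best = 0;

	/* Pass 1: an idle core under the previous LLC, prev itself first. */
	if (load[prev] == 0)
		return (prev);
	for (size_t i = 0; i < ncores; i++) {
		if (llc[i] == llc[prev] && load[i] == 0)
			return (i);
	}

	/* Pass 2: nothing idle on that cache, so consider every core. */
	for (size_t i = 1; i < ncores; i++) {
		if (load[i] < load[best])
			best = i;
	}
	return (best);
}
```

With two sockets of two cores each and both cores of the first socket
busy, the second pass moves the thread to the idle socket instead of
stacking it on a busy core.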

CPU group comparison is now done in two steps: first, as before, the
total load of all cores is compared; if the totals are equal, I compare
the load of the least/most loaded cores. That makes it possible to tell
whether a load of 2 really means 1+1 or 2+0. In that case the group with
2+0 is treated as more loaded than the one with 1+1, making the group
choice better grounded and more predictable.
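
The two-step comparison can be illustrated with a small sketch. The
struct and function names are hypothetical, and only the "most loaded
core" variant of the second step is shown:

```c
#include <stddef.h>

/* Hypothetical summary of one CPU group; not a structure from the patch. */
struct group_load {
	int total;	/* sum of the loads of all cores in the group */
	int max_core;	/* load of the most loaded core in the group */
};

/* Build the summary from an array of per-core loads. */
static struct group_load
group_summarize(const int *core_load, size_t ncores)
{
	struct group_load g = { 0, 0 };

	for (size_t i = 0; i < ncores; i++) {
		g.total += core_load[i];
		if (core_load[i] > g.max_core)
			g.max_core = core_load[i];
	}
	return (g);
}

/*
 * Returns <0 if group a is less loaded than b, >0 if more, 0 if equal.
 * Step 1 compares total load; step 2 breaks ties on the busiest core.
 */
static int
group_compare(struct group_load a, struct group_load b)
{
	if (a.total != b.total)
		return (a.total < b.total ? -1 : 1);
	if (a.max_core != b.max_core)
		return (a.max_core < b.max_core ? -1 : 1);
	return (0);
}
```

For two groups with per-core loads {1, 1} and {2, 0}, the totals are
both 2, so the second step decides: the busiest cores carry 1 and 2, and
the 1+1 group compares as less loaded.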

I've added randomization for the case when all of the above factors are
equal.

As before, I've tested this on a Core i7-870 with 4 physical and 8
logical cores and an Atom D525 with 2 physical and 4 logical cores. On
the Core i7 I got a speedup of up to 10-15% in super-smack MySQL and
PostgreSQL indexed select for 2-8 threads and no penalty in other cases.
pbzip2 shows up to a 13% performance increase for 2-5 threads and no
penalty in other cases.

Tests on the Atom show mostly the same performance as before in the
database benchmarks: faster for 1 thread, slower for 2-3 and about the
same for other cases. Single-stream network performance improved the
same as with the first patch. That CPU is quite difficult to handle:
with its mix of effective SMT and lack of L3 cache, different scheduling
approaches give different results in different situations.

Specific performance numbers can be found here:
http://people.freebsd.org/~mav/bench.ods
Every point there includes at least 5 samples and, except for the pbzip2
test, which is quite unstable with the previous sources, all are
statistically valid.

Florian is now running an alternative set of benchmarks on dual-socket
hardware without SMT.

-- 
Alexander Motin
