Date: Mon, 18 Feb 2002 21:34:07 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Greg Lehey <grog@FreeBSD.ORG>
Cc: arch@FreeBSD.ORG, jhb@FreeBSD.ORG
Subject: Re: buildworld comparison stable vs current
Message-ID: <200202190534.g1J5Y7s58322@apollo.backplane.com>
References: <200202170818.g1H8ID067573@apollo.backplane.com> <20020219125840.B2835@sydney.worldwide.lemis.com>
:Have you done any profiling? Also, how many CPUs are there?
:
:As I said on Friday, I really think we should be doing more
:performance measurements, including measuring what's going on at the
:lock level.
:
:Greg
The test boxes are Dell 2550s, so 2xCPU (1.1GHz Pentium IIIs),
ECC memory, SCSI drives.
A couple of things have become obvious during my own testing:
* Context switching for every interrupt is expensive
stable: 2467550 voluntary context switches (buildworld -j 10)
current: 23879443 voluntary context switches (buildworld -j 10)
buildworld was causing 10761 context switches per second on the
current box, which is roughly one context switch every 100 uS, and
that is a serious burden.
* Mutexes are expensive calls. Looking at the getuid() stats
with and without Giant in userret:
unpatched userret, kern.giant.ucred=1 (default)
1 process: 683K
patched userret, kern.giant.ucred=1 (default)
1 process: 739K
Here a single mutex being locked and unlocked adds 8% of overhead
to the system call.
This is one reason why the new ucred stuff and the timecounter
modules are so nice: they manage to do away entirely with
most mutex operations in the critical path. We need to do more of
that.
* Context switching when a mutex is contested is extremely expensive.
With two processes dueling for Giant I observed 400K calls/sec
on one occasion and 200K calls/sec with the exact same setup on
another occasion, which I tracked down to Giant being constantly
contested and sleeping instead of spinning on the second occasion.
What was really worrying here was that the results were
non-deterministic. Sometimes I would get the higher run rate, sometimes
I would get the lower run rate, with no rhyme or reason.
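As a rough cross-check of the figures in the points above, the arithmetic can be sketched as follows. This is only a sketch under stated assumptions: the 2219 second elapsed time is taken from the original -current buildworld number quoted in this message, the 683K and 739K figures are read as getuid() calls per second, and the function names are made up for illustration.

```python
# Figures quoted in the message; everything else is illustrative.
STABLE_CSW = 2_467_550       # voluntary context switches on -stable
CURRENT_CSW = 23_879_443     # voluntary context switches on -current
CURRENT_ELAPSED_S = 2219     # assumption: original -current buildworld time

def switches_per_second(switches, elapsed_s):
    """Average context switch rate over the whole run."""
    return switches / elapsed_s

def interval_us(rate_per_s):
    """Average time between events, in microseconds."""
    return 1_000_000 / rate_per_s

def overhead_pct(fast_rate, slow_rate):
    """Extra time per call at the slower rate, relative to the faster one."""
    return (fast_rate / slow_rate - 1) * 100

rate = switches_per_second(CURRENT_CSW, CURRENT_ELAPSED_S)
print(f"{rate:.0f} switches/sec, one every {interval_us(rate):.0f} uS")
print(f"current/stable switch ratio: {CURRENT_CSW / STABLE_CSW:.1f}x")
# Assumption: 739K is the faster (patched) rate, 683K the slower one.
print(f"getuid() mutex overhead: {overhead_pct(739, 683):.1f}%")
```

Under those assumptions the quoted 10761 switches per second and the roughly 8% mutex overhead both fall out of the raw counts, and -current performs close to ten times as many voluntary context switches as -stable.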
In any case, just to clarify, those original numbers, 1800 vs 2219
seconds, were wrong. -current was running with default malloc options
of 'J'. The correct numbers are 1800 vs 2097.
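To put the correction in relative terms, here is a small sketch of the slowdown calculation. It assumes, going by the subject line, that 1800 seconds is the -stable buildworld time and the other two figures are -current; the helper name is made up for illustration.

```python
STABLE_S = 1800            # assumption: -stable buildworld time
CURRENT_MALLOC_J_S = 2219  # -current with malloc option 'J' enabled
CURRENT_S = 2097           # -current, corrected number

def slowdown_pct(baseline_s, measured_s):
    """How much longer the measured run took than the baseline, in percent."""
    return (measured_s / baseline_s - 1) * 100

print(f"with malloc 'J': {slowdown_pct(STABLE_S, CURRENT_MALLOC_J_S):.1f}% slower")
print(f"corrected:       {slowdown_pct(STABLE_S, CURRENT_S):.1f}% slower")
```

On that reading the gap drops from about 23% to about 16.5% once the malloc option is accounted for.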
-Matt
Matthew Dillon
<dillon@backplane.com>
