Date:      Mon, 18 Feb 2002 21:34:07 -0800 (PST)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Greg Lehey <grog@FreeBSD.ORG>
Cc:        arch@FreeBSD.ORG, jhb@FreeBSD.ORG
Subject:   Re: buildworld comparison stable vs current
Message-ID:  <200202190534.g1J5Y7s58322@apollo.backplane.com>
References:  <200202170818.g1H8ID067573@apollo.backplane.com> <20020219125840.B2835@sydney.worldwide.lemis.com>

:Have you done any profiling?  Also, how many CPUs are there?
:
:As I said on Friday, I really think we should be doing more
:performance measurements, including measuring what's going on at the
:lock level.
:
:Greg

    The test boxes are Dell 2550s, so 2xCPU (1.1GHz Pentium IIIs),
    ECC memory, SCSI drives.

    A couple of things have become obvious during my own testing:

    * Context switching for every interrupt is expensive

	stable:   2467550  voluntary context switches (buildworld -j 10)
	current: 23879443  voluntary context switches (buildworld -j 10)

	buildworld was causing 10761 context switches per second on the
	current box, which is roughly one context switch every 100 uS,
	a serious burden.

    * Mutexes are expensive calls.   Looking at the getuid() stats
      with and without Giant in userret:

	unpatched userret,  kern.giant.ucred=1 (default)
	    1 process:      683K

	patched userret, kern.giant.ucred=1 (default)
	    1 process:      739K

      Here a single mutex being locked and unlocked adds 8% of
      overhead to the system call.

      This is one reason why the new ucred stuff and the timecounter
      modules are so nice: they manage to do away entirely with
      most mutex operations in the critical path.  We need to do more of
      that.

    * Context switching when a mutex is contested is extremely expensive.
      With two processes dueling for Giant I observed 400K calls/sec
      on one occasion and 200K calls/sec with the exact same setup on
      another occasion, which I tracked down to Giant being constantly
      contested and sleeping instead of spinning on the second occasion.

      What was really worrying here was that the results were
      non-deterministic.  Sometimes I would get the higher run rate,
      sometimes the lower one, with no rhyme or reason.
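
    As a rough check of the arithmetic in the three points above, the
    sketch below recomputes the derived figures from the quoted numbers.
    It assumes the unpatched userret (683K) is the run that takes Giant
    and the patched one (739K) is the run that does not; the variable
    names are illustrative only and are not part of any FreeBSD tool.

```python
# Figures quoted in the message above.
STABLE_SWITCHES = 2_467_550     # voluntary context switches, -stable
CURRENT_SWITCHES = 23_879_443   # voluntary context switches, -current
SWITCHES_PER_SEC = 10_761       # rate reported for the -current box

# Average gap between context switches, in microseconds.
interval_us = 1_000_000 / SWITCHES_PER_SEC

# How many times more voluntary switches -current performs than -stable.
switch_ratio = CURRENT_SWITCHES / STABLE_SWITCHES

# getuid() rates as quoted (in thousands).
# Assumption: unpatched userret takes Giant, patched userret does not.
WITH_GIANT_K = 683
WITHOUT_GIANT_K = 739
mutex_overhead_pct = (WITHOUT_GIANT_K - WITH_GIANT_K) / WITH_GIANT_K * 100

# Two processes dueling for Giant: spinning run vs sleeping run (K calls/sec).
SPINNING_K = 400
SLEEPING_K = 200
contested_slowdown = SPINNING_K / SLEEPING_K

print(f"gap between context switches: {interval_us:.1f} us")
print(f"-current vs -stable switches: {switch_ratio:.1f}x")
print(f"cost of one lock/unlock pair: {mutex_overhead_pct:.1f}%")
print(f"sleeping vs spinning on Giant: {contested_slowdown:.1f}x slower")
```

    The gap works out slightly under the rounded 100 uS quoted above, and
    the lock/unlock cost agrees with the 8% figure.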


    In any case, just to clarify, those original numbers, 1800 vs 2219
    seconds, were wrong.  -current was running with the default malloc
    option 'J'.  The correct numbers are 1800 vs 2097.
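
    To put the correction in relative terms, the sketch below compares
    the two -current times against the 1800-second figure. It assumes
    1800 seconds is the -stable buildworld time and that the whole
    difference between 2219 and 2097 seconds comes from the malloc 'J'
    option; the names are illustrative only.

```python
# Buildworld wall-clock times in seconds, as quoted above.
STABLE_S = 1800          # assumed to be the -stable run
CURRENT_WITH_J_S = 2219  # -current with malloc option 'J' enabled
CURRENT_FIXED_S = 2097   # -current, corrected figure


def slowdown_pct(baseline_s: float, measured_s: float) -> float:
    """Return how much slower measured_s is than baseline_s, in percent."""
    return (measured_s - baseline_s) / baseline_s * 100


original_gap_pct = slowdown_pct(STABLE_S, CURRENT_WITH_J_S)
corrected_gap_pct = slowdown_pct(STABLE_S, CURRENT_FIXED_S)

# Seconds attributed to malloc 'J' under the assumption stated above.
malloc_j_cost_s = CURRENT_WITH_J_S - CURRENT_FIXED_S

print(f"original gap:  {original_gap_pct:.1f}%")
print(f"corrected gap: {corrected_gap_pct:.1f}%")
print(f"malloc 'J' cost: {malloc_j_cost_s} s")
```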

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>

