Date:      Mon, 28 Jul 2014 13:43:41 -0700
From:      Adrian Chadd <adrian@freebsd.org>
To:        Ryan Stone <rysto32@gmail.com>
Cc:        FreeBSD Net <freebsd-net@freebsd.org>, John Jasen <jjasen@gmail.com>, Navdeep Parhar <nparhar@gmail.com>
Subject:   Re: fastforward/routing: a 3 million packet-per-second system?
Message-ID:  <CAJ-Vmo=xux9OYBxxjSXj2HQgd7oAd--pqq-2EXcLty6r-aKA1Q@mail.gmail.com>
In-Reply-To: <CAFMmRNyeDJrHMjj-z%2B-zBWr=s9=%2BCXyvCccEjMCGGUMN6c2zZA@mail.gmail.com>
References:  <53CE80DD.9090109@gmail.com> <CAJ-VmomWpc=3dtasbDhhrUpGywPio3_9W2b-RTAeJjq3nahhOQ@mail.gmail.com> <53CEB090.7030701@gmail.com> <CAJ-Vmok8eu-GhaNa%2Bi%2BBLv1ZLtKQt4yNfU7ZXW3H%2BY=2HFj=1w@mail.gmail.com> <53CEB670.9060600@gmail.com> <CAJ-VmonhCg9TvQArtP51rAUjFSe4FpFL8SNCTS6jNwk_Esk%2BEA@mail.gmail.com> <53CEB9B5.7020609@gmail.com> <CAJ-Vmokje1m-LGm6B9M9t5Q4BW8JcVWbkDXyKMEVzVa%2B8reDBw@mail.gmail.com> <83597B15-63B3-4AD7-A458-00B67C9E5396@neville-neil.com> <CAFMmRNyeDJrHMjj-z%2B-zBWr=s9=%2BCXyvCccEjMCGGUMN6c2zZA@mail.gmail.com>

On 28 July 2014 13:37, Ryan Stone <rysto32@gmail.com> wrote:
> On Sun, Jul 27, 2014 at 4:42 PM, George Neville-Neil
> <gnn@neville-neil.com> wrote:
>> Chiming in late, but don't you mean instruction-retired instead of
>> CPU_CLK_UNHALTED_CORE?
>>
>> Best,
>> George
>
> In my experience instruction-retired gives very misleading profiler
> output in most cases.  The problem is that instruction-retired gives
> equal weight to all instructions, which means that it does not take
> into account instructions with long latencies because they (for
> example) missed the cache.  CPU_CLK_UNHALTED_CORE (or its alias,
> unhalted-cycles) is a much better event because it is a nearer proxy
> for time-based sampling, which is really what you're interested in
> when trying to reduce runtime of processes.

Right.

Unhalted cycles is a union of all the things that screw with you: frontend
stalls, backend/retire stalls, microcode operation stalls, FPU latency
stalls, branch misprediction stalls, L3 miss (i.e., memory) stalls, and
cache ping-ponging stalls.

Figuring out -which- of those is the problem requires a little
further digging.
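
To make the weighting argument concrete, here is a toy model (not a
measurement) of where samples would land under each event. The block names,
instruction counts, and cycles-per-instruction figures are made-up
assumptions; real hardware overlaps stalls and skids samples, so treat this
only as a sketch of the proportions involved.

```python
# Toy model: each block is (name, instructions_executed, cycles_per_instruction).
# All numbers are hypothetical and chosen only to illustrate the weighting.

def sample_shares(blocks):
    """Return {name: (share_by_instructions, share_by_cycles)}."""
    total_instr = sum(n for _, n, _ in blocks)
    total_cycles = sum(n * cpi for _, n, cpi in blocks)
    return {
        name: (n / total_instr, n * cpi / total_cycles)
        for name, n, cpi in blocks
    }

blocks = [
    ("alu_loop", 900, 1),           # cheap instructions that hit in cache
    ("cache_miss_load", 100, 200),  # loads that go all the way to memory
]

shares = sample_shares(blocks)
for name, (by_instr, by_cycles) in shares.items():
    print(f"{name:16s} instruction-retired {by_instr:6.1%}   "
          f"unhalted-cycles {by_cycles:6.1%}")
```

In this model the loads are 10% of the retired instructions but account for
roughly 96% of the cycles, which is why sampling on instructions retired
points you at the wrong code.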

> My one big complaint with unhalted-cycles is that it does not take
> into account CPU time spent in busy-wait loops that use the pause
> instruction, so it vastly underweights time spent adaptively spinning on
> kernel mutexes, for instance.

Well, it depends on whether you want to know about the time it's
spending in busy-wait loops using PAUSE.
(Are there any flags / modifiers that have the CPU not count that?)

> I'm also not sure what it does when the
> CPU is adjusting its frequency, but that's not a case that I ever have
> to deal with personally.

That's the difference between _CORE and _REF.
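
As a rough sketch of that difference: the _CORE event ticks at whatever
frequency the core is currently running at, while the _REF event ticks at a
fixed reference frequency. The frequencies and the helper name below are
assumptions picked for illustration, and the model ignores halted time and
mid-interval frequency changes.

```python
# Simplified model of unhalted core vs. reference cycle counts.
# REF_HZ and the core frequencies are hypothetical example values.

REF_HZ = 2_400_000_000  # assumed fixed reference clock

def unhalted_counts(busy_seconds, core_hz, ref_hz=REF_HZ):
    """Cycles counted while the core is busy (not halted) for busy_seconds."""
    return {
        "core": round(busy_seconds * core_hz),  # follows the actual core clock
        "ref": round(busy_seconds * ref_hz),    # independent of frequency scaling
    }

full_speed = unhalted_counts(0.5, core_hz=2_400_000_000)
throttled = unhalted_counts(0.5, core_hz=1_200_000_000)

print("full speed:", full_speed)
print("throttled: ", throttled)
```

For the same half second of busy time, the throttled case reports half the
core cycles but the same reference cycles, so _REF stays closer to wall-clock
time when the frequency moves around.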



-a


