Date: Mon, 28 Jul 2014 13:43:41 -0700
From: Adrian Chadd <adrian@freebsd.org>
To: Ryan Stone <rysto32@gmail.com>
Cc: FreeBSD Net <freebsd-net@freebsd.org>, John Jasen <jjasen@gmail.com>, Navdeep Parhar <nparhar@gmail.com>
Subject: Re: fastforward/routing: a 3 million packet-per-second system?
Message-ID: <CAJ-Vmo=xux9OYBxxjSXj2HQgd7oAd--pqq-2EXcLty6r-aKA1Q@mail.gmail.com>
In-Reply-To: <CAFMmRNyeDJrHMjj-z%2B-zBWr=s9=%2BCXyvCccEjMCGGUMN6c2zZA@mail.gmail.com>
References: <53CE80DD.9090109@gmail.com>
 <CAJ-VmomWpc=3dtasbDhhrUpGywPio3_9W2b-RTAeJjq3nahhOQ@mail.gmail.com>
 <53CEB090.7030701@gmail.com>
 <CAJ-Vmok8eu-GhaNa%2Bi%2BBLv1ZLtKQt4yNfU7ZXW3H%2BY=2HFj=1w@mail.gmail.com>
 <53CEB670.9060600@gmail.com>
 <CAJ-VmonhCg9TvQArtP51rAUjFSe4FpFL8SNCTS6jNwk_Esk%2BEA@mail.gmail.com>
 <53CEB9B5.7020609@gmail.com>
 <CAJ-Vmokje1m-LGm6B9M9t5Q4BW8JcVWbkDXyKMEVzVa%2B8reDBw@mail.gmail.com>
 <83597B15-63B3-4AD7-A458-00B67C9E5396@neville-neil.com>
 <CAFMmRNyeDJrHMjj-z%2B-zBWr=s9=%2BCXyvCccEjMCGGUMN6c2zZA@mail.gmail.com>

On 28 July 2014 13:37, Ryan Stone <rysto32@gmail.com> wrote:
> On Sun, Jul 27, 2014 at 4:42 PM, George Neville-Neil
> <gnn@neville-neil.com> wrote:
>> Chiming in late, but don't you mean instruction-retired instead of
>> CPU_CLK_UNHALTED_CORE?
>>
>> Best,
>> George
>
> In my experience instruction-retired gives very misleading profiler
> output in most cases. The problem is that instruction-retired gives
> equal weight to all instructions, which means that it does not take
> into account instructions with long latencies because they (for
> example) missed the cache. CPU_CLK_UNHALTED_CORE (or its alias,
> unhalted-cycles) is a much better event because it is a nearer proxy
> for time-based sampling, which is really what you're interested in
> when trying to reduce runtime of processes.

Right. It is a union of all the things that screw with you - frontend
stall, backend/retire stall, microcode operation stall, FPU length
stall, branch misprediction stalls, L3 miss (ie, memory) stall, cache
ping-ponging stalls.

Figuring out -which- of those above are the problem requires a little
further digging.

> My one big complaint with unhalted-cycles is that it does not take
> into effect CPU time spent in busy-wait loops that use the pause
> instruction, so it vastly unweights time spent adaptively spinning on
> kernel mutexes, for instance.

Well, it depends if you want to know about the places that it's
spending in busy-wait loops using PAUSE or not. (Are there any flags /
modifiers that have the CPU not count that?)

> I'm also not sure what it does when the
> CPU is adjusting its frequency, but that's not a case that I ever have
> to deal with personally.

That's the difference between _CORE and _REF.

-a
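
The point about instruction-retired versus unhalted-cycles can be illustrated with a toy model. The sketch below is not a real profiler and does not use hwpmc or pmcstat. The function names (cheap_loop, cache_misser), the trace, and the 200-cycle miss latency are all invented for illustration. It only shows how counting instructions and counting cycles attribute the same work differently.

```python
from collections import Counter

# Hypothetical instruction trace: (function name, cycles the instruction took).
# All numbers are made up for illustration; they are not measurements.
def build_trace():
    trace = []
    trace += [("cheap_loop", 1)] * 900      # 900 one-cycle instructions
    trace += [("cache_misser", 1)] * 90     # 90 one-cycle instructions
    trace += [("cache_misser", 200)] * 10   # 10 loads that miss to memory
    return trace

def profile_share(trace, weight):
    """Return each function's share of the total under a weighting function."""
    totals = Counter()
    for func, cycles in trace:
        totals[func] += weight(cycles)
    grand_total = sum(totals.values())
    return {func: totals[func] / grand_total for func in totals}

trace = build_trace()

# instruction-retired view: every instruction counts once, whatever it cost.
by_instructions = profile_share(trace, lambda cycles: 1)

# unhalted-cycles view: every instruction counts for the cycles it consumed.
by_cycles = profile_share(trace, lambda cycles: cycles)

for name in ("cheap_loop", "cache_misser"):
    print(f"{name:13s} instructions={by_instructions[name]:.1%} "
          f"cycles={by_cycles[name]:.1%}")
```

In this made-up trace, cache_misser accounts for 10% of retired instructions but roughly 70% of cycles, which is the kind of skew that makes an instruction-count profile misleading when the goal is to reduce runtime.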