Date: Sat, 30 Mar 2002 15:10:20 +0100
From: Poul-Henning Kamp <phk@critter.freebsd.dk>
To: Robert Watson <rwatson@FreeBSD.ORG>
Cc: Matthew Dillon <dillon@apollo.backplane.com>, John Baldwin <jhb@FreeBSD.ORG>, freebsd-smp@FreeBSD.ORG
Subject: Re: Syscall contention tests return, userret() bugs/issues.
Message-ID: <76368.1017497420@critter.freebsd.dk>
In-Reply-To: Your message of "Sat, 30 Mar 2002 08:30:48 EST." <Pine.NEB.3.96L.1020330082409.73912V-100000@fledge.watson.org>
In message <Pine.NEB.3.96L.1020330082409.73912V-100000@fledge.watson.org>, Robert Watson writes:

>That said, if getuid as the example micro-benchmark can be demonstrated to
>causally affect optimize the macro-benchmark, then the selection of
>micro-benchmark by implementation facility sounds reasonable to me. :-)

Well, my gripe with microbenchmarks like this is that they are very,
very, very hard to get right.

Matt obviously didn't get it right, as he himself noticed: one testcase
ran faster despite the fact that it was doing more work.  This means
that the behaviour of caches (of all sorts) was a larger factor than
his particular change to the code.

The elimination (practically or by calculation) of the effects of
caches on microbenchmarks is by now a science unto itself.  I am very
afraid that we will see people optimize for the cache footprint of
their microbenchmarks rather than for the code the microbenchmarks are
supposed to measure.

Remember how Linux optimized for the wrong parameters because of
lmbench?  We don't want to go there...

The only credible way to get sensible results from a micro-benchmark
that can be extrapolated to macro performance involves adding a known
or predictable, varying entropy load as a jitter factor and using long
integration times (>6 hours).  That automatically takes you into the
territory of temperature stabilization, atomic-referenced clock
signals, etc.  And quite frankly, having gone there and come back, I
can personally tell you that life isn't long enough for that.

(And no, just disabling caches is not a solution, because then you are
not putting the CPU in a representative memory environment anymore;
that's like benchmarking car performance only in 1st gear.)

So right now I think that our requirements for doing optimizations
should be:

1. It simplifies the code significantly.
or
2. It carries an undisputed theoretical improvement.
or
3. It gives a statistically significant macroscopic improvement in a
   (reasonably) well-defined workload of relevance.

The practical guide to executing #3 should be:

	A = Time reference code
	B = Time modified code
	C = Time reference code
	D = Time modified code

Unless both B and D are lower than both A and C, it will take a lot of
carefully controlled test runs to prove that there is a statistically
significant improvement (standard deviations and all that...).

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe

Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message
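
For illustration only (not part of the original mail): a minimal sketch
of the kind of getuid() micro-benchmark the thread discusses -- a tight
loop timed with gettimeofday().  The iteration count and timing method
are arbitrary choices; the loop is small enough to live entirely in the
CPU caches, which is exactly why such numbers say little about
macroscopic performance.

/*
 * Minimal getuid() micro-benchmark sketch.  The loop count is an
 * arbitrary choice, not anything specified in the thread.
 */
#include <sys/time.h>
#include <stdio.h>
#include <unistd.h>

#define	ITERATIONS	1000000

int
main(void)
{
	struct timeval t0, t1;
	double elapsed;
	int i;

	gettimeofday(&t0, NULL);
	for (i = 0; i < ITERATIONS; i++)
		(void)getuid();		/* the syscall under test */
	gettimeofday(&t1, NULL);

	elapsed = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
	printf("%d getuid() calls in %.6f s (%.0f ns/call)\n",
	    ITERATIONS, elapsed, elapsed / ITERATIONS * 1e9);
	return (0);
}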
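
And a rough sketch of the A/B/A/B procedure for requirement #3 above,
assuming two hypothetical workload commands, ./run-reference and
./run-modified, standing in for the real macro-benchmark built against
the reference and the modified code:

/*
 * A/B/A/B timing sketch.  The workload commands are placeholders;
 * substitute the actual macro-benchmark of relevance.
 */
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>

/* Wall-clock seconds for one run of a workload command. */
static double
timed_run(const char *cmd)
{
	struct timeval t0, t1;

	gettimeofday(&t0, NULL);
	if (system(cmd) != 0) {
		fprintf(stderr, "workload failed: %s\n", cmd);
		exit(1);
	}
	gettimeofday(&t1, NULL);
	return ((t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6);
}

int
main(void)
{
	double a, b, c, d;

	a = timed_run("./run-reference");	/* A = time reference code */
	b = timed_run("./run-modified");	/* B = time modified code  */
	c = timed_run("./run-reference");	/* C = time reference code */
	d = timed_run("./run-modified");	/* D = time modified code  */

	printf("A=%.3f B=%.3f C=%.3f D=%.3f\n", a, b, c, d);

	/*
	 * Only if both modified timings beat both reference timings is
	 * the quick check suggestive; anything else calls for many
	 * controlled runs and proper significance testing (standard
	 * deviations and all that).
	 */
	if (b < a && b < c && d < a && d < c)
		printf("modified code looks faster; confirm with repeated runs\n");
	else
		printf("inconclusive: needs many runs and real statistics\n");
	return (0);
}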