From owner-freebsd-smp Sat Mar 30  6:12:41 2002
Delivered-To: freebsd-smp@freebsd.org
Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163])
	by hub.freebsd.org (Postfix) with ESMTP id 5050837B404;
	Sat, 30 Mar 2002 06:12:36 -0800 (PST)
Received: from critter.freebsd.dk (localhost [127.0.0.1])
	by critter.freebsd.dk (8.12.2/8.12.2) with ESMTP id g2UEAKe7076369;
	Sat, 30 Mar 2002 15:10:21 +0100 (CET)
	(envelope-from phk@critter.freebsd.dk)
To: Robert Watson
Cc: Matthew Dillon, John Baldwin, freebsd-smp@FreeBSD.ORG
Subject: Re: Syscall contention tests return, userret() bugs/issues.
In-Reply-To: Your message of "Sat, 30 Mar 2002 08:30:48 EST."
Date: Sat, 30 Mar 2002 15:10:20 +0100
Message-ID: <76368.1017497420@critter.freebsd.dk>
From: Poul-Henning Kamp
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.org

In message , Robert Watson writes:
>That said, if getuid as the example micro-benchmark can be demonstrated to
>causally affect the macro-benchmark, then the selection of
>micro-benchmark by implementation facility sounds reasonable to me. :-)

Well, my gripe with microbenchmarks like this is that they are very,
very, very hard to get right.

Matt obviously didn't get it right, as he himself noticed: one test
case ran faster despite the fact that it was doing more work.  This
means that the behaviour of caches (of all sorts) was a larger factor
than his particular change to the code.

The elimination (practically or by calculation) of the effects of
caches on microbenchmarks is by now a science unto itself.  I am very
afraid that we will see people optimize for the cache footprint of
their microbenchmarks rather than for the code the microbenchmarks are
supposed to measure.

Remember how Linux optimized for the wrong parameters because of
lmbench?  We don't want to go there...

The only credible way to get sensible results from a micro-benchmark
that can be extrapolated to macro performance involves adding a known
or predictable, varying entropy load as a jitter factor and using long
integration times (>6 hours).  That automatically takes you into the
territory of temperature stabilization, atomic-referenced clock
signals and so on.  And quite frankly, having gone there and come
back, I can personally tell you that life isn't long enough for that.

(And no, just disabling caches is not a solution, because then you are
not putting the CPU in a representative memory environment anymore;
that's like benchmarking car performance only in 1st gear.)

So right now I think that our requirement for doing optimizations
should be:

1. It simplifies the code significantly.
or
2. It carries undisputed theoretical improvement.
or
3. It gives a statistically significant macroscopic improvement in a
   (reasonably) well-defined workload of relevance.

The practical guide for executing #3 should be:

	A = Time reference code
	B = Time modified code
	C = Time reference code
	D = Time modified code

Unless both B and D are lower than both A and C, it will take a lot of
carefully controlled test runs to prove that there is a statistically
significant improvement (standard deviations and all that...)

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message
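
As an illustration of the A/B/C/D interleaving described above, here is a
minimal sketch in C.  The functions reference_version() and
modified_version() are hypothetical stand-ins (both simply call getuid()
here) for whatever reference and modified code paths are being compared;
this is not the test harness used in the thread.  It only reports whether
the observed difference is large relative to the run-to-run scatter, which
is what the interleaving is meant to expose.

/*
 * Hypothetical sketch of the A/B/C/D interleaved timing scheme.
 * reference_version() and modified_version() are placeholders for the
 * code paths being compared; both call getuid() here only so that the
 * program is self-contained.  Build with -lm for sqrt().
 */
#include <math.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define	ROUNDS	20		/* interleaved reference/modified pairs */
#define	ITERS	1000000		/* calls per timed sample */

static void reference_version(void) { (void)getuid(); }
static void modified_version(void)  { (void)getuid(); }

/* Time ITERS calls of fn(), returning seconds per call. */
static double
sample(void (*fn)(void))
{
	struct timespec t0, t1;
	int i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < ITERS; i++)
		fn();
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return (((t1.tv_sec - t0.tv_sec) +
	    (t1.tv_nsec - t0.tv_nsec) * 1e-9) / ITERS);
}

int
main(void)
{
	double a[ROUNDS], b[ROUNDS];
	double ma = 0, mb = 0, sa = 0, sb = 0;
	int i;

	/* A, B, C, D, ...: alternate so slow drift hits both sides. */
	for (i = 0; i < ROUNDS; i++) {
		a[i] = sample(reference_version);
		b[i] = sample(modified_version);
	}

	/* Mean and sample standard deviation for each side. */
	for (i = 0; i < ROUNDS; i++) {
		ma += a[i] / ROUNDS;
		mb += b[i] / ROUNDS;
	}
	for (i = 0; i < ROUNDS; i++) {
		sa += (a[i] - ma) * (a[i] - ma);
		sb += (b[i] - mb) * (b[i] - mb);
	}
	sa = sqrt(sa / (ROUNDS - 1));
	sb = sqrt(sb / (ROUNDS - 1));

	printf("reference: %.1f ns/call (sd %.1f)\n", ma * 1e9, sa * 1e9);
	printf("modified:  %.1f ns/call (sd %.1f)\n", mb * 1e9, sb * 1e9);
	return (0);
}

Timing the reference case both before and after the modified case, as in
the A/B/C/D scheme, is what catches the slow thermal and cache drift that
a single back-to-back pair of runs would silently fold into the
comparison; if the difference between the two means does not clearly
dwarf both standard deviations, the result proves nothing by itself.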