From owner-freebsd-hackers Mon Nov 25 17:15:14 2002
Date: Mon, 25 Nov 2002 17:15:06 -0800 (PST)
From: Nate Lawson
To: Luigi Rizzo
Cc: hackers@freebsd.org
Subject: Re: out-of-order execution and code profiling
In-Reply-To: <20021125162615.A52619@xorpc.icir.org>

On Mon, 25 Nov 2002, Luigi Rizzo wrote:
> I just got hit by a peculiar problem related to out-of-order
> execution of instructions.
> I was doing some low-level timing measurements using the rdtsc()
> around selected pieces of code (the rdtsc() is included in
> the TSTMP() functions that are in RELENG_4, source is in
> sys/i386/isa/clock.c), as follows:
>
>     TSTMP(3, ifp->if_unit, 1, 0);
>     tmp = CSR_READ_1(sc, FXP_CSR_SCB_STATACK);
>     TSTMP(3, ifp->if_unit, 2, 0);
>     TSTMP(3, ifp->if_unit, 3, 0);
>
> CSR_READ_1() goes to do a volatile read on memory across a 33MHz
> PCI bus, so it should take a very minimum of 100ns, plus arbitration
> and bridge crossing and whatnot. To my surprise, on a 750MHz Athlon
> box, the delta between the first two timestamps turned out to be
> on the order of 39 clock cycles, whereas the delta between 2 and 3
> is in the 270-300 cycle range.
>
> The only explanation I can find is that the rdtsc() within TSTMP()
> is executed out of order.
>
> I wonder, is there on the high-end i386 processors any 'barrier'
> instruction of some kind that enforces in-order execution of some
> piece of code?

The Intel processor manual has an explicit example for this and
recommends using cpuid as a serializing instruction before the call to
rdtsc. Basically, you call cpuid + rdtsc a bunch of times to calibrate
its average latency. Then do your run with cpuid + rdtsc to get the
beginning and end timestamps, subtract the beginning from the end, and
then subtract the latency you calculated above. This gives a good value
for the cycles in your routine. Other factors like ACPI can affect
rdtsc, so beware of this.

-Nate
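
The calibrate-then-measure procedure described above can be sketched
roughly as follows. This is only an illustration, not the example from
the Intel manual and not the TSTMP() code: it assumes an x86 CPU and
GCC/Clang-style inline assembly, and the names serialized_rdtsc(),
calibrate_overhead(), measure() and code_under_test() are hypothetical.

```c
#include <stdint.h>

/* Serialize with cpuid, then read the time-stamp counter (x86 only). */
static inline uint64_t serialized_rdtsc(void)
{
    uint32_t eax = 0, ebx, ecx = 0, edx;
    uint32_t lo, hi;

    /* cpuid (leaf 0) makes all earlier instructions complete first. */
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    /* rdtsc returns the 64-bit counter split across edx:eax. */
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi) : : "memory");
    (void)ebx;
    return ((uint64_t)hi << 32) | lo;
}

/* Average cost, in cycles, of one back-to-back timestamp pair. */
static uint64_t calibrate_overhead(unsigned rounds)
{
    uint64_t total = 0;

    if (rounds == 0)
        return 0;
    for (unsigned i = 0; i < rounds; i++) {
        uint64_t t0 = serialized_rdtsc();
        uint64_t t1 = serialized_rdtsc();
        total += t1 - t0;
    }
    return total / rounds;
}

/* Hypothetical stand-in for the code being profiled. */
static void code_under_test(void)
{
    volatile unsigned sink = 0;

    for (unsigned i = 0; i < 1000; i++)
        sink += i;
}

/* Cycles spent in fn, minus the calibrated timestamp overhead. */
static uint64_t measure(void (*fn)(void), uint64_t overhead)
{
    uint64_t start = serialized_rdtsc();
    fn();
    uint64_t end = serialized_rdtsc();
    uint64_t delta = end - start;

    /* Clamp at zero in case a single run beats the average overhead. */
    return delta > overhead ? delta - overhead : 0;
}
```

A caller would run calibrate_overhead(1000) once and then pass the
result to measure(code_under_test, overhead). Interrupts, CPU migration
and frequency changes will still add noise, so it is probably wise to
repeat the measurement and look at the minimum or median.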