Date: Thu, 26 Jul 2012 18:02:28 +1000 (EST)
From: Bruce Evans
To: Konstantin Belousov
Cc: Jim Harris, src-committers@FreeBSD.org, svn-src-all@FreeBSD.org, Andriy Gapon, Bruce Evans, svn-src-head@FreeBSD.org
Subject: Re: svn commit: r238755 - head/sys/x86/x86
Message-ID: <20120726174611.N2603@besplex.bde.org>
In-Reply-To: <20120725173212.GN2676@deviant.kiev.zoral.com.ua>

On Wed, 25 Jul 2012, Konstantin Belousov wrote:

> On Thu, Jul 26, 2012 at 12:15:54AM +1000, Bruce Evans wrote:
>> On Wed, 25 Jul 2012, Konstantin Belousov wrote:
>> ...
>> Most uses in FreeBSD are for timecounters. Timecounters deliver the
>> current time. This is unrelated to whatever instructions haven't
>> completed when the TSC is read. Except possibly when the time needs
>> to be synchronized across CPUs, and when the uncompleted instruction
>> is a TSC read.
>>
>>> For tsc test, this means that after the change RDTSC executions are not
>>> reordered on the single core among themself. As I understand, CPU has
>>> no dependency noted between two reads of tsc by RDTSC, which allows
>>> later read to give lower value of counter.
>>
>> Gak. Even when they are in the same instruction sequence? Even though
>> the TSC reads fixed registers and some other instructions in the sequence
>> between the TSC use these registers? The CPU would have to do significant
>> register renaming to break this.
> As I could only speculate, I believe that any modern CPU executes RDTSC
> as at least two separate steps, one is read from internal counter, and
> second is the registers update. It seems that the first kind of action
> is not serialized. I have no other explanation for the Jim findings.

In a reply to your later mail (made earlier), I quoted the Athlon64 manual
documenting this problem (everything except exactly where the serialization
is applied). The delay is similar to what happens in software if the
thread is preempted between reading the hardware time and using the result.
It doesn't help to serialize the read and the use without serializing
everything in between, which costs more. Most uses don't care about the
delay (else they need more than serialization to limit it). But if we care
then we might have to use a slow new instruction like rdtscp to tell the
hardware to care, or add slow locking to uses of the result in software
(this needs more than critical_enter() to stop fast interrupt handlers).
BTW, binuptime() is supposed to work in fast interrupt handlers. This is
fragile but useful.

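To make the ordering question above concrete, here is a rough userland sketch, not the kernel code under discussion. It assumes that an lfence in front of rdtsc approximates what the rmb() in the patch does on amd64; the helper names (tsc_read_plain, tsc_read_fenced, sample_reads, count_backward) are made up for illustration, and on anything that is not x86-64 it falls back to clock_gettime() so that it still builds.

```c
#define _POSIX_C_SOURCE 200809L

#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <x86intrin.h>		/* __rdtsc(), _mm_lfence() */
#define HAVE_TSC 1
#else
#include <time.h>		/* fallback only; not a TSC */
#define HAVE_TSC 0
#endif

typedef uint64_t (*tsc_reader_t)(void);

/* Plain read: nothing orders it against neighbouring instructions. */
static inline uint64_t
tsc_read_plain(void)
{
#if HAVE_TSC
	return (__rdtsc());
#else
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec);
#endif
}

/*
 * Fenced read: lfence first, then the read.  Assumption: this is a
 * stand-in for the patch's rmb(); it is not the kernel's tsc_read().
 */
static inline uint64_t
tsc_read_fenced(void)
{
#if HAVE_TSC
	_mm_lfence();
#endif
	return (tsc_read_plain());
}

/* Take n back-to-back readings with the given reader. */
static inline void
sample_reads(tsc_reader_t rd, uint64_t *out, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		out[i] = rd();
}

/* Count readings that are lower than the reading just before them. */
static inline size_t
count_backward(const uint64_t *s, size_t n)
{
	size_t i, back = 0;

	for (i = 1; i < n; i++)
		if (s[i] < s[i - 1])
			back++;
	return (back);
}
```

Filling a buffer with sample_reads(tsc_read_plain, ...) and again with sample_reads(tsc_read_fenced, ...), then comparing count_backward() for the two, is one way to look for the backward steps described in the quoted text. Whether any show up depends on the CPU, and a userland thread can also migrate between cores in the middle of a run, so a nonzero count does not by itself prove reordering.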
>>> {
>>>
>>> +	rmb();
>>> 	return (rdtsc32());
>>> }
>>
>> Please don't pessimize this further. The time for rdtsc went from 6.5
>> cycles on AthlonXP to 65 cycles on core2 (mainly for
>> P-state-invariance hardware synchronization I think). Pretty soon it
>> will be as slow as an HPET and heading towards an i8254. Adding rmb()
>> only makes it 12 cycles slower on core2, but 16 cycles (almost 3 times)
>> slower on AthlonXP.
> AthlonXP does not look as interesting target for optimizations. From what I
> can find this is PIII-era CPU.

Since CPUs hit the frequency wall just after AthlonXP, it is almost as fast
as a single modern CPU. It is much faster than a modern CPU for rdtsc, and
already optimized. It is probably much faster than a PIII for systemy
things like rdtsc.

Bruce