From owner-svn-src-all@FreeBSD.ORG Wed Jul 25 18:35:15 2012 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 205CD1065677; Wed, 25 Jul 2012 18:35:15 +0000 (UTC) (envelope-from jkim@FreeBSD.org) Received: from hammer.pct.niksun.com (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 301738FC14; Wed, 25 Jul 2012 18:35:14 +0000 (UTC) Message-ID: <50103C61.8040904@FreeBSD.org> Date: Wed, 25 Jul 2012 14:35:13 -0400 From: Jung-uk Kim User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:13.0) Gecko/20120626 Thunderbird/13.0.1 MIME-Version: 1.0 To: Konstantin Belousov References: <201207242210.q6OMACqV079603@svn.freebsd.org> <500F9E22.4080608@FreeBSD.org> <20120725102130.GH2676@deviant.kiev.zoral.com.ua> <500FE6AE.8070706@FreeBSD.org> <20120726001659.M5406@besplex.bde.org> <50102C94.9030706@FreeBSD.org> <20120725180537.GO2676@deviant.kiev.zoral.com.ua> In-Reply-To: <20120725180537.GO2676@deviant.kiev.zoral.com.ua> X-Enigmail-Version: 1.4.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Jim Harris , src-committers@freebsd.org, svn-src-all@freebsd.org, Andriy Gapon , Bruce Evans , svn-src-head@freebsd.org Subject: Re: svn commit: r238755 - head/sys/x86/x86 X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jul 2012 18:35:15 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 2012-07-25 14:05:37 -0400, Konstantin Belousov wrote: > On Wed, Jul 25, 2012 at 01:27:48PM -0400, Jung-uk Kim wrote: >> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 >> >> On 2012-07-25 10:44:04 -0400, Bruce Evans wrote: >>> On Wed, 25 Jul 2012, Andriy Gapon wrote: >>> >>>> on 25/07/2012 13:21 Konstantin Belousov said the following: >>>>> ... diff --git a/sys/x86/x86/tsc.c b/sys/x86/x86/tsc.c >>>>> index 085c339..229b351 100644 --- a/sys/x86/x86/tsc.c +++ >>>>> b/sys/x86/x86/tsc.c @@ -594,6 +594,7 @@ static u_int >>>>> tsc_get_timecount(struct timecounter *tc __unused) { >>>>> >>>>> + rmb(); return (rdtsc32()); } >>>> >>>> This makes sense to me. We probably want correctness over >>>> performance here. [BTW, I originally thought that the change >>>> was here; brain malfunction] >>> >>> And I liked the original change because it wasn't here :-). >>> >>>>> @@ -602,8 +603,9 @@ tsc_get_timecount_low(struct >>>>> timecounter *tc) { uint32_t rv; >>>>> >>>>> + rmb(); __asm __volatile("rdtsc; shrd %%cl, %%edx, %0" >>>>> - : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx"); >>>>> + : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx"); >>>>> return (rv); } >>>>> >>>> >>>> It would correct here too, but not sure if it would make any >>>> difference given that some lower bits are discarded anyway. >>>> Probably depends on exact CPU. >>> >>> It is needed to pessimize this too. :-) >>> >>> As I have complained before, the loss of resolution from the >>> shift is easy to see by reading the time from userland, even >>> with syscall overhead taking 10-20 times longer than the read. >>> On core2 with TSC-low, a clock- checking utility gives: >>> >>> % min 481, max 12031, mean 530.589452, std 51.633626 % 1th: >>> 550 (1296487 observations) % 2th: 481 (448425 observations) % >>> 3th: 482 (142650 observations) % 4th: 549 (61945 observations) >>> % 5th: 551 (47619 observations) >>> >>> The numbers are diffences in nanoseconds measured by >>> clock_gettime(). The jump from 481 to 549 is 68. From this I >>> can tell that the clock frequency is 1.86 Ghz and the shift is >>> 128, or the clock frequency is 3.72 Ghz and the shift is 256. >>> >>> On AthlonXP with TSC: >>> >>> % min 273, max 29075, mean 274.412811, std 80.425963 % 1th: >>> 273 (853962 observations) % 2th: 274 (745606 observations) % >>> 3th: 275 (400212 observations) % 4th: 276 (20 observations) % >>> 5th: 280 (10 observations) >>> >>> Now the numbers cluster about the mean. Although syscalls >>> take much longer than the loss of resolution with TSC-low, and >>> even the core2 TSC takes almost as long to read as the loss, it >>> is still possible to see things happening at the limits of the >>> resolution (~0.5 nsec). >>> >>>> And, oh hmm, I read AMD Software Optimization Guide for AMD >>>> Family 10h Processors and they suggest using cpuid (with a >>>> note that it may be intercepted in virtualized environments) >>>> or _mfence_ in the discussed role (Appendix F of the >>>> document). Googling for 'rdtsc mfence lfence' yields some >>>> interesting results. >>> >>> The second hit was for the shrd pessimization/loss of >>> resolution and a memory access hack in lkml in 2011. I now >>> seem to remember jkim mentioning the memory access hack. rmb() >>> on i386 has a related memory access hack, but now with a lock >>> prefix that defeats the point of the 2011 hack (it wanted to >>> save 5 nsec by removing fences). rmb() on amd64 uses lfence. >> >> I believe I mentioned this thread at the time: >> >> https://patchwork.kernel.org/patch/691712/ >> >> FYI, r238755 is essentially this commit for Linux: >> >> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=93ce99e849433ede4ce8b410b749dc0cad1100b2 >> >>> >> Some of the other hits are a bit old. The 8th one was by me in >>> the thread about kib@ implementing gettimeofday() in userland. >> >> Since we have gettimeofday() in userland, the above Linux thread >> is more relevant now, I guess. > > For some unrelated reasons, we do have lfence;rdtsc sequence in > the userland already. Well, it is not exactly such sequence, there > are some instructions between, but the main fact is that two > consequtive invocations of gettimeofday(2) (*) or clock_gettime(2) > are interleaved with lfence on Intels, guaranteeing that backstep > of the counter is impossible. > > * - it is not a syscall anymore. > > As I said, using recommended mfence;rdtsc sequence for AMDs would > require some work, but lets handle the kernel and userspace issues > separately. Agreed. > And, I really failed to find what the patch from the thread you > referenced tried to fix. The patch was supposed to reduce a barrier, i.e., vsyscall optimization. Please note I brought it up at the time, not because it fixed any problem but because we completely lack necessary serialization. > Was it really committed into Linux ? Yes, it was committed in a simpler form: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=057e6a8c660e95c3f4e7162e00e2fee1fc90c50d This function was moved around from time to time and now it sits here: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob_plain;f=arch/x86/vdso/vclock_gettime.c It still carries one barrier before rdtsc. Please see the comments. > I see actual problem of us allowing timecounters going back, and a > solution that exactly follows words of both Intel and AMD > documentation. This is good one step forward IMHO. I agree with you here. Correctness outweighs performance, IMHO. Jung-uk Kim -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAlAQPGEACgkQmlay1b9qnVOsqwCgvOBLBkbEW+IJhsJBtrXF0mD6 jGAAoL/pj530VFFKQqSGQ5vBpe8xVfQz =0y/R -----END PGP SIGNATURE-----