Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 25 Jul 2012 14:35:13 -0400
From:      Jung-uk Kim <jkim@FreeBSD.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        Jim Harris <jimharris@freebsd.org>, src-committers@freebsd.org, svn-src-all@freebsd.org, Andriy Gapon <avg@freebsd.org>, Bruce Evans <brde@optusnet.com.au>, svn-src-head@freebsd.org
Subject:   Re: svn commit: r238755 - head/sys/x86/x86
Message-ID:  <50103C61.8040904@FreeBSD.org>
In-Reply-To: <20120725180537.GO2676@deviant.kiev.zoral.com.ua>
References:  <201207242210.q6OMACqV079603@svn.freebsd.org> <500F9E22.4080608@FreeBSD.org> <20120725102130.GH2676@deviant.kiev.zoral.com.ua> <500FE6AE.8070706@FreeBSD.org> <20120726001659.M5406@besplex.bde.org> <50102C94.9030706@FreeBSD.org> <20120725180537.GO2676@deviant.kiev.zoral.com.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2012-07-25 14:05:37 -0400, Konstantin Belousov wrote:
> On Wed, Jul 25, 2012 at 01:27:48PM -0400, Jung-uk Kim wrote:
>> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
>> 
>> On 2012-07-25 10:44:04 -0400, Bruce Evans wrote:
>>> On Wed, 25 Jul 2012, Andriy Gapon wrote:
>>> 
>>>> on 25/07/2012 13:21 Konstantin Belousov said the following:
>>>>> ... diff --git a/sys/x86/x86/tsc.c b/sys/x86/x86/tsc.c
>>>>> index 085c339..229b351 100644 --- a/sys/x86/x86/tsc.c +++ 
>>>>> b/sys/x86/x86/tsc.c @@ -594,6 +594,7 @@ static u_int 
>>>>> tsc_get_timecount(struct timecounter *tc __unused) {
>>>>> 
>>>>> +    rmb(); return (rdtsc32()); }
>>>> 
>>>> This makes sense to me.  We probably want correctness over 
>>>> performance here. [BTW, I originally thought that the change
>>>> was here; brain malfunction]
>>> 
>>> And I liked the original change because it wasn't here :-).
>>> 
>>>>> @@ -602,8 +603,9 @@ tsc_get_timecount_low(struct
>>>>> timecounter *tc) { uint32_t rv;
>>>>> 
>>>>> +    rmb(); __asm __volatile("rdtsc; shrd %%cl, %%edx, %0"
>>>>> - : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx");
>>>>> + : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx");
>>>>> return (rv); }
>>>>> 
>>>> 
>>>> It would correct here too, but not sure if it would make any
>>>>  difference given that some lower bits are discarded anyway. 
>>>> Probably depends on exact CPU.
>>> 
>>> It is needed to pessimize this too. :-)
>>> 
>>> As I have complained before, the loss of resolution from the
>>> shift is easy to see by reading the time from userland, even
>>> with syscall overhead taking 10-20 times longer than the read.
>>> On core2 with TSC-low, a clock- checking utility gives:
>>> 
>>> % min 481, max 12031, mean 530.589452, std 51.633626 % 1th:
>>> 550 (1296487 observations) % 2th: 481 (448425 observations) %
>>> 3th: 482 (142650 observations) % 4th: 549 (61945 observations)
>>> % 5th: 551 (47619 observations)
>>> 
>>> The numbers are diffences in nanoseconds measured by 
>>> clock_gettime(). The jump from 481 to 549 is 68.  From this I
>>> can tell that the clock frequency is 1.86 Ghz and the shift is
>>> 128, or the clock frequency is 3.72 Ghz and the shift is 256.
>>> 
>>> On AthlonXP with TSC:
>>> 
>>> % min 273, max 29075, mean 274.412811, std 80.425963 % 1th:
>>> 273 (853962 observations) % 2th: 274 (745606 observations) %
>>> 3th: 275 (400212 observations) % 4th: 276 (20 observations) %
>>> 5th: 280 (10 observations)
>>> 
>>> Now the numbers cluster about the mean.  Although syscalls
>>> take much longer than the loss of resolution with TSC-low, and
>>> even the core2 TSC takes almost as long to read as the loss, it
>>> is still possible to see things happening at the limits of the
>>> resolution (~0.5 nsec).
>>> 
>>>> And, oh hmm, I read AMD Software Optimization Guide for AMD 
>>>> Family 10h Processors and they suggest using cpuid (with a
>>>> note that it may be intercepted in virtualized environments)
>>>> or _mfence_ in the discussed role (Appendix F of the
>>>> document). Googling for 'rdtsc mfence lfence' yields some
>>>> interesting results.
>>> 
>>> The second hit was for the shrd pessimization/loss of
>>> resolution and a memory access hack in lkml in 2011.  I now
>>> seem to remember jkim mentioning the memory access hack.  rmb()
>>> on i386 has a related memory access hack, but now with a lock
>>> prefix that defeats the point of the 2011 hack (it wanted to
>>> save 5 nsec by removing fences).  rmb() on amd64 uses lfence.
>> 
>> I believe I mentioned this thread at the time:
>> 
>> https://patchwork.kernel.org/patch/691712/
>> 
>> FYI, r238755 is essentially this commit for Linux:
>> 
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=93ce99e849433ede4ce8b410b749dc0cad1100b2
>>
>>>
>> 
Some of the other hits are a bit old.  The 8th one was by me in
>>> the thread about kib@ implementing gettimeofday() in userland.
>> 
>> Since we have gettimeofday() in userland, the above Linux thread
>> is more relevant now, I guess.
> 
> For some unrelated reasons, we do have lfence;rdtsc sequence in
> the userland already. Well, it is not exactly such sequence, there
> are some instructions between, but the main fact is that two
> consequtive invocations of gettimeofday(2) (*) or clock_gettime(2)
> are interleaved with lfence on Intels, guaranteeing that backstep
> of the counter is impossible.
> 
> * - it is not a syscall anymore.
> 
> As I said, using recommended mfence;rdtsc sequence for AMDs would
> require some work, but lets handle the kernel and userspace issues
> separately.

Agreed.

> And, I really failed to find what the patch from the thread you
> referenced tried to fix.

The patch was supposed to reduce a barrier, i.e., vsyscall
optimization.  Please note I brought it up at the time, not because it
fixed any problem but because we completely lack necessary serialization.

> Was it really committed into Linux ?

Yes, it was committed in a simpler form:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=057e6a8c660e95c3f4e7162e00e2fee1fc90c50d

This function was moved around from time to time and now it sits here:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob_plain;f=arch/x86/vdso/vclock_gettime.c

It still carries one barrier before rdtsc.  Please see the comments.

> I see actual problem of us allowing timecounters going back, and a 
> solution that exactly follows words of both Intel and AMD
> documentation. This is good one step forward IMHO.

I agree with you here.  Correctness outweighs performance, IMHO.

Jung-uk Kim
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAlAQPGEACgkQmlay1b9qnVOsqwCgvOBLBkbEW+IJhsJBtrXF0mD6
jGAAoL/pj530VFFKQqSGQ5vBpe8xVfQz
=0y/R
-----END PGP SIGNATURE-----



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?50103C61.8040904>