Date:      Thu, 26 Jul 2012 17:35:23 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Jung-uk Kim <jkim@FreeBSD.org>
Cc:        Jim Harris <jimharris@FreeBSD.org>, src-committers@FreeBSD.org, svn-src-all@FreeBSD.org, Andriy Gapon <avg@FreeBSD.org>, svn-src-head@FreeBSD.org, Bruce Evans <brde@optusnet.com.au>, Konstantin Belousov <kostikbel@gmail.com>
Subject:   Re: svn commit: r238755 - head/sys/x86/x86
Message-ID:  <20120726170837.Q2536@besplex.bde.org>
In-Reply-To: <50103C61.8040904@FreeBSD.org>
References:  <201207242210.q6OMACqV079603@svn.freebsd.org> <500F9E22.4080608@FreeBSD.org> <20120725102130.GH2676@deviant.kiev.zoral.com.ua> <500FE6AE.8070706@FreeBSD.org> <20120726001659.M5406@besplex.bde.org> <50102C94.9030706@FreeBSD.org> <20120725180537.GO2676@deviant.kiev.zoral.com.ua> <50103C61.8040904@FreeBSD.org>

On Wed, 25 Jul 2012, Jung-uk Kim wrote:

> On 2012-07-25 14:05:37 -0400, Konstantin Belousov wrote:
>>> Since we have gettimeofday() in userland, the above Linux thread
>>> is more relevant now, I guess.

Indeed.  syscalls put squillions of instructions between.  Maybe even
a serialization instruction.

>> For some unrelated reasons, we do have lfence;rdtsc sequence in
>> the userland already. Well, it is not exactly such sequence, there
>> are some instructions between, but the main fact is that two
>> consecutive invocations of gettimeofday(2) (*) or clock_gettime(2)
>> are interleaved with lfence on Intels, guaranteeing that backstep
>> of the counter is impossible.

In fact, there is always a full documented serialization instruction
for syscalls, except maybe in FreeBSD-1 compat code on i386, at
least on Athlon64.  i386 syscalls use int 0x80 (except in FreeBSD-1
compat code, where they use lcalls), and the iret necessary to return
from this is serializing on at least Athlon64.  amd64 syscalls use
syscall/sysret.  sysret isn't serializing (like far returns), at least
on Athlon64, but at least in FreeBSD, the syscall implementation uses
at least 2 swapgs's (one on entry and one just before the sysret), and
swapgs is serializing, at least on Athlon64.

>> * - it is not a syscall anymore.
>>
>> As I said, using recommended mfence;rdtsc sequence for AMDs would
>> require some work, but let's handle the kernel and userspace issues
>> separately.

Benchmarks for various methods on AthlonXP: I started with a program
that loops making a few million clock_gettime() calls:

     unchanged program: 1.15 seconds
     add lfence:        1.16 seconds
     add mfence:        1.15 seconds (yes, faster than lfence)
     add atomic_cmpset: 1.20 seconds
     add cpuid:         1.25 seconds
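
A minimal sketch of the kind of loop meant by "unchanged program", not the
actual benchmark.  The names ts_to_ns() and bench_clock_gettime() are made
up for illustration, and CLOCK_MONOTONIC is an assumption (the clock id used
for the numbers above is not stated).  As a side effect it counts backsteps
between consecutive readings:

```c
#define _POSIX_C_SOURCE 200809L

#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: convert a timespec to nanoseconds. */
static int64_t
ts_to_ns(const struct timespec *ts)
{
	return ((int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec);
}

/*
 * Hypothetical benchmark loop: call clock_gettime() n times.
 * Returns the elapsed time in nanoseconds, or -1 if a call fails.
 * If backsteps is not NULL, it receives the number of readings that
 * were smaller than the reading before them.
 */
static int64_t
bench_clock_gettime(long n, long *backsteps)
{
	struct timespec start, prev, cur;
	long back, i;

	if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
		return (-1);
	prev = cur = start;
	back = 0;
	for (i = 0; i < n; i++) {
		if (clock_gettime(CLOCK_MONOTONIC, &cur) != 0)
			return (-1);
		if (ts_to_ns(&cur) < ts_to_ns(&prev))
			back++;
		prev = cur;
	}
	if (backsteps != NULL)
		*backsteps = back;
	return (ts_to_ns(&cur) - ts_to_ns(&start));
}
```

Calling bench_clock_gettime() with n of a few million and dividing the result
by n gives a rough per-call cost; the comparison in the loop adds a little
overhead of its own, so only differences between runs mean much.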

>> And, I really failed to find what the patch from the thread you
>> referenced tried to fix.
>
> The patch was supposed to reduce a barrier, i.e., vsyscall
> optimization.  Please note I brought it up at the time, not because it
> fixed any problem but because we completely lack necessary serialization.
>
>> Was it really committed into Linux ?
>
> Yes, it was committed in a simpler form:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=057e6a8c660e95c3f4e7162e00e2fee1fc90c50d
>
> This function was moved around from time to time and now it sits here:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob_plain;f=arch/x86/vdso/vclock_gettime.c
>
> It still carries one barrier before rdtsc.  Please see the comments.

For safety, you probably need to use the slowest (cpuid) method.  Linux
seems to be just using fences that are observed to work.
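
For reference, the barrier variants under discussion might be written as
follows.  This is only a sketch, assuming GCC or Clang inline asm on x86
with SSE2 available for lfence/mfence; the function names are invented here
and are not the ones used in the kernel or in libc:

```c
#include <stdint.h>

#if !defined(__x86_64__) && !defined(__i386__)
#error "this sketch is x86-only"
#endif

/* Plain rdtsc: may be executed out of order with earlier instructions. */
static inline uint64_t
rdtsc_raw(void)
{
	uint32_t lo, hi;

	__asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
	return (((uint64_t)hi << 32) | lo);
}

/* lfence; rdtsc: the sequence documented for Intel CPUs. */
static inline uint64_t
rdtsc_lfence(void)
{
	uint32_t lo, hi;

	__asm__ __volatile__("lfence\n\trdtsc"
	    : "=a" (lo), "=d" (hi) : : "memory");
	return (((uint64_t)hi << 32) | lo);
}

/* mfence; rdtsc: the sequence recommended for AMD CPUs. */
static inline uint64_t
rdtsc_mfence(void)
{
	uint32_t lo, hi;

	__asm__ __volatile__("mfence\n\trdtsc"
	    : "=a" (lo), "=d" (hi) : : "memory");
	return (((uint64_t)hi << 32) | lo);
}

/* cpuid; rdtsc: slowest, but cpuid is a documented serializing instruction. */
static inline uint64_t
rdtsc_cpuid(void)
{
	uint32_t lo, hi;

	/* cpuid leaf 0 overwrites eax, ebx, ecx and edx; rdtsc then sets eax/edx. */
	__asm__ __volatile__("cpuid\n\trdtsc"
	    : "=a" (lo), "=d" (hi) : "a" (0) : "ebx", "ecx", "memory");
	return (((uint64_t)hi << 32) | lo);
}
```

None of these orders the instructions that follow the rdtsc, and none of
them does anything about TSCs that differ between CPUs.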

Original Athlon64 manuals say this about rdtsc: "... not serializing...
even when bound by serializing instructions, the system environment at
the time the instruction is executed can cause additional cycles
[before it reaches EDX:EAX]".

With multiple CPUs, the hardware would have to be smarter and might need
more or different serialization instructions so that these additional
cycles don't break monotonicity across all CPUs.

>> I see actual problem of us allowing timecounters going back, and a
>> solution that exactly follows words of both Intel and AMD
>> documentation. This is good one step forward IMHO.
>
> I agree with you here.  Correctness outweighs performance, IMHO.

Use an i8254 then :-).

Bruce
