Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 9 Apr 2011 05:17:13 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Jung-uk Kim <jkim@freebsd.org>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Bruce Evans <brde@optusnet.com.au>
Subject:   Re: svn commit: r220433 - in head/sys: amd64/amd64 cddl/dev/dtrace/amd64 cddl/dev/dtrace/i386 compat/linprocfs contrib/altq/altq dev/acpica i386/i386 i386/isa pc98/pc98 x86/cpufreq x86/isa x86/x86
Message-ID:  <20110409051607.I963@besplex.bde.org>
In-Reply-To: <201104081317.41254.jkim@FreeBSD.org>
References:  <201104072328.p37NSSsO055101@svn.freebsd.org> <20110408224855.Y1390@besplex.bde.org> <201104081317.41254.jkim@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, 8 Apr 2011, Jung-uk Kim wrote:

> On Friday 08 April 2011 10:54 am, Bruce Evans wrote:
>> On Thu, 7 Apr 2011, Jung-uk Kim wrote:
>>> Log:
>>>  Use atomic load & store for TSC frequency.  It may be overkill
>>> for amd64 but safer for i386 because it can be easily over 4 GHz
>>> now.  More worse, it can be easily changed by user with
>>> 'machdep.tsc_freq' tunable (directly) or cpufreq(4) (indirectly).
>>> ...
>>
>> I disagree with this complexity and churn, and tried to avoid it --
>> see previous mails for more details.
>
> I don't like it myself.  It's sort of my way of expressing "do not use
> 64-bit global variables on 32-bit platforms unless the architecture
> guarantees coherency". ;-)
>
> Seriously, using variant TSC is strongly discouraged for timekeeping.
> Messing up with it from user land is worse.  If it is really
> necessary, it means kernel is broken and we need to fix it.

No, changing it from userland is the correct method.  It is to fix up
the kernel's miscalibration of the TSC (variant or otherwise).  Userland
can use long-running test where the kernel can only afford to do a
short sloppy test at boot time.  The kernel can use long-running tests
after boot time, but this gives complexity in the kernel.

>> There are probably many 64-bit variables that need to be accessed
>> more atomically, and tsc_freq is one of the least in need of atomic
>> accesses, since the race windows are especially tiny for it.  Even
>> when the race is lost, this is harmless except for frequencies that
>> are almost exactly 2^32 Hz, since only sysctls that change the
>> frequency from above this to below, or vice versa, can give an
>> inconsistent value. Thus there is a race window of a few nsec which
>> only matters with a probability of about 1 in 2^32, only for
>> user[s] of the sysctl to change the TSC frequency.
>
> powerd(8) via cpufreq(4) can change tsc_freq frequently if the CPU is
> not P-state invariant.  Well, the chances of getting ~4.3GHz CPU
> without P-state invariant TSC is little low, I guess.

:-).

Other changes to tsc_freq also break tuning by the sysctl.  My tuning
from userland actually determines the frequency by applying a constant
scale factor to a variable frequency from a table, but I only use this
on old machine which don't have cpufreq(4) etc., so I only do it once
at boot time and don't have to worry about cpufreq(4) messing it up.
(The frequency can be changed across boots by either reconfiguring the
BIOS, or on one machine, by using a CPU with a different frequency.
The scale factor for all this is constant because all clocks are derived
from a single clock in hardware.)  Applying a constant scale factor
to the frequencies set by cpufreq(4) should work in the same way, but
you have to know these frequencies accurately and could just use the
actual freqencies.

>> The complexity of this shows that you should try hard not to use
>> 64-bit variables on 32-bit systems if accesses to them are not
>> locked by a mutex.  64-bit systems like amd64 don't need to do
>> anything special, any more than 32-bit systems accessing 32-bit
>> variables.  We assume that accesses of <= the bus width are atomic
>> in a certain weak sense which is enough for variables like
>> tsc_freq.
>
> Actually, Pentium and above guarantees the "weak" atomic memory
> read/write when it is aligned to quad-word boundary.  CMPXCHG8B is
> the only one instruction (except for FPU) useful for that feature,
> though.  I considered implementing weaker version initially but I am
> not sure if it has any benefit.

I think this is a bit x86-specific, and having to do anything that
the compiler won't do automatically makes it too hard to use for an
average variable.  Does just using aligned FPU/MMX/XMM instructions
give atomic operations?

>> I thought of the following KPI-transparent method that can be used
>> for any struct if updates need not be seen immediately.  It is a
>> refined version of timehands method used for timecounters:
>>
>>  	#define	tsc_freq	_tsc_freq[_tsc_gen]
>
> It just hides the implementation.  How about this, then?
>
> #define	tsc_freq	atomic_load_acq_64(&_tsc_freq)
>
> :-P

Not quite the same, since the atomic op is heavyweight.  You used
lots of extra code to avoid multiple accesses to it.  The array
access is only slightly heavier weight than a variable access.
If there are multiple accesses, then _tsc_gen if not the whole
_tsc_freq[_tsc_gen] should be cached in a register.  Even extra
writes to these variables, and sfences, aren't a problem with 
my version, since they are only done in the unusual case where
the high bits are changed.

>> where the everything is updated sufficiently atomically and the
>> object data is guaranteed not to change for a suitably long time
>> after the generation that indexes it is read:
>>
>>  	if ((new_tsc_freq >> 32) == (tsc_freq >> 32)) {
>>  		/*
>>  		 * Usual case.  Below 2**32, or upper bits just not
>>  		 * changing.  No need for complications.
>>  		 */
>>  		tsc_freq = new_tsc_freq;
>>  	else if ((new_tsc_freq & 0xFFFFFFFF) == (tsc_freq & 0xFFFFFFFF))
>> { /*
>>  		 * Upper bits changing but lower bits not.  Very unusual,
>>  		 * and optimized here to give a place for this comment.
>>  		 * Again, no need for complications.
>>  		 */
>>  		tsc_freq = new_tsc_freq;
>>  	} else {

Perhaps your version can avoid the atomic op for the usual case of writing,
as above, but this isn't very useful since most accesses are reads which
need it.

>> ...
>
> You see the complexity is still there and I don't like it.

Yes, all this is a waste of time for just tsc_freq.  It may be needed
for other variables.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110409051607.I963>