Date:      Sat, 18 Jun 2011 22:05:06 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Jung-uk Kim <jkim@FreeBSD.org>
Cc:        svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org, Bruce Evans <brde@optusnet.com.au>
Subject:   Re: svn commit: r222866 - head/sys/x86/x86
Message-ID:  <20110618210815.W889@besplex.bde.org>
In-Reply-To: <201106081913.09272.jkim@FreeBSD.org>
References:  <201106081938.p58JcWuB044252@svn.freebsd.org> <20110609055112.P2870@besplex.bde.org> <201106081913.09272.jkim@FreeBSD.org>

Long ago, On Wed, 8 Jun 2011, Jung-uk Kim wrote:

> On Wednesday 08 June 2011 04:55 pm, Bruce Evans wrote:
>> On Wed, 8 Jun 2011, Jung-uk Kim wrote:
>>> Log:
>>>  Introduce low-resolution TSC timecounter "TSC-low".  It replaces
>>> the normal TSC timecounter if TSC frequency is higher than ~4.29
>>> GHz (or 2^32-1 Hz) or
>
>> It should be a separate timecounter so that the user can choose it
>> independently, at least in the SMP case where it is very low (at
>> most ~4.29 GHz >> 8 ~= 17 MHz).
>
> As I noted in the log, it is still higher than the previous default
> ACPI-fast, which is ~3.58 MHz and I've never heard of any complaint
> about ACPI-fast being too low. ;-)

That's because it is too low to measure itself being low :-).

> Nothing prevents us from making a separate timecounter, though.  In
> fact, we can do the same for ACPI-fast/ACPI-safe.  However, that'll
> only confuse users, IMHO.

TSC/TSC-low sort of corresponds to ACPI-fast/ACPI-safe.  Users can
switch between the latter.  What they can't do is run both concurrently,
either to compare them or use the best one that works in the current
context.  That would be more for developers and is not implemented mainly
because it has more complexity (only a tiny amount of extra overhead
I think, provided you don't try to keep the 2 times coherent -- just
an extra windup for each active timecounter).

>>> static void tsc_levels_changed(void *arg, int unit);
>>>
>>> static struct timecounter tsc_timecounter = {
>>> @@ -392,11 +393,19 @@ test_smp_tsc(void)
>>> static void
>>> init_TSC_tc(void)
>>
>> This seems to only be called once at boot time.  So the lowness may
>> be much lower than necessary if the levels are reduced
>> significantly later.
>
> It'll only happen when the CPU is started at the highest frequency and
> TSC is not invariant.  In this case, its quality will be set to 800
> and HPET or ACPI timecounter will be selected by default.  I don't
> see much problem with the default choice here.

Can the CPU be started at a low frequency and throttled up later?  I
agree that the non-invariant case is not very important.

>>> {
>>> +	uint64_t max_freq;
>>> +	int shift;
>>>
>>> 	if ((cpu_feature & CPUID_TSC) == 0 || tsc_disabled)
>>> 		return;
>>>
>>> 	/*
>>> +	 * Limit timecounter frequency to fit in an int and prevent it from
>>> +	 * overflowing too fast.
>>> +	 */
>>> +	max_freq = UINT_MAX;
>>> +
>>> +	/*
>>> 	 * We can not use the TSC if we support APM.  Precise timekeeping
>>> 	 * on an APM'ed machine is at best a fools pursuit, since
>>> 	 * any and all of the time spent in various SMM code can't
>>> @@ -418,13 +427,27 @@ init_TSC_tc(void)
>>> 	 * We can not use the TSC in SMP mode unless the TSCs on all CPUs are
>>> 	 * synchronized.  If the user is sure that the system has synchronized
>>> 	 * TSCs, set kern.timecounter.smp_tsc tunable to a non-zero value.
>>> +	 * We also limit the frequency even lower to avoid "temporal anomalies"
>>> +	 * as much as possible.
>>> 	 */
>>> -	if (smp_cpus > 1)
>>> +	if (smp_cpus > 1) {
>>> 		tsc_timecounter.tc_quality = test_smp_tsc();
>>> +		max_freq >>= 8;
>>> +	}
>>
>> This gives especially low lowness if the levels are reduced
>> significantly. Maybe as low as 100 MHz >> 8 = ~390 KHz = lower than
>> an i8254.
>
> I don't remember any SMP-capable x86 ever running at 100 MHz unless it
> is seriously under-clocked.  Even if it existed, it won't be
> available today. :-P

Doesn't throttling give underclocking?  Maybe not as low as 100 MHz, but
quite low.  Only a possible problem for the non-invariant case anyway.

>> OTOH, maybe the temporal anomalies scale with the TSC frequency, so
>> you need to right shift by a few irrespective of the TSC frequency.
>> A shift count of 8 seems too much, but if the initial TSC frequency
>> is already < 2**32 shifted by 8, then the final shift is 0.

This is my main point.  How can it be right to reduce the extra shift
for SMP (if this shift is needed at all) just because the initial TSC
frequency is low?  All instructions are clocked, so non-temporalness
within a core scales with the current frequency.  Oops, this leads
back to my previous point that the scaling should depend on the
current frequency and not just on the initial frequency.  Across
cores, it isn't so clear what the non-temporalness scales with.  The
non-temporalness is FUD so its scaling could be anything :-).
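
To make the objection concrete, here is a minimal sketch of the shift
selection, assuming the shift is simply the smallest one that brings the
boot-time frequency under the limit.  pick_shift() is a hypothetical
name used only for illustration and is not taken from the patch.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): return the smallest shift
 * such that (freq >> shift) <= max_freq.  freq is the TSC frequency
 * seen at boot; max_freq is UINT_MAX for UP and UINT_MAX >> 8 for SMP.
 */
static int
pick_shift(uint64_t freq, uint64_t max_freq)
{
	int shift;

	for (shift = 0; shift < 32 && (freq >> shift) > max_freq; shift++)
		;
	return (shift);
}
```

Under that assumption, with the SMP limit of UINT_MAX >> 8 a 4 GHz TSC
gets a shift of 8 and a 2 GHz TSC a shift of 7, but a TSC that starts at
100 MHz gets only 3 and one that starts at 16 MHz gets 0.  So the amount
of "anomaly hiding" depends on the initial frequency, which is the point
above.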

>> ...
>> Perhaps the levels can also be increased significantly later.  Then
>> the timecounter frequency may exceed 4.29 GHz despite its scaling.
>
> Again, it can only happen when the CPU was started at low frequency
> and the TSC is not invariant.  For that case, TSC won't be selected
> by default unless both HPET and ACPI timers are disabled/unavailable.

But users can select it, and since users can't control the scaling
or even select between TSC/TSC-low, TSC-low must be scaled properly
initially to have the best chance of working later.

>>> @@ -520,8 +545,15 @@ SYSCTL_PROC(_machdep, OID_AUTO, tsc_freq
>>>     0, 0, sysctl_machdep_tsc_freq, "QU", "Time Stamp Counter frequency");
>>>
>>> static u_int
>>> -tsc_get_timecount(struct timecounter *tc)
>>> +tsc_get_timecount(struct timecounter *tc __unused)
>>> {
>>>
>>> 	return (rdtsc32());
>>> }
>>> +
>>> +static u_int
>>> +tsc_get_timecount_lowres(struct timecounter *tc)
>>> +{
>>> +
>>> +	return (rdtsc() >> (int)(intptr_t)tc->tc_priv);
>>
>> This forces a slow 64-bit shift (shrdl; shrl) in all cases.
>
> Yes, it does, unfortunately.
>
> I have no clue why AMD didn't implement native 64-bit RDTSC (and
> RDMSR/WRMSR) in the first place. :-(

I didn't notice before that it still goes to a register pair on amd64.

>> rdtsc32() with a scaled tc_counter_mask should work OK (essentially
>> the same as the non-low timecounter except for reduced accuracy;
>> the only loss is a decrease in the time until counter overflow to
>> the same as for the non-low timecounter).
>
> I thought about that but I didn't like that idea, i.e., losing
> resolution and accuracy at the same time.

But it doesn't lose any more resolution or accuracy than any shift
necessarily loses.  It only loses wrap time, which is of no interest
for a small reduction.  See another reply.
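
A rough sketch of why the two approaches give the same deltas follows.
It assumes the timecounter code forms a delta as (now - then) & mask,
and it models rdtsc32() as the low 32 bits of the 64-bit TSC.  The
function names are made up for this example.

```c
#include <stdint.h>

/* As in the quoted patch: shift the full 64-bit TSC, keep 32 bits. */
static uint32_t
count_shift64(uint64_t tsc, int shift)
{
	return ((uint32_t)(tsc >> shift));
}

/* Alternative: shift only the low 32 bits (a stand-in for rdtsc32()). */
static uint32_t
count_shift32(uint64_t tsc, int shift)
{
	return ((uint32_t)tsc >> shift);
}

/* Delta between two readings, truncated to the counter's width. */
static uint32_t
tc_delta(uint32_t now, uint32_t then, uint32_t mask)
{
	return ((now - then) & mask);
}
```

With shift = 8, readings taken 4096 cycles apart across a 32-bit rollover
of the TSC (0x1fffffff0 and 0x200000ff0) give a delta of 16 both ways:
count_shift64() with a mask of 0xffffffff, and count_shift32() with a
mask of 0xffffffff >> 8.  The second form wraps every 2^32 cycles
instead of every 2^40, which is the only thing lost.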

The shift of 8 for SMP still seems far too much.  clock_gettime() with
a TSC timecounter on an old 2GHz system takes about 250 nS.  I hope
it takes only 1/2 that on a newer system.  nanouptime() in the kernel
takes more like 30 nS on the old system.  It should at least try to
have enough resolution for sequential calls to it to never return the
same time (even ACPI-fast has this property -- about 1000 nS per call
and a resolution of about 250 nS).  rdtsc on old Athlons takes only
12 (9?) cycles so you could almost use it to time individual instructions
(modulo out of order execution).  The invariant versions have to be
much slower for synchronization :-(.  They take at least 42 cycles
AFAIR.  A shift count of 5 would lose less resolution than an invariant
TSC really has so it would be good if it is enough to hide the
nontemporalness.  A shift count of 6 would be OK too.  But a shift
count of 8 lets you execute about 4 nanouptime()'s for every change
in the time returned.  OTOH, 256 cycles at 4 GHz is about 64 nS and
clock_gettime() unfortunately takes longer (except on Linux? :-(), so
a shift count of 8 is OK for it.

My clock measurement program (mostly an old program by Wollman) shows
the following histogram of times for a non-invariant TSC timecounter
on a 2GHz UP system:

% min 273, max 265102, mean 273.998217, std 79.069534
% 1th: 273 (1727219 observations)
% 2th: 274 (265607 observations)
% 3th: 275 (6984 observations)
% 4th: 280 (11 observations)
% 5th: 290 (8 observations)

The variance is small, and differences of a single nS can be seen clearly.
With the SMP shift of 8 on a 4GHz system, the minimum difference would
be 64 nS so it would be impossible to see the details of the distribution
about the mean of 273.998 nS.
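
The resolution figures used in this message are just 2^shift cycles
converted to time.  A small sketch of that arithmetic, with tick_ns() as
a hypothetical helper name and integer (truncating) division:

```c
#include <stdint.h>

/*
 * Hypothetical helper: nanoseconds per tick of a counter that runs at
 * (freq_hz >> shift), i.e. the time taken by 2^shift TSC cycles.
 * The result is truncated to whole nanoseconds; shift must be < 32.
 */
static uint64_t
tick_ns(uint64_t freq_hz, int shift)
{
	return (((uint64_t)1 << shift) * 1000000000ULL / freq_hz);
}
```

tick_ns(4000000000, 8) is 64 and tick_ns(2000000000, 8) is 128, while
shift counts of 5 and 6 at 2 GHz give 16 and 32.  Compare those steps
with the ~30 nS cost of nanouptime() and the 1 nS steps visible in the
histogram above.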

Bruce


