Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 8 Jun 2011 19:13:06 -0400
From:      Jung-uk Kim <jkim@FreeBSD.org>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject:   Re: svn commit: r222866 - head/sys/x86/x86
Message-ID:  <201106081913.09272.jkim@FreeBSD.org>
In-Reply-To: <20110609055112.P2870@besplex.bde.org>
References:  <201106081938.p58JcWuB044252@svn.freebsd.org> <20110609055112.P2870@besplex.bde.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Wednesday 08 June 2011 04:55 pm, Bruce Evans wrote:
> On Wed, 8 Jun 2011, Jung-uk Kim wrote:
> > Log:
> >  Introduce low-resolution TSC timecounter "TSC-low".  It replaces
> > the normal TSC timecounter if TSC frequency is higher than ~4.29
> > MHz (or 2^32-1 Hz) or
>
> Off by a factor of 1024.

Oops, GHz.  Sorry.

> >  multiple CPUs are present.  The "TSC-low" frequency is always
> > lower than a preset maximum value and derived from TSC frequency
> > (by being halved until it becomes lower than the maximum).  Note
> > the maximum value for SMP case is significantly lower than UP
> > case because we want to reduce (rare but known) "temporal
> > anomalies" caused by non-serialized RDTSC instruction.  Normally,
> > it is still higher than "ACPI-fast" timecounter frequency (which
> > was default timecounter hardware for long time until r222222) to
> > be useful.
>
> It should be a separate timecounter so that the user can choose it
> independently, at least in the SMP case where it is very low (at
> most ~4.29 GHz >> 8 ~= 17 MHz).

As I noted in the log, it is still higher than the previous default 
ACPI-fast, which is ~3.68 MHz and I've never heard of any complaint 
about ACPI-fast being too low. ;-)

Nothing prevents us from making a separate timecounter, though.  In 
fact, we can do the same for ACPI-fast/ACPI-safe.  However, that'll 
only confuse users, IMHO.

> > Modified:
> >  head/sys/x86/x86/tsc.c
> >
> > Modified: head/sys/x86/x86/tsc.c
> > =================================================================
> >============= --- head/sys/x86/x86/tsc.c	Wed Jun  8 19:28:59
> > 2011	(r222865) +++ head/sys/x86/x86/tsc.c	Wed Jun  8 19:38:31
> > 2011	(r222866) @@ -79,7 +79,8 @@ static void
> > tsc_freq_changed(void *arg, int status);
> > static void tsc_freq_changing(void *arg, const struct cf_level
> > *level, int *status);
> > -static	unsigned tsc_get_timecount(struct timecounter *tc);
> > +static unsigned tsc_get_timecount(struct timecounter *tc);
> > +static unsigned tsc_get_timecount_lowres(struct timecounter
> > *tc);
>
> Should be spelled low or lowres throughout.

Hmm...  Maybe...

> > static void tsc_levels_changed(void *arg, int unit);
> >
> > static struct timecounter tsc_timecounter = {
> > @@ -392,11 +393,19 @@ test_smp_tsc(void)
> > static void
> > init_TSC_tc(void)
>
> This seems to only be called once at boot time.  So the lowness may
> be much lower than necessary if the levels are reduced
> significantly later.

It'll only happen when the CPU is started at the highest frequency and 
TSC is not invariant.  In this case, its quality will be set to 800 
and HPET or ACPI timecounter will be selected by default.  I don't 
see much problem with the default choice here.

> > {
> > +	uint64_t max_freq;
> > +	int shift;
> >
> > 	if ((cpu_feature & CPUID_TSC) == 0 || tsc_disabled)
> > 		return;
> >
> > 	/*
> > +	 * Limit timecounter frequency to fit in an int and prevent it
> > from +	 * overflowing too fast.
> > +	 */
> > +	max_freq = UINT_MAX;
> > +
> > +	/*
> > 	 * We can not use the TSC if we support APM.  Precise
> > timekeeping * on an APM'ed machine is at best a fools pursuit,
> > since * any and all of the time spent in various SMM code can't
> > @@ -418,13 +427,27 @@ init_TSC_tc(void)
> > 	 * We can not use the TSC in SMP mode unless the TSCs on all
> > CPUs are * synchronized.  If the user is sure that the system has
> > synchronized * TSCs, set kern.timecounter.smp_tsc tunable to a
> > non-zero value. +	 * We also limit the frequency even lower to
> > avoid "temporal anomalies" +	 * as much as possible.
> > 	 */
> > -	if (smp_cpus > 1)
> > +	if (smp_cpus > 1) {
> > 		tsc_timecounter.tc_quality = test_smp_tsc();
> > +		max_freq >>= 8;
> > +	}
>
> This gives especially low lowness if the levels are reduced
> significantly. Maybe as low as 100 MHz >> 8 = ~390 KHz = lower than
> an i8254.

I don't remember any SMP-capable x86 ever running at 100 MHz unless it 
is seriously under-clocked.  Even if it existed, it won't be 
available today. :-P

> OTOH, maybe the temporal anomalies scale with the TSC frequency, so
> you need to right shift by a few irrespective of the TSC frequency.
> A shift count of 8 seems too much, but if the initial TSC frequency
> is already < 2**32 shifted by 8, then the final shift is 0.
>
> > #endif
> > init:
> > +	for (shift = 0; shift < 32 && (tsc_freq >> shift) > max_freq;
> > shift++) +		;
> > +	if (shift > 0) {
> > +		tsc_timecounter.tc_get_timecount = tsc_get_timecount_lowres;
> > +		tsc_timecounter.tc_name = "TSC-low";
> > +		if (bootverbose)
> > +			printf("TSC timecounter discards lower %d bit(s).\n",
> > +			    shift);
>
> Error messages are not terminated by a "." in KNF.

I'll fix that next time, thanks.

> > +	}
> > 	if (tsc_freq != 0) {
> > -		tsc_timecounter.tc_frequency = tsc_freq;
> > +		tsc_timecounter.tc_frequency = tsc_freq >> shift;
> > +		tsc_timecounter.tc_priv = (void *)(intptr_t)shift;
> > 		tc_init(&tsc_timecounter);
> > 	}
> > }
> > @@ -496,7 +519,8 @@ tsc_freq_changed(void *arg, const struct
> > 	/* Total setting for this level gives the new frequency in MHz.
> > */ freq = (uint64_t)level->total_set.freq * 1000000;
> > 	atomic_store_rel_64(&tsc_freq, freq);
>
> I thought you would scale the low-level tsc_freq too, but this is
> not so easy (the shift would have to be put in rdtsc()).

Yup, it was far too ugly for my taste.

> > -	atomic_store_rel_64(&tsc_timecounter.tc_frequency, freq);
> > +	tsc_timecounter.tc_frequency =
> > +	    freq >> (int)(intptr_t)tsc_timecounter.tc_priv;
>
> Perhaps the levels can also be increased significantly later.  Then
> the timecounter frequency may exceed 4.29 GHz despite its scaling.

Again, it can only happen when the CPU was started at low frequency 
and the TSC is not invariant.  For that case, TSC won't be selected 
by default unless both HPET and ACPI timers are disabled/unavailable.

Please note the whole point of this commit (and r222867) is to make 
the default selection good enough for most users.  If a user decided 
to turn off other timers, let him/her suffer the consequences. ;-)

> > }
> >
> > static int
> > @@ -511,7 +535,8 @@ sysctl_machdep_tsc_freq(SYSCTL_HANDLER_A
> > 	error = sysctl_handle_64(oidp, &freq, 0, req);
> > 	if (error == 0 && req->newptr != NULL) {
> > 		atomic_store_rel_64(&tsc_freq, freq);
> > -		atomic_store_rel_64(&tsc_timecounter.tc_frequency, freq);
> > +		tsc_timecounter.tc_frequency =
> > +		    freq >> (int)(intptr_t)tsc_timecounter.tc_priv;
>
> Root can easily increase this frequency from below 4.29 GHz (with
> no scaling) to above (still with no scaling). And this change might
> even be correct if it is from 4.29 GHz - epsilon to 4.29 GHz plus
> epsilon.  Micro-adjustments that cross the 4.29 GHz threshold are
> precisely the ones that need the atomic op. 

Hmm...  I thought I took care of that but you're right.  The atomic op 
should be back.

> > 	}
> > 	return (error);
> > }
> > @@ -520,8 +545,15 @@ SYSCTL_PROC(_machdep, OID_AUTO, tsc_freq
> >     0, 0, sysctl_machdep_tsc_freq, "QU", "Time Stamp Counter
> > frequency");
> >
> > static u_int
> > -tsc_get_timecount(struct timecounter *tc)
> > +tsc_get_timecount(struct timecounter *tc __unused)
> > {
> >
> > 	return (rdtsc32());
> > }
> > +
> > +static u_int
> > +tsc_get_timecount_lowres(struct timecounter *tc)
> > +{
> > +
> > +	return (rdtsc() >> (int)(intptr_t)tc->tc_priv);
>
> This forces a slow 64-bit shift (shrdl; shrl) in all cases.

Yes, it does, unfortunately.

I have no clue why AMD didn't implement native 64-bit RDTSC (and 
RDMSR/WRMSR) in the first place. :-(

> rdtsc32() with a scaled tc_counter_mask should work OK (essentially
> the same as the non-low timecounter except for reduced accuracy;
> the only loss is an decrease in the time until counter overflow to
> the same as for the non-low timecounter).

I thought about that but I didn't like that idea, i.e., losing 
resolution and accuracy at the same time.

> > +}
>
> Blindly changing the scale factor on every change to tsc_freq would
> give some nice races, but ones that are essentially the same as for
> changing the timecounter hardware.  This gives me an idea for
> another implementation: have an array of TSC timecounters with 1
> element for each small shift count, and switch using the
> timecounter hardware switch code, and let the user select the
> element.

8-)

Jung-uk Kim



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201106081913.09272.jkim>