From owner-svn-src-all@FreeBSD.ORG Wed Jun 8 20:55:37 2011 Return-Path: Delivered-To: svn-src-all@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B57A1106564A; Wed, 8 Jun 2011 20:55:37 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail05.syd.optusnet.com.au (mail05.syd.optusnet.com.au [211.29.132.186]) by mx1.freebsd.org (Postfix) with ESMTP id 3D0E78FC08; Wed, 8 Jun 2011 20:55:36 +0000 (UTC) Received: from c122-106-165-191.carlnfd1.nsw.optusnet.com.au (c122-106-165-191.carlnfd1.nsw.optusnet.com.au [122.106.165.191]) by mail05.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p58KtX1w023649 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 9 Jun 2011 06:55:35 +1000 Date: Thu, 9 Jun 2011 06:55:33 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Jung-uk Kim In-Reply-To: <201106081938.p58JcWuB044252@svn.freebsd.org> Message-ID: <20110609055112.P2870@besplex.bde.org> References: <201106081938.p58JcWuB044252@svn.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org Subject: Re: svn commit: r222866 - head/sys/x86/x86 X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Jun 2011 20:55:37 -0000 On Wed, 8 Jun 2011, Jung-uk Kim wrote: > Log: > Introduce low-resolution TSC timecounter "TSC-low". It replaces the normal > TSC timecounter if TSC frequency is higher than ~4.29 MHz (or 2^32-1 Hz) or Off by a factor of 1024. > multiple CPUs are present. The "TSC-low" frequency is always lower than a > preset maximum value and derived from TSC frequency (by being halved until > it becomes lower than the maximum). Note the maximum value for SMP case is > significantly lower than UP case because we want to reduce (rare but known) > "temporal anomalies" caused by non-serialized RDTSC instruction. Normally, > it is still higher than "ACPI-fast" timecounter frequency (which was default > timecounter hardware for long time until r222222) to be useful. It should be a separate timecounter so that the user can choose it independently, at least in the SMP case where it is very low (at most ~4.29 GHz >> 8 ~= 17 MHz). > Modified: > head/sys/x86/x86/tsc.c > > Modified: head/sys/x86/x86/tsc.c > ============================================================================== > --- head/sys/x86/x86/tsc.c Wed Jun 8 19:28:59 2011 (r222865) > +++ head/sys/x86/x86/tsc.c Wed Jun 8 19:38:31 2011 (r222866) > @@ -79,7 +79,8 @@ static void tsc_freq_changed(void *arg, > int status); > static void tsc_freq_changing(void *arg, const struct cf_level *level, > int *status); > -static unsigned tsc_get_timecount(struct timecounter *tc); > +static unsigned tsc_get_timecount(struct timecounter *tc); > +static unsigned tsc_get_timecount_lowres(struct timecounter *tc); Should be spelled low or lowres throughout. > static void tsc_levels_changed(void *arg, int unit); > > static struct timecounter tsc_timecounter = { > @@ -392,11 +393,19 @@ test_smp_tsc(void) > static void > init_TSC_tc(void) This seems to only be called once at boot time. So the lowness may be much lower than necessary if the levels are reduced significantly later. > { > + uint64_t max_freq; > + int shift; > > if ((cpu_feature & CPUID_TSC) == 0 || tsc_disabled) > return; > > /* > + * Limit timecounter frequency to fit in an int and prevent it from > + * overflowing too fast. > + */ > + max_freq = UINT_MAX; > + > + /* > * We can not use the TSC if we support APM. Precise timekeeping > * on an APM'ed machine is at best a fools pursuit, since > * any and all of the time spent in various SMM code can't > @@ -418,13 +427,27 @@ init_TSC_tc(void) > * We can not use the TSC in SMP mode unless the TSCs on all CPUs are > * synchronized. If the user is sure that the system has synchronized > * TSCs, set kern.timecounter.smp_tsc tunable to a non-zero value. > + * We also limit the frequency even lower to avoid "temporal anomalies" > + * as much as possible. > */ > - if (smp_cpus > 1) > + if (smp_cpus > 1) { > tsc_timecounter.tc_quality = test_smp_tsc(); > + max_freq >>= 8; > + } This gives especially low lowness if the levels are reduced significantly. Maybe as low as 100 MHz >> 8 = ~390 KHz = lower than an i8254. OTOH, maybe the temporal anomalies scale with the TSC frequency, so you need to right shift by a few irrespective of the TSC frequency. A shift count of 8 seems too much, but if the initial TSC frequency is already < 2**32 shifted by 8, then the final shift is 0. > #endif > init: > + for (shift = 0; shift < 32 && (tsc_freq >> shift) > max_freq; shift++) > + ; > + if (shift > 0) { > + tsc_timecounter.tc_get_timecount = tsc_get_timecount_lowres; > + tsc_timecounter.tc_name = "TSC-low"; > + if (bootverbose) > + printf("TSC timecounter discards lower %d bit(s).\n", > + shift); Error messages are not terminated by a "." in KNF. > + } > if (tsc_freq != 0) { > - tsc_timecounter.tc_frequency = tsc_freq; > + tsc_timecounter.tc_frequency = tsc_freq >> shift; > + tsc_timecounter.tc_priv = (void *)(intptr_t)shift; > tc_init(&tsc_timecounter); > } > } > @@ -496,7 +519,8 @@ tsc_freq_changed(void *arg, const struct > /* Total setting for this level gives the new frequency in MHz. */ > freq = (uint64_t)level->total_set.freq * 1000000; > atomic_store_rel_64(&tsc_freq, freq); I thought you would scale the low-level tsc_freq too, but this is not so easy (the shift would have to be put in rdtsc()). > - atomic_store_rel_64(&tsc_timecounter.tc_frequency, freq); > + tsc_timecounter.tc_frequency = > + freq >> (int)(intptr_t)tsc_timecounter.tc_priv; Perhaps the levels can also be increased significantly later. Then the timecounter frequency may exceed 4.29 GHz despite its scaling. > } > > static int > @@ -511,7 +535,8 @@ sysctl_machdep_tsc_freq(SYSCTL_HANDLER_A > error = sysctl_handle_64(oidp, &freq, 0, req); > if (error == 0 && req->newptr != NULL) { > atomic_store_rel_64(&tsc_freq, freq); > - atomic_store_rel_64(&tsc_timecounter.tc_frequency, freq); > + tsc_timecounter.tc_frequency = > + freq >> (int)(intptr_t)tsc_timecounter.tc_priv; Root can easily increase this frequency from below 4.29 GHz (with no scaling) to above (still with no scaling). And this change might even be correct if it is from 4.29 GHz - epsilon to 4.29 GHz plus epsilon. Micro-adjustments that cross the 4.29 GHz threshold are precisely the ones that need the atomic op. > } > return (error); > } > @@ -520,8 +545,15 @@ SYSCTL_PROC(_machdep, OID_AUTO, tsc_freq > 0, 0, sysctl_machdep_tsc_freq, "QU", "Time Stamp Counter frequency"); > > static u_int > -tsc_get_timecount(struct timecounter *tc) > +tsc_get_timecount(struct timecounter *tc __unused) > { > > return (rdtsc32()); > } > + > +static u_int > +tsc_get_timecount_lowres(struct timecounter *tc) > +{ > + > + return (rdtsc() >> (int)(intptr_t)tc->tc_priv); This forces a slow 64-bit shift (shrdl; shrl) in all cases. rdtsc32() with a scaled tc_counter_mask should work OK (essentially the same as the non-low timecounter except for reduced accuracy; the only loss is an decrease in the time until counter overflow to the same as for the non-low timecounter). > +} > Blindly changing the scale factor on every change to tsc_freq would give some nice races, but ones that are essentially the same as for changing the timecounter hardware. This gives me an idea for another implementation: have an array of TSC timecounters with 1 element for each small shift count, and switch using the timecounter hardware switch code, and let the user select the element. Bruce