Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 25 Mar 2016 10:49:03 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        John Baldwin <jhb@freebsd.org>, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org, "'rstone@freebsd.org'" <rstone@freebsd.org>
Subject:   Re: svn commit: r297039 - head/sys/x86/x86
Message-ID:  <20160325084902.GH1741@kib.kiev.ua>
In-Reply-To: <20160325060901.N2059@besplex.bde.org>
References:  <201603181948.u2IJmndg063765@repo.freebsd.org> <1866602.Bp7VFd5f42@ralph.baldwin.cx> <20160323075842.GX1741@kib.kiev.ua> <2922763.uITxoCVqGR@ralph.baldwin.cx> <20160324090917.GC1741@kib.kiev.ua> <20160325010649.H898@besplex.bde.org> <20160324162447.GD1741@kib.kiev.ua> <20160325060901.N2059@besplex.bde.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, Mar 25, 2016 at 07:13:54AM +1100, Bruce Evans wrote:
> On Thu, 24 Mar 2016, Konstantin Belousov wrote:
[Skipped lock adaptive spinning text for now].

> 
> >> My systems allow speed variations of about 4000:800 = 5:1 for one CPU and
> >> about 50:1 for different CPUs.  So the old method gave a variation of up
> >> to 50:1.  This can be reduced to only 5:1 using the boot-time calibration.
> > What do you mean by 'for different CPUs' ?  I understand that modern ESS
> > can give us CPU frequency between 800-4200MHz, which is what you mean
> > by 'for one CPU'.  We definitely do not care if 5usec timeout becomes
> > 25usecs, since we practically never time-out there at all.
> 
> Yes, I actually get 4400:800 on i4790K.
> 
> The ratio is even larger than that with a hard-coded limit because old
> CPUs are much slower than i4790K.  I sometimes run a 367 MHz (P2 class)
> CPU.  It is several times slower than a new CPU at the same clock
> frequency, and any throttling would make it even slower.
> 
> 50 times slower means that a reasonable emergency timeout of 60 seconds
> becomes 3000 seconds.  Local users would get tired of waiting and reset,
> and remote users might have to wait.
But you do not downclock a machine booted at the 4.0Ghz datasheet clock,
down to 367Mhz. For 400Mhz P2 machine, LAPIC would be calibrated at that
400Mhz rate.

> There is another thread about early DELAY() using the i8254 not working
> to calibrate the TSC.  That might be just because DELAY() is interrupted.
> DELAY() never bothered to disable interrupts.  Its early use for calibrating
> the TSC depends on interrupts mostly not happening then.  (My version is
> a bit more careful, but it still doesn't disable interrupts.  It
> establishes error bounds provided interrupts are shorter than the i8254
> wrap period.)  If the i8254 is virtual, then even disabling interrupts
> on the target wouldn't help, since the disabling would only be virtual.

Yes, the DELAY() calibration is something I wanted to ask about.
Could you, please, take a look at
https://reviews.freebsd.org/D5738
there is a code which would benefit from better (re-)calibration.

Below is the patch to implement calibration of the ipi_wait() busy loop.
On my sandybridge 3.4Ghz, I get the message
LAPIC: ipi_wait() us multiplier 37 (r 128652089678 tsc 3392383992)

diff --git a/sys/x86/x86/local_apic.c b/sys/x86/x86/local_apic.c
index 7e5087b..0842de5 100644
--- a/sys/x86/x86/local_apic.c
+++ b/sys/x86/x86/local_apic.c
@@ -56,6 +56,7 @@ __FBSDID("$FreeBSD$");
 #include <vm/pmap.h>
 
 #include <x86/apicreg.h>
+#include <machine/clock.h>
 #include <machine/cpufunc.h>
 #include <machine/cputypes.h>
 #include <machine/frame.h>
@@ -162,6 +163,7 @@ int x2apic_mode;
 int lapic_eoi_suppression;
 static u_long lapic_timer_divisor;
 static struct eventtimer lapic_et;
+static uint64_t lapic_ipi_wait_mult;
 
 SYSCTL_NODE(_hw, OID_AUTO, apic, CTLFLAG_RD, 0, "APIC options");
 SYSCTL_INT(_hw_apic, OID_AUTO, x2apic_mode, CTLFLAG_RD, &x2apic_mode, 0, "");
@@ -391,6 +393,7 @@ lvt_mode(struct lapic *la, u_int pin, uint32_t value)
 static void
 native_lapic_init(vm_paddr_t addr)
 {
+	uint64_t r;
 	uint32_t ver;
 	u_int regs[4];
 	int i, arat;
@@ -484,6 +487,34 @@ native_lapic_init(vm_paddr_t addr)
 		TUNABLE_INT_FETCH("hw.lapic_eoi_suppression",
 		    &lapic_eoi_suppression);
 	}
+
+#define LOOPS 1000000
+	/*
+	 * Calibrate the busy loop waiting for IPI ack in xAPIC mode.
+	 * lapic_ipi_wait_mult contains the number of iterations which
+	 * approximately delay execution for 1 microsecond (the
+	 * argument to native_lapic_ipi_wait() is in microseconds).
+	 *
+	 * We assume that TSC is present and already measured.
+	 * Possible TSC frequency jumps are irrelevant to the
+	 * calibration loop below, the CPU clock management code is
+	 * not yet started, and we do not enter sleep states.
+	 */
+	KASSERT((cpu_feature & CPUID_TSC) != 0 && tsc_freq != 0,
+	    ("TSC not initialized"));
+	r = rdtsc();
+	for (r = 0; r < LOOPS; r++) {
+		(void)lapic_read_icr_lo();
+		ia32_pause();
+	}
+	r = rdtsc() - r;
+	lapic_ipi_wait_mult = (r * 1000000) / tsc_freq / LOOPS;
+	if (bootverbose) {
+		printf("LAPIC: ipi_wait() us multiplier %jd (r %jd tsc %jd)\n",
+		    (uintmax_t)lapic_ipi_wait_mult, (uintmax_t)r,
+		    (uintmax_t)tsc_freq);
+	}
+#undef LOOPS
 }
 
 /*
@@ -1621,31 +1652,26 @@ SYSINIT(apic_setup_io, SI_SUB_INTR, SI_ORDER_THIRD, apic_setup_io, NULL);
  * private to the MD code.  The public interface for the rest of the
  * kernel is defined in mp_machdep.c.
  */
+
+/*
+ * Wait delay microseconds for IPI to be sent.  If delay is -1, we
+ * wait forever.
+ */
 static int
 native_lapic_ipi_wait(int delay)
 {
-	int x;
+	uint64_t i, counter;
 
 	/* LAPIC_ICR.APIC_DELSTAT_MASK is undefined in x2APIC mode */
-	if (x2apic_mode)
+	if (x2apic_mode || delay == -1)
 		return (1);
 
-	/*
-	 * Wait delay microseconds for IPI to be sent.  If delay is
-	 * -1, we wait forever.
-	 */
-	if (delay == -1) {
-		while ((lapic_read_icr_lo() & APIC_DELSTAT_MASK) !=
-		    APIC_DELSTAT_IDLE)
-			ia32_pause();
-		return (1);
-	}
-
-	for (x = 0; x < delay; x++) {
+	counter = lapic_ipi_wait_mult * delay;
+	for (i = 0; i < counter; i++) {
 		if ((lapic_read_icr_lo() & APIC_DELSTAT_MASK) ==
 		    APIC_DELSTAT_IDLE)
 			return (1);
-		DELAY(1);
+		ia32_pause();
 	}
 	return (0);
 }



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20160325084902.GH1741>