FreeBSD Mail Archives

Date:      Fri, 13 May 2011 12:57:31 -0400
From:      Jung-uk Kim <jkim@FreeBSD.org>
To:        Andriy Gapon <avg@freebsd.org>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject:   Re: svn commit: r221703 - in head/sys: amd64/include i386/include x86/isa x86/x86
Message-ID:  <201105131257.34009.jkim@FreeBSD.org>
In-Reply-To: <4DCD2875.9040808@FreeBSD.org>
References:  <201105091734.p49HY0P3006180@svn.freebsd.org> <201105121239.31340.jkim@FreeBSD.org> <4DCD2875.9040808@FreeBSD.org>

On Friday 13 May 2011 08:47 am, Andriy Gapon wrote:
> on 12/05/2011 19:39 Jung-uk Kim said the following:
> > On Thursday 12 May 2011 07:23 am, Andriy Gapon wrote:
> >> on 09/05/2011 20:34 Jung-uk Kim said the following:
> >>> Author: jkim
> >>> Date: Mon May  9 17:34:00 2011
> >>> New Revision: 221703
> >>> URL: http://svn.freebsd.org/changeset/base/221703
> >>
> >> [snip]
> >>
> >> I would to note [again] that I don't like code style of this
> >> change.
> >>
> >>> Modified: head/sys/x86/x86/tsc.c
> >>> ===============================================================
> >>>== ============= --- head/sys/x86/x86/tsc.c	Mon May  9 17:30:25
> >>> 2011	(r221702) +++ head/sys/x86/x86/tsc.c	Mon May  9 17:34:00
> >>> 2011	(r221703) @@ -326,7 +326,73 @@ init_TSC(void)
> >>>  	    tsc_levels_changed, NULL, EVENTHANDLER_PRI_ANY);
> >>>  }
> >>>
> >>> -void
> >>> +#ifdef SMP
> >>> +
> >>> +#define	TSC_READ(x)			\
> >>> +static void				\
> >>> +tsc_read_##x(void *arg)			\
> >>> +{					\
> >>> +	uint32_t *tsc = arg;		\
> >>> +	u_int cpu = PCPU_GET(cpuid);	\
> >>> +					\
> >>> +	tsc[cpu * 3 + x] = rdtsc32();	\
> >>> +}
> >>> +TSC_READ(0)
> >>> +TSC_READ(1)
> >>> +TSC_READ(2)
> >>> +#undef TSC_READ
> >>> +
> >>> +#define	N	1000
> >>
> >> I don't like macro overuse in the above.
> >> Especially "N".
> >
> > I think duplicating trivial and almost identical functions is
> > more harmful to many eyes.
> >
> > 'N' here is temporary and arbitrary.  I think 1,000 is actually
> > overkill but I wanted to "stress-test" smp_rendezvous(). :-P
>
> I guess I need to try to write this in "my way" and see if I would
> be able to come up with anything more appealing :-)

:-)

> > Now that you found a bug in the function, I can remove N but I
> > want to keep it for a while.  Actually, it was already proven
> > useful, i.e., I gave a user the following patch:
> >
> > -#define	N	1000
> > +#define	N	100
>
> Well, I think this could be a static const variable or a tunable,
> but not a macro.

It will go away as soon as your patch for smp_rendezvous() is 
committed.

> > and it worked around his problem for now (until you commit the
> > smp_rendezvous() fix).
>
> I need more testers/reviewers for this.

At least one user reported to me that your patch works fine. :-)

> >>> +static void
> >>> +comp_smp_tsc(void *arg)
> >>> +{
> >>> +	uint32_t *tsc;
> >>> +	int32_t d1, d2;
> >>> +	u_int cpu = PCPU_GET(cpuid);
> >>> +	u_int i, j, size;
> >>> +
> >>> +	size = (mp_maxid + 1) * 3;
> >>> +	for (i = 0, tsc = arg; i < N; i++, tsc += size)
> >>> +		CPU_FOREACH(j) {
> >>> +			if (j == cpu)
> >>> +				continue;
> >>> +			d1 = tsc[cpu * 3 + 1] - tsc[j * 3];
> >>> +			d2 = tsc[cpu * 3 + 2] - tsc[j * 3 + 1];
> >>> +			if (d1 <= 0 || d2 <= 0) {
> >>> +				smp_tsc = 0;
> >>> +				return;
> >>> +			}
> >>> +		}
> >>> +}
> >>> +
> >>> +static int
> >>> +test_smp_tsc(void)
> >>> +{
> >>> +	uint32_t *data, *tsc;
> >>> +	u_int i, size;
> >>> +
> >>> +	if (!smp_tsc && !tsc_is_invariant)
> >>> +		return (-100);
> >>> +	size = (mp_maxid + 1) * 3;
> >>> +	data = malloc(sizeof(*data) * size * N, M_TEMP, M_WAITOK);
> >>> +	for (i = 0, tsc = data; i < N; i++, tsc += size)
> >>> +		smp_rendezvous(tsc_read_0, tsc_read_1, tsc_read_2, tsc);
> >>
> >> I don't like that what is logically a two dimensional array 3 x
> >> (mp_maxid + 1) is represented as a one-dimensional array with
> >> all ensuing multiplications by three and other array index
> >> manipulations.
> >
> > (As I said earlier) I agree with you in general but using
> > two-dimensional array or array of pointers just added more
> > complexity to the code and it didn't make the code any clearer
> > than I originally thought.  Again, it is just a matter of taste,
> > anyway. ;-)
>
> I guess this is the same as above, until I'll try to actually do
> something I won't be sure whether this could be done any better
> (for my taste) :-)

Please feel free (and let me know).

> >>> +	smp_tsc = 1;	/* XXX */
> >>> +	smp_rendezvous(smp_no_rendevous_barrier, comp_smp_tsc,
> >>> +	    smp_no_rendevous_barrier, data);
> >>> +	free(data, M_TEMP);
> >>> +	if (bootverbose)
> >>> +		printf("SMP: %sed TSC synchronization test\n",
> >>> +		    smp_tsc ? "pass" : "fail");
> >>> +	return (smp_tsc ? 800 : -100);
> >>
> >> I still think something higher should be returned here for the
> >> smp_tsc == true case.  It doesn't make sense to go through the
> >> shenanigans to underrate TSC in the end.
> >
> > http://docs.freebsd.org/cgi/mid.cgi?201105091352.49971.jkim
> >
> > I thought you wanted a separate commit, didn't you?
>
> I wanted HPET and ACPI quality to be changed in a separate commit,
> but I wanted TSC quality (for good TSC) to be bumped in this
> commit.

Oh, I see.  However, if we are going to adjust timecounter qualities, 
I think we should do all at the same time to reduce user confusions.

> >> On a more general note, smp_rendezvous() might not be the best
> >> mechanism here. In my opinion/understanding, smp_rendezvous()
> >> provides only the following guarantees:
> >> - if a setup action is specified, then every CPU executes the
> >> setup action before any CPU executes the main action
> >> - if a teardown action is specified, then every CPU executes the
> >> main action before any CPU executes the teardown action
> >>
> >> There are no timing guarantees, only the sequence guarantees.
> >
> > That's all I wanted to test, i.e., we only validate whether all
> > TSCs are in order (and that's all we can do, in fact).
>
> Well, not sure about that.
> E.g. the OpenSolaris code and the code that I derived from it
> compare difference between measured TSC value with how long it
> takes for a memory write by one CPU to be noticed by other CPU. 
> Not without some heuristic and assumptions, of course.

You guys keep saying some code derived from OpenSolaris exists to 
sync. TSCs.  Can you let me see it, pretty please?  I don't know what 
else to say. :-(

> >> Any timing observations that we may have now are a product of
> >> the implementation and can change if the implementation change. 
> >> So the newly introduced check may miss systemic differences in
> >> TSC values between CPUs if they are small enough. But I am not
> >> really sure if such a small differences would really matter.
> >>
> >> Worse case if there is some difference in TSC frequency between
> >> CPUs (e.g. in different sockets), after a powerup or a reset
> >> difference between TSC values may be very small, but could grow
> >> more and more with uptime.  Not sure if this is a realistic
> >> enough scenario, though.
> >
> > Actually, I am kinda reluctant to enable smp_tsc by default on
> > recent CPUs.  Although they made all TSCs in sync, it is very
> > very tricky to make it work in reality, e.g.,
> >
> > https://patchwork.kernel.org/patch/691712/
>
> I am not sure what is their concern there.
> TSC is good to be used as timecounter.

*Iff* they are all in sync. and atomically increasing...

> If they use TSC for performance measurements, then of course they
> have to use some barriers - this is well known and documented.

If my understanding is correct, Linux has to make sure the new timer 
value read from a CPU must be written/read to/from memory in order 
and all other CPUs must be able see the updated value as their 
"vsyscall" and/or "vDSO" version of gettimeofday(2) and friends rely 
on it.  Also, the last value read from a CPU is kept in memory and 
compared with a new value (possibly read from another CPU) to make 
sure it is incremental.  I'd call it a "TSC-safe" timecounter. ;-)
Some price to pay when you do timekeeping in user space to avoid 
syscalls...

> BTW, newer CPUs provide RDTSCP instruction which could be more
> convenient there.

AFAIK, some people wanted to do that but Linus thought RDTSCP is too 
expensive as it is a serialized instruction.

Jung-uk Kim

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201105131257.34009.jkim>

Header And Logo

Peripheral Links

Site Navigation

Header And Logo

Peripheral Links

Search

Site Navigation