Date: Fri, 13 May 2011 12:57:31 -0400 From: Jung-uk Kim <jkim@FreeBSD.org> To: Andriy Gapon <avg@freebsd.org> Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org Subject: Re: svn commit: r221703 - in head/sys: amd64/include i386/include x86/isa x86/x86 Message-ID: <201105131257.34009.jkim@FreeBSD.org> In-Reply-To: <4DCD2875.9040808@FreeBSD.org> References: <201105091734.p49HY0P3006180@svn.freebsd.org> <201105121239.31340.jkim@FreeBSD.org> <4DCD2875.9040808@FreeBSD.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Friday 13 May 2011 08:47 am, Andriy Gapon wrote: > on 12/05/2011 19:39 Jung-uk Kim said the following: > > On Thursday 12 May 2011 07:23 am, Andriy Gapon wrote: > >> on 09/05/2011 20:34 Jung-uk Kim said the following: > >>> Author: jkim > >>> Date: Mon May 9 17:34:00 2011 > >>> New Revision: 221703 > >>> URL: http://svn.freebsd.org/changeset/base/221703 > >> > >> [snip] > >> > >> I would to note [again] that I don't like code style of this > >> change. > >> > >>> Modified: head/sys/x86/x86/tsc.c > >>> =============================================================== > >>>== ============= --- head/sys/x86/x86/tsc.c Mon May 9 17:30:25 > >>> 2011 (r221702) +++ head/sys/x86/x86/tsc.c Mon May 9 17:34:00 > >>> 2011 (r221703) @@ -326,7 +326,73 @@ init_TSC(void) > >>> tsc_levels_changed, NULL, EVENTHANDLER_PRI_ANY); > >>> } > >>> > >>> -void > >>> +#ifdef SMP > >>> + > >>> +#define TSC_READ(x) \ > >>> +static void \ > >>> +tsc_read_##x(void *arg) \ > >>> +{ \ > >>> + uint32_t *tsc = arg; \ > >>> + u_int cpu = PCPU_GET(cpuid); \ > >>> + \ > >>> + tsc[cpu * 3 + x] = rdtsc32(); \ > >>> +} > >>> +TSC_READ(0) > >>> +TSC_READ(1) > >>> +TSC_READ(2) > >>> +#undef TSC_READ > >>> + > >>> +#define N 1000 > >> > >> I don't like macro overuse in the above. > >> Especially "N". > > > > I think duplicating trivial and almost identical functions is > > more harmful to many eyes. > > > > 'N' here is temporary and arbitrary. I think 1,000 is actually > > overkill but I wanted to "stress-test" smp_rendezvous(). :-P > > I guess I need to try to write this in "my way" and see if I would > be able to come up with anything more appealing :-) :-) > > Now that you found a bug in the function, I can remove N but I > > want to keep it for a while. Actually, it was already proven > > useful, i.e., I gave a user the following patch: > > > > -#define N 1000 > > +#define N 100 > > Well, I think this could be a static const variable or a tunable, > but not a macro. It will go away as soon as your patch for smp_rendezvous() is committed. > > and it worked around his problem for now (until you commit the > > smp_rendezvous() fix). > > I need more testers/reviewers for this. At least one user reported to me that your patch works fine. :-) > >>> +static void > >>> +comp_smp_tsc(void *arg) > >>> +{ > >>> + uint32_t *tsc; > >>> + int32_t d1, d2; > >>> + u_int cpu = PCPU_GET(cpuid); > >>> + u_int i, j, size; > >>> + > >>> + size = (mp_maxid + 1) * 3; > >>> + for (i = 0, tsc = arg; i < N; i++, tsc += size) > >>> + CPU_FOREACH(j) { > >>> + if (j == cpu) > >>> + continue; > >>> + d1 = tsc[cpu * 3 + 1] - tsc[j * 3]; > >>> + d2 = tsc[cpu * 3 + 2] - tsc[j * 3 + 1]; > >>> + if (d1 <= 0 || d2 <= 0) { > >>> + smp_tsc = 0; > >>> + return; > >>> + } > >>> + } > >>> +} > >>> + > >>> +static int > >>> +test_smp_tsc(void) > >>> +{ > >>> + uint32_t *data, *tsc; > >>> + u_int i, size; > >>> + > >>> + if (!smp_tsc && !tsc_is_invariant) > >>> + return (-100); > >>> + size = (mp_maxid + 1) * 3; > >>> + data = malloc(sizeof(*data) * size * N, M_TEMP, M_WAITOK); > >>> + for (i = 0, tsc = data; i < N; i++, tsc += size) > >>> + smp_rendezvous(tsc_read_0, tsc_read_1, tsc_read_2, tsc); > >> > >> I don't like that what is logically a two dimensional array 3 x > >> (mp_maxid + 1) is represented as a one-dimensional array with > >> all ensuing multiplications by three and other array index > >> manipulations. > > > > (As I said earlier) I agree with you in general but using > > two-dimensional array or array of pointers just added more > > complexity to the code and it didn't make the code any clearer > > than I originally thought. Again, it is just a matter of taste, > > anyway. ;-) > > I guess this is the same as above, until I'll try to actually do > something I won't be sure whether this could be done any better > (for my taste) :-) Please feel free (and let me know). > >>> + smp_tsc = 1; /* XXX */ > >>> + smp_rendezvous(smp_no_rendevous_barrier, comp_smp_tsc, > >>> + smp_no_rendevous_barrier, data); > >>> + free(data, M_TEMP); > >>> + if (bootverbose) > >>> + printf("SMP: %sed TSC synchronization test\n", > >>> + smp_tsc ? "pass" : "fail"); > >>> + return (smp_tsc ? 800 : -100); > >> > >> I still think something higher should be returned here for the > >> smp_tsc == true case. It doesn't make sense to go through the > >> shenanigans to underrate TSC in the end. > > > > http://docs.freebsd.org/cgi/mid.cgi?201105091352.49971.jkim > > > > I thought you wanted a separate commit, didn't you? > > I wanted HPET and ACPI quality to be changed in a separate commit, > but I wanted TSC quality (for good TSC) to be bumped in this > commit. Oh, I see. However, if we are going to adjust timecounter qualities, I think we should do all at the same time to reduce user confusions. > >> On a more general note, smp_rendezvous() might not be the best > >> mechanism here. In my opinion/understanding, smp_rendezvous() > >> provides only the following guarantees: > >> - if a setup action is specified, then every CPU executes the > >> setup action before any CPU executes the main action > >> - if a teardown action is specified, then every CPU executes the > >> main action before any CPU executes the teardown action > >> > >> There are no timing guarantees, only the sequence guarantees. > > > > That's all I wanted to test, i.e., we only validate whether all > > TSCs are in order (and that's all we can do, in fact). > > Well, not sure about that. > E.g. the OpenSolaris code and the code that I derived from it > compare difference between measured TSC value with how long it > takes for a memory write by one CPU to be noticed by other CPU. > Not without some heuristic and assumptions, of course. You guys keep saying some code derived from OpenSolaris exists to sync. TSCs. Can you let me see it, pretty please? I don't know what else to say. :-( > >> Any timing observations that we may have now are a product of > >> the implementation and can change if the implementation change. > >> So the newly introduced check may miss systemic differences in > >> TSC values between CPUs if they are small enough. But I am not > >> really sure if such a small differences would really matter. > >> > >> Worse case if there is some difference in TSC frequency between > >> CPUs (e.g. in different sockets), after a powerup or a reset > >> difference between TSC values may be very small, but could grow > >> more and more with uptime. Not sure if this is a realistic > >> enough scenario, though. > > > > Actually, I am kinda reluctant to enable smp_tsc by default on > > recent CPUs. Although they made all TSCs in sync, it is very > > very tricky to make it work in reality, e.g., > > > > https://patchwork.kernel.org/patch/691712/ > > I am not sure what is their concern there. > TSC is good to be used as timecounter. *Iff* they are all in sync. and atomically increasing... > If they use TSC for performance measurements, then of course they > have to use some barriers - this is well known and documented. If my understanding is correct, Linux has to make sure the new timer value read from a CPU must be written/read to/from memory in order and all other CPUs must be able see the updated value as their "vsyscall" and/or "vDSO" version of gettimeofday(2) and friends rely on it. Also, the last value read from a CPU is kept in memory and compared with a new value (possibly read from another CPU) to make sure it is incremental. I'd call it a "TSC-safe" timecounter. ;-) Some price to pay when you do timekeeping in user space to avoid syscalls... > BTW, newer CPUs provide RDTSCP instruction which could be more > convenient there. AFAIK, some people wanted to do that but Linus thought RDTSCP is too expensive as it is a serialized instruction. Jung-uk Kim
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201105131257.34009.jkim>