Date:      Wed, 6 Jun 2007 14:27:34 -0700 (PDT)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Dmitry Morozovsky <marck@rinet.ru>
Cc:        freebsd-stable@freebsd.org, Ivan Voras <ivoras@fer.hr>
Subject:   Re: calcru: runtime went backwards, RELENG_6, SMP
Message-ID:  <200706062127.l56LRYTe090137@apollo.backplane.com>
References:  <20070606153542.Y76617@woozle.rinet.ru> <f46tmc$rgb$2@sea.gmane.org> <20070606231940.T91939@woozle.rinet.ru>

:IV> > Upd: on GENERIC/amd64 kernel I got the same errors.
:IV> 
:IV> Do you perhaps run with TSC timecounter? (that's the only cause I've noticed
:IV> that can generate this message).
:
:Nope:
:
:marck@ct-new:~> sysctl kern.timecounter
:kern.timecounter.tick: 1
:kern.timecounter.choice: TSC(-100) ACPI-fast(1000) i8254(0) dummy(-1000000)
:kern.timecounter.hardware: ACPI-fast
:...

    kgdb your live kernel and 'print cpu_ticks'.  See what the cpu ticker
    is actually pointing at, because it might not be the time counter.
    It could still be TSC.

    The TSC isn't synchronized between the cores on an SMP box, not even
    on multi-core parts.  It can't be used to calculate delta times
    for any thread that has the possibility of migrating between cpus.
    Not only will the absolute offset be off between cpus, but the frequency
    will also be slightly different (at least on SMP multi-core parts),
    so you get frequency drift too.
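
    To make the offset and drift problem concrete, here is a minimal
    sketch in C.  It is a toy model, not kernel code: struct core_tsc,
    tsc_read() and naive_delta() are hypothetical names invented for the
    illustration, and each counter is modeled as a simple linear function
    of wall time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one core's TSC (not a real kernel structure). */
struct core_tsc {
    uint64_t offset;        /* counter value at wall time 0 */
    uint64_t ticks_per_us;  /* this core's counter frequency */
};

/* Counter value this core would report at wall time `us` microseconds. */
static uint64_t
tsc_read(const struct core_tsc *c, uint64_t us)
{
    assert(c->ticks_per_us > 0);
    return c->offset + us * c->ticks_per_us;
}

/*
 * Delta a naive accounting routine would compute for a thread that was
 * switched in on core `from` at t0 and switched out on core `to` at t1.
 * Values in this model are small, so the signed casts cannot overflow.
 */
static int64_t
naive_delta(const struct core_tsc *from, uint64_t t0_us,
            const struct core_tsc *to, uint64_t t1_us)
{
    return (int64_t)tsc_read(to, t1_us) - (int64_t)tsc_read(from, t0_us);
}
```

    With two cores at 2000 ticks/us whose counters start 5000000 ticks
    apart, a thread that runs from t=100us on the core that is ahead to
    t=200us on the other gets a delta of -4800000 ticks instead of the
    200000 it gets when it stays put.  Giving the second core 2001
    ticks/us instead makes the error grow with uptime even when the
    offsets start out equal.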

    There is also possibly an issue with tc_cpu_ticks(), which seems to
    be using a static 64 bit variable to handle rollover instead of
    a per-cpu variable.  I don't see how that could possibly be MP safe,
    especially if the timecount is not synchronized between cpus and
    causes multiple rollover events.
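
    A rough illustration of why a single shared rollover state looks
    unsafe, assuming the rollover is detected by comparing each raw
    reading against the previous one.  This is a simplified model and not
    the FreeBSD source: struct extender, extend() and the 16 bit counter
    width are made up for the example.  It also only models interleaved
    calls, not a true concurrent race on the variables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COUNTER_BITS 16                       /* assumed width, for the demo */
#define COUNTER_MASK ((1u << COUNTER_BITS) - 1)

/* State used to extend a narrow, wrapping counter to 64 bits. */
struct extender {
    uint64_t base;  /* accumulated whole wraps */
    uint32_t last;  /* previous raw reading */
};

/* Return the 64 bit extended value for a new raw reading. */
static uint64_t
extend(struct extender *e, uint32_t raw)
{
    assert(e != NULL);
    raw &= COUNTER_MASK;
    if (raw < e->last)          /* reading went down: assume one wrap */
        e->base += (uint64_t)COUNTER_MASK + 1;
    e->last = raw;
    return e->base + raw;
}
```

    Feeding one shared struct extender the readings 1000, 990, 1010, 1000
    (two cpus whose counters are 10 ticks apart, read alternately) yields
    1000, 66526, 66546, 132072: two spurious wraps within 20 ticks.  With
    one struct extender per cpu the same readings come back unchanged.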

    In fact, I can *barely* use the TSC on DragonFly for KTR logging, and
    even then I have to have some kernel threads sitting there doing nothing
    but figuring out the drift between the cpus so it can correct the
    TSC values when it logs information... and even with all of that I
    can't get them synchronized any closer than around 500ns from each
    other.

    I'd recommend that FreeBSD do what we did years ago with calcru ... stop
    trying to calculate the time down to the nanosecond and just do it
    statistically.  It works just fine and takes the whole mess out of
    the critical path.
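
    One way to read "do it statistically" is plain sample counting from a
    periodic clock interrupt.  The sketch below assumes that approach; it
    is not DragonFly's actual implementation, and struct thread_stats,
    stat_tick(), estimated_usec() and the stathz parameter are
    hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

enum run_mode { MODE_USER, MODE_SYS, MODE_INTR, MODE_COUNT };

/* Hypothetical per-thread sample counters (not a real kernel layout). */
struct thread_stats {
    uint64_t samples[MODE_COUNT];
};

/*
 * Called from a periodic statistics interrupt on the cpu currently
 * running the thread.  It only bumps a counter: no timestamp is read
 * and no cross-cpu state is consulted.
 */
static void
stat_tick(struct thread_stats *td, enum run_mode mode)
{
    td->samples[mode]++;
}

/* Estimated microseconds spent in `mode`, given stathz samples per second. */
static uint64_t
estimated_usec(const struct thread_stats *td, enum run_mode mode,
               uint32_t stathz)
{
    assert(stathz > 0);
    return td->samples[mode] * 1000000u / stathz;
}
```

    Since the counters only ever increase, the estimate cannot go
    backwards regardless of which cpu the thread was sampled on.  At
    stathz = 128, 128 user samples work out to 1000000us and 64 system
    samples to 500000us.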

						-Matt



