Date:      Mon, 27 Apr 2009 19:11:50 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Juergen Lock <nox@jelal.kn-bremen.de>
Cc:        kalinoj1@iem.pw.edu.pl, freebsd-emulation@FreeBSD.org
Subject:   Re: Recent qemu and timers issue
Message-ID:  <20090427182336.K64097@delplex.bde.org>
In-Reply-To: <20090426184021.GA9545@triton.kn-bremen.de>
References:  <200904032223.n33MNTiq019599@triton.kn-bremen.de> <200904072137.n37LbbdC071227@triton.kn-bremen.de> <20090423214701.GA83621@triton.kn-bremen.de> <20090424201623.N887@besplex.bde.org> <20090426184021.GA9545@triton.kn-bremen.de>

On Sun, 26 Apr 2009, Juergen Lock wrote:

> On Fri, Apr 24, 2009 at 10:20:33PM +1000, Bruce Evans wrote:
>> On Thu, 23 Apr 2009, Juergen Lock wrote:
>>
>>> On Tue, Apr 07, 2009 at 11:37:37PM +0200, Juergen Lock wrote:
>>>> In article <200904062254.37824.kalinoj1@iem.pw.edu.pl> you write:
>>>>> On Saturday, 04 April 2009 at 00:23:29, Juergen Lock wrote:
>>>>>> In article <c948bb4de85d1b2a340ac63a7c46f6d9@iem.pw.edu.pl> you write:
>>>>> ...
>>>>>>> I tried to use all possible timers using sysctl, where I have:
>>>>>>> TSC(800) HPET(900) ACPI-safe(850) i8254(0) dummy(-1000000)
>>>>>>> None of these helped.
>>
>> None of these are normally used for calculating runtimes.  Normally
>> on i386, the TSC is used.
>
> Aaah-haa, this I didn't know.
>
>>  The only way to configure this is to edit
>> the source code.  Try removing the calls to set_cputicker() in the MD
>> code.  Then the MI and timecounter-based cputicker tc_cpu_ticks() will
>> be used.
>
> Yup, that seemed to help indeed. (patch below.)
>
>>  A better implementation would use a user-selectable
>> timecounter-based cputicker in all cases, but usually not the system
>> timecounter since that is likely to be very slow so as to be more
>> accurate.
>>
> This was using qemu's emulated hpet...  I guess you mean slow to read
> the counter value?  How often is the cputicker read, at every context
> switch?  More often?

Yes, ACPI timecounter hardware typically takes 1000 nsec to read, while
TSC hardware typically takes 5 nsec to read (12 cycles on AthlonXP and
Athlon64; more on P3-4, Core2 and Phenom).  I don't know how long it
takes to read a typical HPET.  Emulated timecounter hardware is likely to
be even slower.  Timecounter software typically adds only another 20
(50?) nsec.  The cputicker is read mainly at every context switch.
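
To put those read costs in perspective, here is a rough arithmetic sketch,
assuming one cputicker read per context switch.  The function name and the
context-switch rate used in the note below are illustrative assumptions,
not measurements.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Nanoseconds spent reading the cputicker during one second of wall
 * time, assuming one read per context switch.  Hypothetical helper,
 * not a kernel interface.
 */
static uint64_t
ticker_overhead_ns_per_sec(uint64_t read_cost_ns, uint64_t switches_per_sec)
{
	/* Guard against overflow in the multiplication below. */
	assert(switches_per_sec == 0 ||
	    read_cost_ns <= UINT64_MAX / switches_per_sec);
	return (read_cost_ns * switches_per_sec);
}
```

At an assumed 10000 context switches per second, a 1000 nsec read costs
10000000 nsec per second (1% of one CPU), while a 5 nsec read costs
50000 nsec per second (0.005%).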

>> [...some fixes]
>>
> ...and I tried this, neither change fixed the problem.
>
>> Another thing you can try here is to edit the source code to change
>> the set_cputicker() calls to say that the frequency is not variable.
>
> That probably won't help here because I noticed at least the initial
> tsc `calibration' in the guest (in init_TSC()) is way off too (here it
> measured less than half of the actual frequency; according to dmesg
> the TSC on this host is `TSC: P-state invariant'.)

The initial calibration code is even sloppier than the recalibration,
and is more likely not to work under emulation.  It depends on the
i8254 timer being accurate and doesn't try to sandwich reads of the
TSC between close-together reads of the reference timer or otherwise
try to limit errors in reading the reference timer.  With real hardware
this normally causes an avoidable error of at most 5 ppm (from waiting
5 i8254 cycles extra), but with emulated hardware it probably causes
a larger error even if the emulation is perfect.  The recalibration
does better by using a higher quality reference timer sampled over an
interval 16 times as long.
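
The sandwiching idea can be sketched as follows.  This is a minimal
user-space model under stated assumptions, not the kernel's calibration
code: the struct and function names and the fake clock (with made-up
read costs) are all hypothetical.  Each TSC read is bracketed by two
reads of the reference timer, and a sample is accepted only when the
two reference reads are close together.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t (*read_fn)(void *);

struct sample {
	uint64_t ref;	/* reference timer value (midpoint of the sandwich) */
	uint64_t tsc;	/* TSC value read between the two reference reads */
	uint64_t width;	/* reference ticks between the two reference reads */
};

/*
 * Read the TSC between two reads of the reference timer.  Retry up to
 * max_tries times and keep the narrowest sandwich; stop early once the
 * sandwich is no wider than max_width reference ticks.
 */
static struct sample
sandwich_read(read_fn read_ref, read_fn read_tsc, void *arg,
    uint64_t max_width, int max_tries)
{
	struct sample best = { 0, 0, UINT64_MAX };

	for (int i = 0; i < max_tries; i++) {
		uint64_t r0 = read_ref(arg);
		uint64_t t = read_tsc(arg);
		uint64_t r1 = read_ref(arg);
		uint64_t w = r1 - r0;

		if (w < best.width) {
			best.ref = r0 + w / 2;
			best.tsc = t;
			best.width = w;
		}
		if (w <= max_width)
			break;
	}
	return (best);
}

/*
 * TSC frequency from two samples.  The product fits in 64 bits for
 * intervals of a few seconds at GHz rates with a MHz reference.
 */
static uint64_t
estimate_freq(struct sample a, struct sample b, uint64_t ref_hz)
{
	assert(b.ref > a.ref);
	return ((b.tsc - a.tsc) * ref_hz / (b.ref - a.ref));
}

/* Fake hardware for experimenting: every read advances simulated time. */
struct fake_clock {
	uint64_t now_ns, ref_hz, tsc_hz, ref_cost_ns, tsc_cost_ns;
};

static uint64_t
fake_ref(void *arg)
{
	struct fake_clock *c = arg;

	c->now_ns += c->ref_cost_ns;
	return (c->now_ns * c->ref_hz / 1000000000ULL);
}

static uint64_t
fake_tsc(void *arg)
{
	struct fake_clock *c = arg;

	c->now_ns += c->tsc_cost_ns;
	return (c->now_ns * c->tsc_hz / 1000000000ULL);
}
```

Taking one sample, letting about a second pass, taking a second sample
and passing both to estimate_freq() gives the frequency estimate.  The
width field bounds how far the reference reading can be from the moment
the TSC was actually read.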

This should be fixable using the machdep.tsc_freq sysctl.  However,
this sysctl neglects to call set_cputicker().  This should make
little difference when the frequency is nominated as variable since
recalibration should change it soon anyway.  However, the bug in
recalibration prevents downwards adjustments.
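
A simplified model of why that matters, under stated assumptions: the
struct and function names here are hypothetical and this is not the
kernel source.  If recalibration only ever raises the stored frequency,
an estimate that starts too high is never corrected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ticker {
	uint64_t freq_hz;	/* current frequency estimate */
	int variable;		/* nonzero if recalibration is allowed */
};

/* Models the behaviour described above: only upward adjustments. */
static void
recalibrate_upward_only(struct ticker *t, uint64_t measured_hz)
{
	assert(t != NULL);
	if (t->variable && measured_hz > t->freq_hz)
		t->freq_hz = measured_hz;
}

/* Hypothetical alternative: accept adjustments in both directions. */
static void
recalibrate_both_ways(struct ticker *t, uint64_t measured_hz)
{
	assert(t != NULL);
	if (t->variable && measured_hz != 0)
		t->freq_hz = measured_hz;
}
```

Starting from an estimate of 3 GHz with a true rate of 2 GHz, the first
function leaves the estimate at 3 GHz no matter how many times it is
called, while the second converges on the first call.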

> OK _maybe_ if we get the proper frequency into the guest there somehow
> from the beginning and then say it's not variable maybe it could work,
> but that still leaves the case of hosts with non P-state invariant tsc
> because...
>
>> I used this temporarily to work around the non-decreasing calibration.
>> This should be the default for emulators for most cputickers -- emulators
>> should emulate a constant frequency and not emulate the complexities
>> for power saving.
>
> Hmm I guess that's more easily said than done. :)  At least qemu
> basically just passes the host tsc thru when a guest reads it.

But it claims P-state invariance?  Maybe it gets that from the host.
Does it trap TSC reads?  This would be slow, but required to emulate
P-state invariance and might be required for accurate timing anyway.
I think emulators shouldn't trap reads of the TSC because the TSC
is unreliable for accurate timing anyway, but they should do something
to keep slower-to-access accurate hardware timers virtually accurate.
Hopefully the hardware people will eventually make a timer like the
TSC both accurate and fast.  Emulators will have a difficult time
preserving both.
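
For illustration only, a constant-frequency virtual TSC could be derived
from elapsed host time as in the sketch below.  This assumes the
emulator traps TSC reads and has a monotonic host clock in nanoseconds;
the function name is hypothetical and this is not how qemu does it.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL

/*
 * Value a guest would see from a TSC ticking at a fixed nominal_hz,
 * given host time elapsed since the guest started.  Seconds and the
 * sub-second remainder are scaled separately to avoid overflow.
 */
static uint64_t
virtual_tsc(uint64_t elapsed_ns, uint64_t nominal_hz)
{
	uint64_t sec = elapsed_ns / NS_PER_SEC;
	uint64_t rem = elapsed_ns % NS_PER_SEC;

	/* rem < NS_PER_SEC, so this bound keeps rem * nominal_hz in range. */
	assert(nominal_hz <= UINT64_MAX / NS_PER_SEC);
	return (sec * nominal_hz + rem * nominal_hz / NS_PER_SEC);
}
```

The guest then sees the same tick rate whatever the host's P-state, at
the cost of a trap plus a host clock read on every guest TSC read.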

Bruce


