From: Bruce Evans
Date: Mon, 27 Apr 2009 19:11:50 +1000 (EST)
To: Juergen Lock
Cc: kalinoj1@iem.pw.edu.pl, freebsd-emulation@FreeBSD.org
Subject: Re: Recent qemu and timers issue
Message-ID: <20090427182336.K64097@delplex.bde.org>
In-Reply-To: <20090426184021.GA9545@triton.kn-bremen.de>

On Sun, 26 Apr 2009, Juergen Lock wrote:

> On Fri, Apr 24, 2009 at 10:20:33PM +1000, Bruce Evans wrote:
>> On Thu, 23 Apr 2009, Juergen Lock wrote:
>>
>>> On Tue, Apr 07, 2009 at 11:37:37PM +0200, Juergen Lock wrote:
>>>> In article <200904062254.37824.kalinoj1@iem.pw.edu.pl> you write:
>>>>> On Saturday, 04 April 2009 at 00:23:29, Juergen Lock wrote:
>>>>>> In article you write:
>>>>> ...
>>>>>>> I tried to use all possible timers using sysctl, where I have:
>>>>>>> TSC(800) HPET(900) ACPI-safe(850) i8254(0) dummy(-1000000)
>>>>>>> None of these helped.
>>
>> None of these are normally used for calculating runtimes.  Normally
>> on i386, the TSC is used.
>
> Aaah-haa, this I didn't know.
>
>> The only way to configure this is to edit
>> the source code.  Try removing the calls to set_cputicker() in the MD
>> code.  Then the MI and timecounter-based cputicker tc_cpu_ticks() will
>> be used.
>
> Yup, that seemed to help indeed. (patch below.)
>
>> A better implementation would use a user-selectable
>> timecounter-based cputicker in all cases, but usually not the system
>> timecounter since that is likely to be very slow so as to be more
>> accurate.
>>
> This was using qemu's emulated hpet... I guess you mean slow to read
> the counter value?  How often is the cputicker read, at every context
> switch?  More often?

Yes, ACPI timecounter hardware typically takes 1000 nsec to read, while
TSC hardware typically takes 5 nsec to read (12 cycles on AthlonXP and
Athlon64; more on P3-4, Core2 and Phenom).  I don't know how long it
takes to read a typical HPET.  Emulated timecounter hardware is likely
to be even slower.  Timecounter software typically adds only another
20 (50?) nsec.

The cputicker is read mainly at every context switch.

>> [...some fixes]
>>
> ...and I tried this, both changes didn't fix the problem.
>
>> Another thing you can try here is to edit the source code to change
>> the set_cputicker() calls to say that the frequency is not variable.
>
> That probably won't help here because I noticed at least the initial
> tsc `calibration' in the guest (in init_TSC()) is way off too (it got
> less than half of the actual frequency, which according to dmesg on
> this host is `TSC: P-state invariant'.)

The initial calibration code is even sloppier than the recalibration,
and is more likely not to work under emulation.  It depends on the i8254
timer being accurate and doesn't try to sandwich reads of the TSC
between close-together reads of the reference timer or otherwise try to
limit errors in reading the reference timer.  With real hardware this
normally causes an avoidable error of at most 5 ppm (from waiting 5
i8254 cycles extra), but with emulated hardware it probably causes a
larger error even if the emulation is perfect.  The recalibration does
better by using a higher quality reference timer sampled over an
interval 16 times as long.

This should be fixable using the machdep.tsc_freq sysctl.  However,
this sysctl neglects to call set_cputicker().  This should make little
difference when the frequency is nominated as variable, since
recalibration should change it soon anyway.  However, the bug in
recalibration prevents downwards adjustments.

> OK _maybe_ if we get the proper frequency into the guest there somehow
> from the beginning and then say it's not variable maybe it could work,
> but that still leaves the case of hosts with non P-state invariant tsc
> because...
>
>> I used this temporarily to work around the non-decreasing calibration.
>> This should be the default for emulators for most cputickers -- emulators
>> should emulate a constant frequency and not emulate the complexities
>> for power saving.
>
> Hmm I guess that's more easily said than done. :)  At least qemu
> basically just passes the host tsc thru when a guest reads it.

But it claims P-state invariance?  Maybe it gets that from the host.
Does it trap TSC reads?  This would be slow, but required to emulate
P-state invariance and might be required for accurate timing anyway.  I
think emulators shouldn't trap reads of the TSC because the TSC is
unreliable for accurate timing anyway, but they should do something to
keep slower-to-access accurate hardware timers virtually accurate.
Hopefully the hardware people will eventually make a timer like the TSC
both accurate and fast.  Emulators will have a difficult time preserving
both.

Bruce