From: Bruce Evans
Date: Mon, 30 Jul 2012 18:55:46 +1000 (EST)
To: Alexander Motin
Cc: freebsd-acpi@FreeBSD.org
Subject: Re: Using bintime() in acpi_cpu_idle()?
Message-ID: <20120730171246.Y1715@besplex.bde.org>
In-Reply-To: <501628D2.2090507@FreeBSD.org>

On Mon, 30 Jul 2012, Alexander Motin wrote:

> On 30.07.2012 07:33, Bruce Evans wrote:
>> On Sun, 29 Jul 2012, Alexander Motin wrote:
>>> ...
>>> Timecounter already has detection logic to disable TSC in cases where
>>> it is unreliable. I don't want to replicate it here.
>>> I need not precise and not synchronized by reliable and fast time
>>> source.
>>
>> Yes, this logic gives exactly what you don't want (an inefficient
>> timecounter), by preventing use of the TSC for the timecounter, although
>> the TSC is perfectly usable for the ticker and here.
>
> Can you teach me how to use ticker that is not ticking? If TSC was considered
> unusable for timecounter for reasons unrelated to SMP, how can I use it as
> ticker.

No :-).  I can't teach you how to use either the ticker or the timecounter
if their clock is not ticking.  I'm just saying that if you can blindly
use a timecounter, then you can blindly use the ticker.  Both work only
while their clock keeps ticking, and in many cases that clock is the same
(the TSC).

The TSC is considered usable for the ticker under weak conditions:
- it exists according to CPUID_TSC
- it is not disabled by the machdep.disable_tsc tunable
- its dynamic probe finds that its frequency is nonzero.  The probe has
  some more cpuid tests and other complications which may prevent it
  being fully dynamic.  There is another tunable,
  machdep.disable_tsc_calibration, which prevents the dynamic frequency
  determination.  I think the frequency comes from a table then, and is
  never zero, so this doesn't prevent the TSC being used for the ticker.
- the 2 tunables are of course undocumented in /usr/share/man.  There is
  hardly any useful documentation of the TSC there either.  zgrep finds
  "TSC" mainly in timecounters(4) and hwpmc(4).  In timecounters(4), the
  references to the TSC are useless since they are just literal output of
  $(sysctl kern.timecounter).  In hwpmc(4), the RDTSC instruction but not
  much more is mentioned.

The TSC is considered usable for a timecounter under the above conditions,
but its default quality is low so it rarely gets used.
Its quality is changed under the following conditions:
- APM enabled: reduce quality to nearly -infinity
- CPU can deep sleep, and Intel CPU, and TSC not invariant: reduce quality
  to nearly -infinity, because (only) Intel CPUs are known to stop the
  TSC in deep sleeps under these conditions.  This is what you should
  have told me to justify use of binuptime() :-).  Users can still
  configure the TSC as a timecounter, but this would break more than your
  use of binuptime() if the TSC actually stops.
- SMP configured, and > 1 CPU:
  - vm guest: reduce quality significantly, but not to nearly -infinity
  - else do cpuid and dynamic synchronization tests:
    - fail tests: reduce as for vm guest
    - pass tests: increase a little, to just above ACPI-fast IIRC
    - pass synchronization tests, but not invariant: keep default.
- SMP not configured, or only 1 CPU: increase a little iff invariant.

Invariant means P-state invariant.  I forgot that the invariance flag was
a tunable.  This tunable, kern.timecounter.tsc_invariant, is of course
undocumented.  It conditionalizes more than this case.  Other bugs in it
are:
- it is in a different namespace than the tunables described above.
- this different namespace is worse, since the flag applies to more than
  the timecounter decision.  It also gives the ticker invariance flag,
  and controls whether there are event handlers for frequency changes.
- you can force the flag on using the tunable, but you can't force it off.
- for SMP, there is also the kern.timecounter.smp tunable.  This has much
  the same bugs as kern.timecounter.tsc_invariant:
  - it is of course undocumented
  - you can force it on, but you can't force it off
  - however, its namespace seems to be not incorrect, since it seems to
    only control timecounter quality (very indirectly now, by modifying
    the dynamic probes.  It used to be a simple flag to modify the SMP
    config option).

Stopping of the TSC in deep sleeps doesn't prevent its use as the ticker.
This should mostly work for the main use of the ticker, for thread
runtimes, because most threads never idle directly, but switch to the idle
thread for some CPU.  I think deep sleeps break runtime accounting for
idle threads (if the ticker stops).  Has anyone seen this (idle times near
0 on mostly-idle systems that have spent days idling)?

>>>> I wouldn't trust timecounters for some time after waking up after a
>>>> deep sleep. If their clock stopped then the times read might only be
>>>> ...
>>> I am not sure what reinitialization are you talking about. IIRC, there
>>> is no any waking up code for TSC. None other time counters have
>>> problems with C-states.
>>
>> It is the timecounter code that needs reinitializing. If the TSC stops,
>> or wraps mod 2**32, then its counts become garbage for the purpose of
>> timecounting. Maybe it is not used for timecounting in either of these
>> cases. But these cases shouldn't prevent its use for timecounting.
> ...
> At this moment I am not talking about S-states sleeping for hours. I am
> talking about C-states for milliseconds. It means that TSC may stop and start
> 10K times each second or even more. Attempt to save and restore its state
> will consume so much resources, that probably make it useless.

You should have told me the lengths of the sleeps early in this thread :-).
I only know enough about this to ask annoying questions.

> What's about wrap after 2 seconds, I would be happy to make CPU sleep for so
> long, but now 100ms is all I can hope even on idle system.

Covered by the above, but future-proofing requires supporting arbitrary
sleep lengths.  Use a less efficient timer that works over long sleeps
iff the sleep was long.  The problems are to determine whether the sleep
was long, and to switch timers.

>> At boot time there is a dummy timecounter that returns bogo-times.
>> Apparently sleeping doesn't occur before the timecounter is switched to
>> a real one.
>> The dummy timecounter isn't switched back to after boot
>> time. But it probably should be, since the hardware timecounter may
>> have stopped or wrapped. Sleeping could just set a flag to indicate
>> this state, but then you would have to provide a fake time anyway on
>> finding the flag set. Boot time just points to the dummy timecounter
>> so as not to check this flag in all early timecounter "hardware" calls.
>
> And how dummy timecounter that counts something, but not time, can help me to
> measure sleep time?

It helps negatively.  You can't use a dummy timecounter any more than you
can use a stopped or wrapped timecounter if you actually need to know the
time.

In low-level code it is unclear whether timecounters can be used.  Where
binuptime() can be used is of course undocumented.  binuptime() actually
has a man page, but it is just a stub.  Timecounters actually can be used
in most low-level code, partly because they need to work in fast interrupt
handlers for PPS timestamps, but I wouldn't want to make their normal use
slower to support this.

Bruce
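
The two decision lists earlier in this message (usability of the TSC for
the ticker, and the adjustments to its timecounter quality) can be
sketched in C roughly as follows.  This is only an illustration under
stated assumptions: the struct, the function names and the numeric
quality values are hypothetical and are not taken from the kernel source,
and the sketch assumes the rules are checked in the order listed, with
the first match winning.

```c
#include <stdbool.h>

/* Hypothetical inputs; none of these names come from the kernel. */
struct tsc_facts {
	bool apm_enabled;
	bool can_deep_sleep;
	bool intel_cpu;
	bool invariant;		/* P-state invariant */
	bool smp_multi_cpu;	/* SMP configured and more than 1 CPU */
	bool vm_guest;
	bool sync_tests_pass;	/* cpuid + dynamic synchronization tests */
};

/* Placeholder values: only their relative ordering is meaningful. */
enum {
	Q_NEVER     = -1000000,	/* "nearly -infinity" */
	Q_REDUCED   = -100,	/* "reduce significantly" */
	Q_DEFAULT   = 800,	/* assumed default */
	Q_ACPI_FAST = 900,	/* stand-in for ACPI-fast's quality */
	Q_RAISED    = Q_ACPI_FAST + 100	/* "just above ACPI-fast" */
};

/* Ticker rule: TSC exists, is not disabled, and has a nonzero frequency. */
bool
tsc_usable_for_ticker(bool cpuid_tsc, bool disable_tsc_tunable,
    unsigned long long freq_hz)
{
	return (cpuid_tsc && !disable_tsc_tunable && freq_hz != 0);
}

/* Timecounter quality, assuming the first matching rule wins. */
int
tsc_tc_quality(const struct tsc_facts *f)
{
	if (f->apm_enabled)
		return (Q_NEVER);
	if (f->can_deep_sleep && f->intel_cpu && !f->invariant)
		return (Q_NEVER);
	if (f->smp_multi_cpu) {
		if (f->vm_guest || !f->sync_tests_pass)
			return (Q_REDUCED);
		/* Tests passed: raise only if also invariant. */
		return (f->invariant ? Q_RAISED : Q_DEFAULT);
	}
	/* UP, or only 1 CPU: increase a little iff invariant. */
	return (f->invariant ? Q_RAISED : Q_DEFAULT);
}
```

Note that the ticker test ignores every input of the quality function,
which is the point made above: a TSC that scores "nearly -infinity" as a
timecounter still passes the ticker's conditions.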