From: Bruce Evans
Date: Wed, 3 Dec 2008 22:14:27 +1100 (EST)
To: Alexander Motin
Cc: freebsd-acpi@freebsd.org, freebsd-amd64@freebsd.org, peter@freebsd.org, Nate Lawson
Subject: Re: Semi-working patch for amd64 suspend/resume

On Tue, 2
Dec 2008, Alexander Motin wrote:

> Nate Lawson wrote:
>>> The only strange effect I have noticed was incorrect CPU time some
>>> processes got:
>>> %ps ax
>>>   PID  TT  STAT          TIME COMMAND
>>>    12  ??  WL    280503:38,05 [intr]
>>>  1430  ??  Ss    280503:38,34 icewm
>>>
>>> But I think it is more timer driver related then resume itself.
>>
>> If you are using the LAPIC timer (default), it won't be running properly
>> during resume.  However, this wide discrepancy seems to indicate that
>> the timer state is not being resumed properly.  What if you use the ACPI
>> timer (hw.timecounter.* I think are the sysctls)?
>
> As I understand, I am now using LAPIC timer for HZ generation, ACPI-fast as
> time source and TSC as kernel DELAY() source.

CPU times use mainly the cpu_ticker (TSC on i386), and the cpu_ticker
code has always been broken if the frequency changes a lot.  The main
bugs that I know about are:

(1) the cpu time is (total cpu_ticks) / cputick_frequency(now) but
    should be the integral over previous thread history of
    (delta cpu_ticks) / cputick_frequency(t) dt.  The former gives a
    wrong value if cputick_frequency(t) is not constant over previous
    thread history, and the wrongness is very obvious for long-running
    threads like intr and idle ones if the TSC frequency changes
    significantly (e.g., by cpufreq).  cpufreq has a callback to
    reinitialize the frequency calibration, but this doesn't help much.
    I can't find any resume method for the TSC.

(2) frequency _re_calibration is broken.  It never decreases the
    frequency.  Thus if the frequency is transiently high, the
    transiently high calibration persists until the next
    reinitialization of the frequency calibration (or until a
    transiently higher frequency is seen).  Small variations due to
    temperature changes thus make the frequency persistently slightly
    higher than it should be, and large variations due to stopping a
    timer or stopping or throttling the TSC can make the frequency
    persistently very wrong.
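
To make bug (1) concrete, here is a toy model in Python. It is only a sketch and not the kernel code: the function names and the (ticks, frequency) history format are made up for illustration. It compares dividing the total tick count by the current frequency with summing each tick delta scaled by the frequency that was in effect at the time.

```python
# Toy model only; names and data layout are hypothetical, not kernel APIs.
# Each history entry is (delta_ticks, frequency_hz_during_that_interval).

def cpu_time_naive(history, freq_now):
    """Bug (1): divide the total tick count by the frequency as it is now."""
    total_ticks = sum(ticks for ticks, _ in history)
    return total_ticks / freq_now


def cpu_time_integrated(history):
    """Scale each tick delta by the frequency in effect when it accrued."""
    return sum(ticks / freq for ticks, freq in history)


# A thread runs for 10 s at 2 GHz, then for 10 s at 1 GHz
# (for example after cpufreq lowers the clock).
history = [
    (10 * 2_000_000_000, 2_000_000_000),
    (10 * 1_000_000_000, 1_000_000_000),
]

naive = cpu_time_naive(history, freq_now=1_000_000_000)
exact = cpu_time_integrated(history)
print("naive:", naive, "integrated:", exact)
```

With these numbers the naive calculation reports 30 seconds for 20 seconds of actual running, and the error grows with the length of the history, which is why long-running threads show it most.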
This wrongness is very obvious using ddb.  While in ddb, interrupt
timers are stopped but the TSC advances.  Recalibration then gives an
enormously high frequency (nearly 1/0 = infinity) that is sticky due to
the bug.  Dividing by this then gives all cpu times of nearly 0, modulo
monotonicity enforcement by calcru() (which helps here -- old nonzero
times for intr and idle remain nonzero).  The sanity checking in the
recalibration detects remarkably few cases of insanity for some reason,
perhaps because the timers are too in sync.

You seem to have the opposite problem, that times are enormously high.
This would be caused by the frequency being calibrated as nearly 0, but
I can't see how this could happen -- the TSC is presumably stopped while
the system is suspended, so the recalibration code would tend to give a
frequency far too low if the resume method is indeed missing, but bug
(2) prevents this low value being used; OTOH, the resume method should
recalibrate only after restarting all clocks, so it shouldn't suffer
from bug (2).  There are possible races getting the calibration done by
the resume method before the main timer interrupt handler does it based
on bogus data, but the latter doesn't happen on every timer interrupt so
you would be unlucky to lose these races.
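
The ddb case of bug (2) can be sketched the same way. This is again a toy model with made-up names and numbers, not the actual recalibration code: the estimate is raised whenever a higher frequency is measured and is never lowered.

```python
# Toy model only; the class and its numbers are hypothetical.

class StickyCalibrator:
    """Models bug (2): the frequency estimate only ever goes up."""

    def __init__(self, freq_hz):
        self.freq_hz = freq_hz

    def recalibrate(self, delta_ticks, delta_seconds):
        measured = delta_ticks / delta_seconds
        if measured > self.freq_hz:  # a lower measurement is ignored
            self.freq_hz = measured
        return self.freq_hz


cal = StickyCalibrator(2.0e9)

# Normal interval: 2e9 ticks seen over 1 s of timer time.
before_ddb = cal.recalibrate(2.0e9, 1.0)

# Sitting in ddb: the TSC advanced by 60 s worth of ticks, but the
# stopped interrupt timers only accounted for about 1 ms.
in_ddb = cal.recalibrate(60 * 2.0e9, 0.001)

# Back to normal intervals, but the bogus estimate sticks.
after_ddb = cal.recalibrate(2.0e9, 1.0)

# 100 s of real work at 2 GHz now looks like almost nothing.
reported = (100 * 2.0e9) / cal.freq_hz
print(before_ddb, in_ddb, after_ddb, reported)
```

The last recalibration measures the true 2 GHz again, but since it is lower than the stored value it is discarded, so every later cpu time is divided by the inflated frequency.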
If the frequency is transiently miscalibrated as nearly 0 and you look
at the cpu time using calcru() during this time, then bug (1) gives
enormously high times like the above (nearly 1/0) for long running
processes; then calcru()'s monotonicity enforcement preserves the
enormously high times almost forever (recalibration should eventually
fix the frequency, so bug (1) would give normal times again since
nothing much has happened to the tick counts; however monotonicity
enforcement results in the transiently high times being returned almost
forever -- the returned times won't even increase until the normal times
reach the enormous value or another transient miscalibration messes up
the calculation of the raw times again).

Bruce
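
The pinning effect of monotonicity enforcement described above can be sketched as follows. This is a toy model, assuming only that the reported time is never allowed to go backwards; the real calcru() does more than this, and all names and numbers here are made up.

```python
# Toy model only; not the real calcru() logic.

class MonotonicReporter:
    """Never reports a cpu time smaller than one already reported."""

    def __init__(self):
        self.last = 0.0

    def report(self, raw_seconds):
        self.last = max(self.last, raw_seconds)
        return self.last


def raw_cpu_time(total_ticks, freq_hz):
    # Bug (1) style calculation: total ticks over the current frequency.
    return total_ticks / freq_hz


reporter = MonotonicReporter()
ticks = 3600 * 2_000_000_000  # one hour of running at 2 GHz

# Correct calibration: one hour is reported.
good = reporter.report(raw_cpu_time(ticks, 2_000_000_000))

# Transient miscalibration to nearly 0 (here 1 kHz) while someone looks.
bad = reporter.report(raw_cpu_time(ticks, 1_000))

# Calibration is fixed and the thread runs one more second, but the
# reported value stays pinned at the enormous one.
after = reporter.report(raw_cpu_time(ticks + 2_000_000_000, 2_000_000_000))
print(good, bad, after)
```

In this model the raw time after the fix is 3601 seconds, yet the reported time remains 7.2e9 seconds and will not move until the raw time catches up with it.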