From: Bruce Evans
Date: Sat, 22 Oct 2005 20:17:16 +1000 (EST)
To: Poul-Henning Kamp
Cc: src-committers@freebsd.org, Andre Oppermann, cvs-src@freebsd.org,
    cvs-all@freebsd.org, Marcel Moolenaar
Subject: Re: Timekeeping [Was: Re: cvs commit: src/usr.bin/vmstat vmstat.c
    src/usr.bin/w w.c]
Message-ID: <20051022193119.R8350@delplex.bde.org>
In-Reply-To: <31753.1129924404@critter.freebsd.dk>
References: <31753.1129924404@critter.freebsd.dk>

On Fri, 21 Oct 2005, Poul-Henning Kamp wrote:

> In message <01DFB595-5279-4D3A-BEDA-5F0285E9519B@xcllnt.net>, Marcel Moolenaar
> writes:
>
>>> I think we need the definition to consider if (process- ?)state is
>>> retained while the system is unconscious or not.
>>
>> I'm not sure. I think that might be what makes the definition
>> complex.
>
> Actually I don't think it does, it simplifies it.

I agree.  Except for statistics programs, it is necessary to keep as much
history as practical; in particular, don't forget the original boot time,
and keep supporting averages since boot in vmstat and systat.

> If a process survives across the "unconscious" period, then it follows
> that CLOCK_MONOTONIC cannot be reset to zero in relation to the
> unconscious period.

What is survival?  Everything might be restarted virtually.

> But we are only just scratching the surface here, there are tons of
> ambiguities we need to resolve, for instance:
>
>     select(...., {3m0s})
>     suspend
>     [ 2 minutes pass ]
>     resume
>
> When does select time out ?
>
>     One minute after the resume ?
>
>     Three minutes after the resume ?
>
>     Right after the resume with a special errno ?

As close as possible to 3m0s after select() was called.

There are many longstanding bugs in this area.  I remember the following:
- the stillborn non-option APM_FIXUP_CALLTODO attempts to fix some of them,
  by reducing all timeouts by the suspend time.  (It was stillborn because
  it is for the pre-callwheel implementation of timeouts but was committed
  after callwheel timeouts, so it never compiled in any committed version.
  The uselessness of APM_FIXUP_CALLTODO was hidden by not making it a
  normal option.)  The problem of wrong timeouts after suspend is very old.
  Not fixing it avoids thundering herds of timeout expiries after suspend.
- nanosleep(), select() and poll() use getnanouptime(), getmicrouptime()
  and getmicrouptime(), respectively, to not-so-carefully check that the
  timeout has expired after they wake up (the wakeup is sometimes early or
  late due to minor inaccuracies; when it is early, we detect that
  not-so-carefully and go back to sleep; when it is late, we can't recover,
  so we should request the timeout to always be a little early so that we
  can be as close to on time as possible).
  These syscalls should use non-get*() versions and non-*uptime() versions
  so that they actually know if the timeout expired.  Using *uptime()
  doesn't work because it doesn't count suspend time.  Using non-*uptime()
  doesn't quite work either, since the system's best idea of the real time
  may jump backwards.  A monotonic clock that jumps forwards by the suspend
  time is needed.
- realitexpire() has the same bug as nanosleep() and friends.  The very
  name of this function shows that it should not be using *uptime().
  According to setitimer(2), "ITIMER_REAL decrements in real time".  Using
  get*() in it is more justified than in nanosleep() since it is lower
  level so its efficiency may be important.

> Some code should obviously know about the suspend/resume event,
> dhclient, wep, wpa, bgpd, sshd, just to mention a few

Code like cron should get enough notification by having timeouts expire
as soon as possible after resume (if they would have expired during the
suspend interval had there been no suspend).  Such code can then check
the actual time on the correct clock, like nanosleep() and friends, to see
if a critical time has been reached.

Bruce
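
The wakeup check described above for nanosleep(), select() and poll() can be
sketched in userland C as follows.  This is only an illustration under
stated assumptions, not the kernel code: the helpers ts_cmp(), ts_add(),
ts_sub() and deadline_expired() are hypothetical names, and the caller is
assumed to supply "now" from a monotonic clock that has jumped forwards by
the suspend time.  The idea is to fix an absolute deadline when the call is
made and to compare the clock against it on every wakeup, going back to
sleep for the remainder if the wakeup was early.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Compare two normalized timespecs; returns <0, 0 or >0. */
static int
ts_cmp(struct timespec a, struct timespec b)
{
	if (a.tv_sec != b.tv_sec)
		return (a.tv_sec < b.tv_sec ? -1 : 1);
	if (a.tv_nsec != b.tv_nsec)
		return (a.tv_nsec < b.tv_nsec ? -1 : 1);
	return (0);
}

/* Return a + b for normalized timespecs, carrying nanoseconds. */
static struct timespec
ts_add(struct timespec a, struct timespec b)
{
	struct timespec r;

	r.tv_sec = a.tv_sec + b.tv_sec;
	r.tv_nsec = a.tv_nsec + b.tv_nsec;
	if (r.tv_nsec >= NSEC_PER_SEC) {
		r.tv_sec++;
		r.tv_nsec -= NSEC_PER_SEC;
	}
	return (r);
}

/* Return a - b; the caller must ensure a >= b. */
static struct timespec
ts_sub(struct timespec a, struct timespec b)
{
	struct timespec r;

	assert(ts_cmp(a, b) >= 0);
	r.tv_sec = a.tv_sec - b.tv_sec;
	r.tv_nsec = a.tv_nsec - b.tv_nsec;
	if (r.tv_nsec < 0) {
		r.tv_sec--;
		r.tv_nsec += NSEC_PER_SEC;
	}
	return (r);
}

/*
 * Hypothetical wakeup check.  "deadline" is the absolute time fixed when
 * the call was made (start + timeout); "now" is the current reading of
 * the same clock.  Returns true if the deadline has been reached;
 * otherwise stores the time still to sleep in *left and returns false.
 */
static bool
deadline_expired(struct timespec deadline, struct timespec now,
    struct timespec *left)
{
	if (ts_cmp(now, deadline) >= 0)
		return (true);
	*left = ts_sub(deadline, now);
	return (false);
}
```

With the numbers from the quoted example, and assuming the call is made at
t = 100s and the system is suspended 10 seconds later: the deadline is
280s.  A clock that counts the 2 minutes of suspend reads 230s at resume,
so 50s are left and the timeout expires 3m0s after the call.  An
uptime-style clock that skips the suspend reads 110s, leaving 170s, so the
timeout would expire 2 minutes late in real time.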