From: "Poul-Henning Kamp"
To: John Baldwin
Cc: freebsd-arch@freebsd.org
Subject: Re: a proposed callout API
Date: Tue, 28 Nov 2006 23:28:41 +0000
Message-ID: <6194.1164756521@critter.freebsd.dk>
In-Reply-To: Your message of "Tue, 28 Nov 2006 16:31:18 EST." <200611281631.19224.jhb@freebsd.org>

In message <200611281631.19224.jhb@freebsd.org>, John Baldwin writes:

John,

I would very much welcome your participation on this.

On the absolute vs relative time thing, this gets far nastier once
you start to think about it.

As far as I know, nothing in the kernel asks for sleeps until a given
wall-clock (UTC) time. Userland, on the other hand, often does, and
almost never should, but let's leave that behind for a moment. [1]

Suspend/resume is a tricky complication here.

Some sleeps and callouts want to sleep on the "while the CPU is
conscious" timescale, for instance for pushing dirty pages to disk or
collecting usage statistics.
Others want to sleep on the absolute (TAI) timescale, such as TCP
retransmission and keepalive timeouts.

(The indicative internal/external distinction is not safe btw.)

Right now we don't distinguish between the two cases, and my intention
was to leave this for a later stage where we could add flag-bits to
signal these desires, once a survey of the kernel code had revealed
which is the sensible default.

We can of course add the flags as no-ops already now where this is
immediately obvious to us.

>Part of the idea was to fix
>places that abused tsleep(..., 1), etc. to figure out a "real" sleep
>interval.

This is going to be the major pain in the transition, no matter what
we do. Pretty much all short sleep and callout durations are bogus
because of the traditional rounding(-up) and HZ granularity.

>Also, my other API change I was going to do was something like this:
>
>msleep() -> mtx_sleep()
>msleep_spin() -> sl_sleep() [...]
>rw_sleep(), sx_sleep() [...]

I think this sounds eminently sensible. Even if we initially do just
the crude thing, getting it expressed in the API allows us to improve
the implementation later on.

Poul-Henning

[1] OK, couldn't resist:

Much of this trouble comes about because it used to be that only the
UTC clock was available, and programs haven't been rewritten to use
CLOCK_MONOTONIC where they should.

Examples of bogus behaviour:

Named(8) wants to time zones out on the TAI scale, not the UTC scale,
so it should not be affected by NTPD stepping the clock but only by
the uptime of the system. Any amount of time the system is suspended
should be tolled on the timer.

Xlock suffers from the same and gets terribly upset when NTPD steps
the clock.

Various reminder tools want to sleep until a given UTC time, but end
up sleeping the relative time we estimate until that time when they go
to sleep. If NTPD steps the clock while they sleep, they do not find
out and the reminder gets fired at the wrong time.
(Hint: Don't entrust calendar(8) with remembering your marriage
anniversary).

NTPD, on the other hand, needs to know about suspend/resume so it can
DTRT to the clock, but it doesn't get told, so it makes a total mess
of things.

One conclusion I've reached is that the kernel should issue a
SIGTIMEWARP to all processes whenever there is a UTC clock
discontinuity. It's been suggested that devd(8) should do this, but I
think it is a kernel task.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
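
The userland side of footnote [1] can be sketched roughly as follows.
This is only an illustration, not anything proposed in the thread: the
helper names (ts_diff_ns, elapsed_ns_since, sleep_until_utc) and the
chunked re-check are assumptions made for the example. Intervals are
measured on CLOCK_MONOTONIC, while a wait for a UTC deadline re-reads
CLOCK_REALTIME every so often instead of sleeping one estimated
relative time, so a step of the clock is noticed within one chunk.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NS_PER_SEC 1000000000LL

/* Hypothetical helper: nanoseconds from 'from' to 'to' (may be negative). */
static int64_t
ts_diff_ns(struct timespec from, struct timespec to)
{
	return (int64_t)(to.tv_sec - from.tv_sec) * NS_PER_SEC +
	    (int64_t)(to.tv_nsec - from.tv_nsec);
}

/*
 * Interval measurement: CLOCK_MONOTONIC is not moved when the UTC
 * clock is stepped, so this is the clock to use for timeouts.
 * 'start' must have been read from CLOCK_MONOTONIC as well.
 */
static int64_t
elapsed_ns_since(struct timespec start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return ts_diff_ns(start, now);
}

/*
 * Wall-clock deadline (assumed workaround): sleep in chunks of at most
 * max_chunk_ns and re-read CLOCK_REALTIME after each one.  If the
 * clock is stepped while we sleep, the error is bounded by one chunk
 * rather than by the whole original estimate.
 */
static void
sleep_until_utc(struct timespec deadline, int64_t max_chunk_ns)
{
	assert(max_chunk_ns > 0);
	for (;;) {
		struct timespec now, nap;
		int64_t left;

		clock_gettime(CLOCK_REALTIME, &now);
		left = ts_diff_ns(now, deadline);
		if (left <= 0)
			return;		/* deadline reached or passed */
		if (left > max_chunk_ns)
			left = max_chunk_ns;
		nap.tv_sec = (time_t)(left / NS_PER_SEC);
		nap.tv_nsec = (long)(left % NS_PER_SEC);
		nanosleep(&nap, NULL);	/* an early wakeup just re-checks */
	}
}
```

A reminder tool would call sleep_until_utc() with its target UTC time
and a chunk of, say, a few seconds. Polling like this is a crude
stand-in for being told about the discontinuity; a notification along
the lines of the SIGTIMEWARP idea above would make the re-check
unnecessary.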