From owner-freebsd-arch@FreeBSD.ORG Sun Dec 2 11:58:06 2007
To: Andre Oppermann
From: "Poul-Henning Kamp"
In-Reply-To: Your message of "Sun, 02 Dec 2007 12:39:54 +0100." <4752998A.9030007@freebsd.org>
Date: Sun, 02 Dec 2007 11:58:04 +0000
Message-ID: <18378.1196596684@critter.freebsd.dk>
Sender: phk@critter.freebsd.dk
Cc: Attilio Rao, arch@freebsd.org, Robert Watson
Subject: Re: New "timeout" api, to replace callout
List-Id: Discussion related to FreeBSD architecture

In message <4752998A.9030007@freebsd.org>, Andre Oppermann writes:

> o TCP maintains a number of concurrent, but hierarchical timers for
>   each session. What predominantly happens is a reschedule of an
>   existing timer, that means it wasn't close to firing and is moved
>   out again. This happens for every incoming segment.
>
>   -> The timer facility should make it simple and efficient to move
>      the deadline into the future.

That is more or less the reason for the multiple timescales
(ns, us, ms, s) and the TIMEOUT_UNLIKELY flag.

For long running timeouts that almost never happen, I don't want to
even move them in a linked list when their time is moved further
into the future.

A timeout that is 60 seconds or more into the future can just be put
on any random shelf until it gets a lot closer.

The exact mechanism is TBD, but the intent is to not waste time on
timeouts that are unlikely to happen, and to avoid them getting in
the way of the timeouts that do happen.

> o TCP puts the timer into an allocated structure and upon close of the
>   session it has to be deallocated including stopping of all currently
>   running timers.
> [...]
>   -> The timer facility should provide an atomic stop/remove call
>      that prevent any further callbacks upon return. It should not
>      do a 'drain' where the callback may be run anyway.
>      Note: We hold the lock the callback would have to obtain.

It is my intent that the implementation behind the new API will
only ever grab the specified lock when it calls the timeout function.

When you do a timeout_disable() or timeout_cleanup(), you will be
sleeping on a mutex internal to the implementation if the timeout
is currently executing.

> o TCP has hot and cold CPU/cache affinity.
>
>   -> The timer facility should provide strong, weak and "don't care"
>      CPU affinity. The affinity should be selected for a timer as
>      whole, not upon each call.

That is what the "timeout_p" you pass into timeout_init() is for.

What values we will provide there is not decided, apart from NULL
meaning "whatever..."

> o TCP's data structure is exported to userspace and contains the
>   timeout data structures. This complicates timeout handling as
>   the data structure is not known to userland and we have to do
>   some hacks to prevent exposure.
>
>   -> The timer facility should provide an opaque userland compat
>      header definition.

I don't even want to expose its content to the client code, but I
do want its size known at compile time.

My current definition looks like:

	struct timeout {
		struct timeout_p	*_prov;
		union {
			uintptr_t	_timeout_private_i;
			void		*_timeout_private_p;
		} _u[10];
	};

(for some value of 10)

I'm still playing with it.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
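
An illustrative aside on the fixed-size definition quoted in the mail: one
way the implementation could keep "size known at compile time, contents
private" honest is to address the private slots by index and let the
compiler reject a build that needs more slots than _u[] provides. This is
only a sketch. The slot names and the accessor functions are assumptions
made up for illustration and appear nowhere in the proposal; only
struct timeout itself is taken from the mail.

```c
#include <assert.h>
#include <stdint.h>

struct timeout_p;	/* provider handle; only ever used as a pointer */

/* Public definition as given in the mail: size fixed, contents private. */
struct timeout {
	struct timeout_p	*_prov;
	union {
		uintptr_t	_timeout_private_i;
		void		*_timeout_private_p;
	} _u[10];
};

/* HYPOTHETICAL slot assignment; the mail does not say what goes where. */
enum {
	SLOT_ARG,	/* callback argument, kept as a pointer */
	SLOT_DEADLINE,	/* deadline in ticks, kept as an integer */
	SLOT_COUNT	/* number of slots this sketch needs */
};

/* Break the build, not the ABI, if the implementation outgrows _u[]. */
static_assert(SLOT_COUNT <= sizeof(((struct timeout *)0)->_u) /
    sizeof(((struct timeout *)0)->_u[0]),
    "struct timeout has too few private slots");

/* HYPOTHETICAL accessors that only the implementation would use. */
static void
timeout_set_arg(struct timeout *t, void *arg)
{
	t->_u[SLOT_ARG]._timeout_private_p = arg;
}

static void *
timeout_get_arg(const struct timeout *t)
{
	return (t->_u[SLOT_ARG]._timeout_private_p);
}

static void
timeout_set_deadline(struct timeout *t, uintptr_t ticks)
{
	t->_u[SLOT_DEADLINE]._timeout_private_i = ticks;
}

static uintptr_t
timeout_get_deadline(const struct timeout *t)
{
	return (t->_u[SLOT_DEADLINE]._timeout_private_i);
}
```

Client code, including userland that only needs the layout, would see
nothing but the struct and could embed it in its own structures; changing
what the slots mean would not change the size, and growing past
"some value of 10" would fail at compile time rather than at run time.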