From owner-freebsd-arch@FreeBSD.ORG  Tue Nov 14 07:43:06 2006
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: arch@freebsd.org
Delivered-To: freebsd-arch@FreeBSD.ORG
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 09AAF16A40F
	for <arch@freebsd.org>; Tue, 14 Nov 2006 07:43:06 +0000 (UTC)
	(envelope-from rizzo@icir.org)
Received: from xorpc.icir.org (xorpc.icir.org [192.150.187.68])
	by mx1.FreeBSD.org (Postfix) with ESMTP id A720143D45
	for <arch@freebsd.org>; Tue, 14 Nov 2006 07:43:05 +0000 (GMT)
	(envelope-from rizzo@icir.org)
Received: from xorpc.icir.org (localhost [127.0.0.1])
	by xorpc.icir.org (8.12.11/8.13.6) with ESMTP id kAE7h51j034526;
	Mon, 13 Nov 2006 23:43:05 -0800 (PST)
	(envelope-from rizzo@xorpc.icir.org)
Received: (from rizzo@localhost)
	by xorpc.icir.org (8.12.11/8.12.3/Submit) id kAE7h5hF034525;
	Mon, 13 Nov 2006 23:43:05 -0800 (PST) (envelope-from rizzo)
Date: Mon, 13 Nov 2006 23:43:05 -0800
From: Luigi Rizzo <rizzo@icir.org>
To: Poul-Henning Kamp <phk@phk.freebsd.dk>
Message-ID: <20061113234305.A34147@xorpc.icir.org>
References: <20061113133236.A28926@xorpc.icir.org>
	<7327.1163453901@critter.freebsd.dk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5.1i
In-Reply-To: <7327.1163453901@critter.freebsd.dk>;
	from phk@phk.freebsd.dk on Mon, Nov 13, 2006 at 09:38:21PM +0000
Cc: arch@freebsd.org
Subject: Re: a proposed callout API
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 14 Nov 2006 07:43:06 -0000

On Mon, Nov 13, 2006 at 09:38:21PM +0000, Poul-Henning Kamp wrote:
> In message <20061113133236.A28926@xorpc.icir.org>, Luigi Rizzo writes:
> >On Mon, Nov 13, 2006 at 08:53:41PM +0000, Poul-Henning Kamp wrote:
> >> 
> 
> >i am a bit curious on why  you want to split the callouts among
> >multiple data structures.
> 
> A binary heap is optimal for the timeouts that will happen, but
> filling it up with timeouts that are unlikely to, and in most
> cases won't happen for a very long time will soak up CPU
> time used for pointlessly ordering the heap.

that's only one side - you are still paying, on each entry,
the cost of the periodic scanning of the list, and the cpu burstiness
of these scanning operations is harmful for system with
pseudo-real-time requirements (and the workarounds complicate
operations).

To make a proper evaluation i would need some idea of the number
and distribution of scheduled events on a busy box, and of the
frequency of insertion/removals, which i don't know.
But just to make an example, say you have a total of 1000
insertions/deletions per second, 64k total events.

Using a single heap, that's 16k operations per second (log64k=16
is the cost of each insert/remove).
Using a small 1k heap (log1k=10 is the cost of each operation),
scanning the list once per second gives you 64k operations per second.
plus add the heap manipulation (but maybe that's in the noise, if,
say, only 10% of the insert-remove go there).

Sure if you make sure the short-term list has very few entries
the scanning costs go down, but how much really depends on your stats.

Note, i am not opposed to separating the heaps, i fully buy your
arguments on reducing the rescheduling costs and the representation
of the intervals on a smaller number of bits.  However, i would try
to use a more efficient data structure for the long-term entries,
eg. another heap with coarser (say 1s) granularity.
You can insert fake events for the first 100..1000 seconds
and have a small array of pointers for access pointers to
those entries, so inserting/rescheduling an event within the
next 100..1000 seconds window is O(1), and when the next event
fires (basically in the next second) you just have to move
a sublist to the main heap, without having to touch all the
other entries.

cheers
luigi