From owner-freebsd-arch@FreeBSD.ORG  Fri Nov 19 09:38:00 2004
To: arch@freebsd.org, current@freebsd.org
From: Poul-Henning Kamp <phk@phk.freebsd.dk>
Date: Fri, 19 Nov 2004 10:37:57 +0100
Message-ID: <76633.1100857077@critter.freebsd.dk>
Sender: phk@critter.freebsd.dk
Subject: [REVIEW/TEST] nanodelay() vs DELAY()


A number of device drivers need to have predictable small delays and
today they largely rely on DELAY() for that.

Problem is, DELAY() sucks for durations below 10-20 usec.

I have written a first cut of a self-calibrating nanodelay() which
remains accurate three orders of magnitude further down.

The patch can be found in p4::phk_dev or at

	http://phk.freebsd.dk/patch/nanodelay.patch

nanodelay() takes a nanosecond argument and will try to delay for
exactly that many nanoseconds, but never less.

With a primed cache, the performance is close to perfect from 200 nsec
and up; below that, it depends heavily on the speed of the machine.

Without a primed cache, an unpredictable jitter will be superimposed
on the delay duration.

Here is a plot which shows how DELAY() and nanodelay() perform on two
of my test-machines:

	http://phk.freebsd.dk/misc/nanodelay.png

I would appreciate it if driver writers could play with this and
see if it makes any difference anywhere.

Please note that this is a timed CPU spin: it does not allow
another thread to use the CPU, so it should only be used for
short (< 1/hz) delays.
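To make the < 1/hz limit concrete: one clock tick lasts 10^9/hz nanoseconds, i.e. 1 ms at hz = 1000 and 10 ms at hz = 100. The helper below is a hypothetical illustration of that check, with hz passed in as a parameter rather than read from the kernel's global.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: is a busy-wait of 'ns' nanoseconds shorter than
 * one clock tick?  A tick lasts 1e9 / hz nanoseconds, e.g. 1,000,000 ns
 * (1 ms) at hz = 1000 and 10,000,000 ns (10 ms) at hz = 100.
 */
static bool
spin_is_appropriate(uint64_t ns, unsigned hz)
{
	assert(hz > 0);
	return (ns < 1000000000ULL / hz);
}
```

Anything that fails this check is better served by a sleeping primitive that lets other threads run.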

How it works:

A default routine spins on the timecounter using nanouptime().  How
well this works depends on which timecounter we use, but in general
we can trust it to be OK above a few microseconds.
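A minimal userland sketch of the default timecounter spin, assuming clock_gettime(CLOCK_MONOTONIC) as a stand-in for the kernel's nanouptime(); the names now_ns() and tc_spin() are hypothetical. It shows why this routine never returns early: the loop only exits once the clock reports that the full duration has elapsed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Current monotonic time in nanoseconds.  Assumption: CLOCK_MONOTONIC
 * stands in for the kernel timecounter read by nanouptime().
 */
static uint64_t
now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec);
}

/*
 * Hypothetical tc_spin(): busy-wait for at least 'ns' nanoseconds.
 * The exit test only succeeds once the full duration has passed, so the
 * delay is never short; the overshoot depends on the clock's resolution
 * and on how long each clock read takes.
 */
static void
tc_spin(uint64_t ns)
{
	uint64_t start = now_ns();

	while (now_ns() - start < ns)
		continue;
}
```

Calling tc_spin(5000) busy-waits for at least 5 usec. For very short requests the cost of a single clock read dominates, which is why this path is only trusted above a few microseconds.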

An array holds a function+arg pair to use for each delay below 8 usec;
for longer delays the timecounter routine is always called.

Each bucket in the array spans 8 nanoseconds, so delays of
0-7 nanoseconds use bucket 0, 8-15 nsec use bucket 1 etc.
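The table lookup described above might look like the following sketch. The 8 usec limit and 8 nsec bucket width come from the message; the struct layout and the names delay_table, delay_bucket(), long_delay() and nanodelay_sketch() are hypothetical, and falling back to the timecounter when a bucket has not been filled is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUCKET_NS	8				/* each bucket spans 8 nsec */
#define TABLE_LIMIT_NS	8000				/* table covers delays < 8 usec */
#define NBUCKETS	(TABLE_LIMIT_NS / BUCKET_NS)	/* 1000 buckets */

/* One entry per bucket: a calibrated spin function and its argument. */
struct delay_entry {
	void		(*func)(unsigned arg);
	unsigned	arg;
};

static struct delay_entry delay_table[NBUCKETS];

/* Map a delay to its bucket: 0-7 nsec -> 0, 8-15 nsec -> 1, and so on. */
static size_t
delay_bucket(uint64_t ns)
{
	assert(ns < TABLE_LIMIT_NS);
	return ((size_t)(ns / BUCKET_NS));
}

/*
 * Stub for the timecounter spin used for long delays.  Here it only
 * records the request so the dispatch logic can be checked.
 */
static uint64_t last_long_ns;

static void
long_delay(uint64_t ns)
{
	last_long_ns = ns;
}

/* Hypothetical dispatcher: table entry for short delays, timecounter otherwise. */
static void
nanodelay_sketch(uint64_t ns)
{
	struct delay_entry *e;

	if (ns >= TABLE_LIMIT_NS) {
		long_delay(ns);
		return;
	}
	e = &delay_table[delay_bucket(ns)];
	if (e->func != NULL)
		e->func(e->arg);
	else
		long_delay(ns);	/* assumption: an unfilled bucket falls back */
}
```

Because the bucket index is just ns / 8, the lookup is a single shift and an indexed load, which keeps the dispatch overhead small relative to the shortest delays the table is meant to serve.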

A number of cpu based spin routines are calibrated against the
timecounter for various argument values and plugged into the array
accordingly.
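One way the calibration step could work, sketched in userland under stated assumptions: a single volatile loop stands in for the several CPU spin routines the message mentions, clock_gettime(CLOCK_MONOTONIC) stands in for the timecounter, and cpu_spin(), calibrate_ps_per_loop(), fill_table() and spin_arg are hypothetical names. Each bucket gets the loop count for the longest delay in that bucket, rounded up, so that (given an accurate calibration) no delay in the bucket comes out short.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define BUCKET_NS	8
#define NBUCKETS	1000	/* 8 usec / 8 nsec */

static uint64_t
now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec);
}

/* Simple CPU spin; the volatile counter keeps the loop from being optimized away. */
static void
cpu_spin(unsigned loops)
{
	for (volatile unsigned i = 0; i < loops; i++)
		continue;
}

/*
 * Time a long run of cpu_spin() against the clock and return the cost of
 * one iteration in picoseconds, rounded down.  Underestimating the cost
 * only makes the computed loop counts larger, so delays err long.
 * 'loops' must be nonzero.
 */
static uint64_t
calibrate_ps_per_loop(unsigned loops)
{
	uint64_t t0, dt, ps;

	t0 = now_ns();
	cpu_spin(loops);
	dt = now_ns() - t0;
	ps = dt * 1000 / loops;
	return (ps > 0 ? ps : 1);	/* avoid dividing by zero below */
}

static unsigned spin_arg[NBUCKETS];

/*
 * For bucket b, cover its longest delay (b * 8 + 7 nsec) and round the
 * loop count up rather than down.
 */
static void
fill_table(uint64_t ps_per_loop)
{
	for (size_t b = 0; b < NBUCKETS; b++) {
		uint64_t top_ps = (uint64_t)(b * BUCKET_NS + BUCKET_NS - 1) * 1000;

		spin_arg[b] = (unsigned)((top_ps + ps_per_loop - 1) / ps_per_loop);
	}
}
```

A typical sequence would be fill_table(calibrate_ps_per_loop(10000000)). Timing one long run amortizes the clock-read cost, while the fixed call overhead of each short spin is not subtracted, which again pushes short delays slightly long rather than short.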

The array takes up 9000 bytes on 32-bit and 17000 bytes on 64-bit
platforms.  This can be reduced at the cost of precision in
nanodelay(); we need to determine the right tradeoff there.
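A back-of-the-envelope check of the memory figures: 8 usec split into 8 nsec buckets gives 1000 entries, and a function pointer plus an unsigned argument is typically 8 bytes on 32-bit platforms and 16 bytes (with padding) on 64-bit ones. That gives roughly 8000 and 16000 bytes, in the same ballpark as the sizes quoted above. The hypothetical table_bytes() and worst_overshoot_ns() below assume that entry layout and show how the bucket width trades size for precision.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry layout: a function pointer plus an argument. */
struct delay_entry {
	void		(*func)(unsigned arg);
	unsigned	arg;
};

/* Bytes needed for a table covering delays below 'limit_ns' with 'bucket_ns'-wide buckets. */
static size_t
table_bytes(unsigned limit_ns, unsigned bucket_ns)
{
	size_t nbuckets = (limit_ns + bucket_ns - 1) / bucket_ns;

	return (nbuckets * sizeof(struct delay_entry));
}

/*
 * If each bucket is calibrated for its longest delay, a request at the
 * bottom of the bucket overshoots by up to bucket_ns - 1 nanoseconds.
 */
static unsigned
worst_overshoot_ns(unsigned bucket_ns)
{
	return (bucket_ns - 1);
}
```

Doubling the bucket width from 8 to 16 nsec halves the table but raises the worst-case overshoot from 7 to 15 nsec.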

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.