From owner-freebsd-arch@FreeBSD.ORG  Thu Mar  1 05:42:48 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2C4E0106564A
	for <arch@FreeBSD.org>; Thu,  1 Mar 2012 05:42:48 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au
	[211.29.132.184])
	by mx1.freebsd.org (Postfix) with ESMTP id BACF68FC14
	for <arch@FreeBSD.org>; Thu,  1 Mar 2012 05:42:47 +0000 (UTC)
Received: from c211-30-171-136.carlnfd1.nsw.optusnet.com.au
	(c211-30-171-136.carlnfd1.nsw.optusnet.com.au [211.30.171.136])
	by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q215gD7w009742
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Thu, 1 Mar 2012 16:42:44 +1100
Date: Thu, 1 Mar 2012 16:42:13 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Bruce Evans <brde@optusnet.com.au>
In-Reply-To: <20120301143042.F2406@besplex.bde.org>
Message-ID: <20120301161011.A2654@besplex.bde.org>
References: <20120229194042.GA10921@onelab2.iet.unipi.it>
	<20120301071145.O879@besplex.bde.org>
	<20120301012315.GB14508@onelab2.iet.unipi.it>
	<20120301132806.O2255@besplex.bde.org>
	<20120301143042.F2406@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: arch@FreeBSD.org
Subject: Re: select/poll/usleep precision on FreeBSD vs Linux vs OSX
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 01 Mar 2012 05:42:48 -0000

On Thu, 1 Mar 2012, Bruce Evans wrote:

> On Thu, 1 Mar 2012, Bruce Evans wrote:
>
>> ...
>> Bakul Shah confirmed that Linux now reprograms the timer.  It has to,
>> for a tickless kernel.  FreeBSD reprograms timers too.  I think you
>> can set HZ large and only get timeout interrupts at that frequency if
>> there are active timeouts that need them.  Timeout granularity is still
>> 1/HZ.
>
> I tried this in -current and in a 2008 -current with hz=10000.  It worked
> mediocrely:
> - the 2008 version gave lapic cpuN: timer interrupts on all CPUs at
>  frequency of almost exactly 10 kHz.  This is the behaviour before
>  FreeBSD reprogrammed timers (except the frequency is often off by
>  as much as 10% due to calibration bugs).  There were many anomalies
>  in the results from the test program (like select() adding 199 usec
>  and usleep() adding 999 usec).
> - [... no surprises in -current]

I tried this in -current with hz=100000.  This gives the following
behaviour (some of it not very surprising):
- systat claims ~100% idle, but the ~100k interrupts on 1 CPU actually
   reduce performance by 33% (two CPUs take 30 seconds user time to
   do what can be done in 20 seconds user time with hz=100).  This is
   a normal problem with fast interrupt handlers.  They need a faster
   interrupt handler to account for them properly.
- ./prog 1 select works reasonably.  It reports timeouts of 29-30 us.
   I expected 19-20.
- ./prog 1 poll is broken as we know.  It asks for timeouts of 0 and
   takes 3 us.
- ./prog 1 usleep shows brokenness.  It reports timeouts of 999 us.
   I think this is due to getnanouptime()'s brokenness.
   $(sysctl kern.timecounter.tick) is 100.  This reduces getnanouptime()'s
   accuracy back to 1 msec, which explains the 999 us.  But why doesn't
   select() have the same problem?  select() uses getmicrouptime(), but
   it has the same brokenness.  The sysctl is r/o, so I couldn't use
   it easily.  I have changed tc_tick using ddb before, but don't want
   to risk reducing it by a factor of 100.  The timecounter update
   algorithm depends on the timehands not being recycled too fast, and
   probably couldn't cope with recycling 100 times faster.
- ./prog 1000 select and ./prog 1000 poll take 20 us extra.  I expected
   9-10 extra.
- ./prog 1000 usleep takes 619-693 us extra.  Not the full extra 100
   ticks from getnanouptime() fuzziness now.
- ./prog 500000 usleep takes 500026-500885 us.  Even higher variance
   which agrees with the fuzziness better.  select and poll with this
   timeout still have accuracy and low variance (21-26 us extra).
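
The 999 us figure above follows from simple arithmetic.  A minimal sketch,
assuming that the fuzzy timestamp is refreshed once every tc_tick hardclock
ticks (the function name is hypothetical; this is an illustration, not
kernel code):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Granularity, in microseconds, of a timestamp that is refreshed only
 * once every tc_tick hardclock ticks, with hardclock running at hz Hz.
 * Assumption: this models the fuzziness of the get*uptime() family.
 * The name fuzzy_granularity_us is made up for this example.
 */
static int64_t
fuzzy_granularity_us(int64_t hz, int64_t tc_tick)
{
	assert(hz > 0 && tc_tick > 0);
	return (tc_tick * 1000000 / hz);
}
```

With hz=100000 and kern.timecounter.tick=100 this gives 1000 us, i.e. the
fuzzy timestamps are back to 1 msec granularity even though timeouts
themselves have 10 us granularity.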

The fuzzy versions are actually useful for optimization after all:
- for long timeouts, use the fuzzy versions and accept their inaccuracies.
   Sleep longer by the amount of fuzziness so that sleeps are never too
   short.
- for short timeouts, it seems necessary for the initial timestamp to
   be accurate.  When checking if the timeout has expired, first try a
   fuzzy check.  This is sufficient if the current fuzzy time is far from
   the expiry time.
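
The two cases above might be sketched roughly as follows.  This is only an
illustration under stated assumptions: the fuzzy timestamp is assumed to lag
the true time by less than fuzz_us and never to run ahead of it, all times
are in microseconds, and every name here is hypothetical rather than an
existing kernel interface:

```c
#include <assert.h>
#include <stdint.h>

enum expiry_state {
	NOT_EXPIRED,	/* fuzzy check proves the timeout is still pending */
	EXPIRED,	/* fuzzy check proves the timeout has passed */
	NEED_PRECISE	/* too close to call; read the accurate clock */
};

/* Long timeouts: sleep longer by the fuzz so the sleep is never short. */
static int64_t
padded_timeout_us(int64_t timeout_us, int64_t fuzz_us)
{
	assert(timeout_us >= 0 && fuzz_us >= 0);
	return (timeout_us + fuzz_us);
}

/*
 * Short timeouts: try the cheap fuzzy timestamp first.
 * Assumed invariant: fuzzy_now_us <= true time < fuzzy_now_us + fuzz_us.
 */
static enum expiry_state
fuzzy_check(int64_t fuzzy_now_us, int64_t fuzz_us, int64_t expiry_us)
{
	if (fuzzy_now_us >= expiry_us)
		return (EXPIRED);	/* true time is at least this late */
	if (fuzzy_now_us + fuzz_us <= expiry_us)
		return (NOT_EXPIRED);	/* true time cannot have reached expiry */
	return (NEED_PRECISE);
}
```

Only the NEED_PRECISE case has to pay for an accurate timestamp, and that
case only arises within fuzz_us of the expiry time.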

Bruce