Date: Wed, 9 Sep 2009 03:16:06 +0200
From: Luigi Rizzo <luigi@onelab2.iet.unipi.it>
To: current@freebsd.org, jeff@freebsd.org, re@freebsd.org
Subject: Re: clock error: callouts are run one tick late

On Wed, Sep 09, 2009 at 03:01:37AM +0200, Luigi Rizzo wrote:
> RELENG_8/amd64 (cannot try on i386) has the following problem:
>
>     callout_reset(..., t, ...)
>
> processes the callout after t+1 ticks instead of the t ticks
> that one sees on RELENG_7 and earlier.
>
> I found it by pure chance, noticing that dummynet on RELENG_8
> has a jitter of two ticks instead of one tick.
> Other systems which rely on frequent callouts might see
> problems as well.
>
> An indirect way to see the problem is the following:
>
>     kldload dummynet
>
>     sysctl net.inet.ip.dummynet.tick_adjustment; \
>         sleep 1; sysctl net.inet.ip.dummynet.tick_adjustment
>
> On a working system the variable should remain mostly unchanged;
> on a non-working system you see it growing at a rate of HZ/2.
>
> I suspect the bug was introduced by the change in kern_timeout.c
> rev. 1.114:
>
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_timeout.c.diff?r1=1.113;r2=1.114
>
> which changes softclock() to stop before the current 'ticks',
> so it processes everything one tick late.
>
> I understand the race described in the cvs log, but this does not
> seem a proper fix -- it violates POLA by changing the semantics of
> callout_reset(), and it does not really fix the race, but just adds
> an extra uncertainty of one tick on when a given callout will run.
>
> If the problem is a race between hardclock(), which updates 'ticks',
> and the various hardclock_cpu() instances, which might arrive early,
> I would suggest two alternative options:
>
> 1. create a per-cpu 'ticks' (say a field cc_ticks in struct
>    callout_cpu), increment it at the start of hardclock_cpu(), and
>    use cc->cc_ticks instead of the global 'ticks' in the various
>    callout functions that manipulate the callwheel:
>    callout_tick(), softclock(), callout_reset_on()
>
> 2. start softclock() at cc->cc_softticks - 1, i.e.
>
>        ...
>        CC_LOCK(cc);
>    -   while (cc->cc_softticks != ticks) {
>    +   while (cc->cc_softticks - 1 != ticks) {
>        ...
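BTW, for anyone who wants to see the off-by-one without instrumenting
a kernel, below is a throwaway userland model of the draining loop.
It is only a sketch under my reading of the code: the names echo
kern_timeout.c but none of this is the real kernel code (no locking,
a single fake CPU, one callout).  With the loop stopping before the
current tick the callout armed for 5 ticks fires after 6; when the
current tick's bucket is drained as well, as in option #2, it fires
after 5, which is the RELENG_7 behaviour.

#include <stdio.h>

#define CALLWHEEL_SIZE	256
#define CALLWHEEL_MASK	(CALLWHEEL_SIZE - 1)

static int ticks;			/* global tick counter (hardclock) */
static int cc_softticks;		/* next bucket to drain (softclock) */
static int wheel[CALLWHEEL_SIZE];	/* 1 = a callout is pending here */

/* Arm a callout 'delta' ticks in the future, like callout_reset(). */
static void
arm_callout(int delta)
{
	wheel[(ticks + delta) & CALLWHEEL_MASK] = 1;
}

/*
 * Drain the wheel.  With 'stop_early' set, stop one bucket before the
 * current tick (the rev. 1.114 behaviour); otherwise drain the bucket
 * for the current tick as well (option #2 above).  Returns the tick
 * at which a pending callout fired, or -1 if nothing fired.
 */
static int
drain(int stop_early)
{
	int bucket, fired = -1;

	while (cc_softticks != (stop_early ? ticks : ticks + 1)) {
		bucket = cc_softticks & CALLWHEEL_MASK;
		if (wheel[bucket]) {
			wheel[bucket] = 0;
			fired = ticks;
		}
		cc_softticks++;
	}
	return (fired);
}

int
main(void)
{
	int stop_early, fired_at;

	for (stop_early = 1; stop_early >= 0; stop_early--) {
		ticks = cc_softticks = 0;
		arm_callout(5);		/* callout_reset(..., 5, ...) */

		fired_at = -1;
		while (fired_at == -1) {
			ticks++;	/* hardclock() */
			fired_at = drain(stop_early);
		}
		printf("stopping %s the current tick: armed for 5, fired after %d ticks\n",
		    stop_early ? "before" : "at", fired_at);
	}
	return (0);
}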
#2 also needs this change in callout_tick():

	mtx_lock_spin_flags(&cc->cc_lock, MTX_QUIET);
-	for (; (cc->cc_softticks - ticks) < 0; cc->cc_softticks++) {
+	for (; (cc->cc_softticks - ticks) <= 0; cc->cc_softticks++) {
		bucket = cc->cc_softticks & callwheelmask;

Just tested it; it seems to fix the problem.

cheers
luigi
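PS: in case the <= vs < change and the wrap-around behaviour of the
comparison are not obvious, here is a quick standalone check.  This
is only an illustration, not kernel code; I spell the test with an
unsigned subtraction to keep the standalone program well defined, but
it is the same "signed difference" idiom as in the loop above, so the
test stays correct when the tick counter wraps.

#include <limits.h>
#include <stdio.h>

/*
 * "softticks has not yet passed ticks": with <= 0 the bucket for the
 * current tick is included, which is what the patch above changes.
 */
static int
behind_or_equal(int softticks, int ticks)
{
	return ((int)((unsigned)softticks - (unsigned)ticks) <= 0);
}

int
main(void)
{
	printf("%d %d %d\n",
	    behind_or_equal(99, 100),	/* 1: one bucket still to drain */
	    behind_or_equal(100, 100),	/* 1: current tick's bucket too */
	    behind_or_equal(101, 100));	/* 0: already past */

	/* ticks has just wrapped from INT_MAX to INT_MIN */
	printf("%d\n", behind_or_equal(INT_MAX, INT_MIN));	/* 1 */
	return (0);
}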