Date:      Wed, 9 Sep 2009 03:16:06 +0200
From:      Luigi Rizzo <rizzo@iet.unipi.it>
To:        current@freebsd.org, jeff@freebsd.org, re@freebsd.org
Subject:   Re: clock error: callouts are run one tick late
Message-ID:  <20090909011606.GA77031@onelab2.iet.unipi.it>
In-Reply-To: <20090909010137.GA74897@onelab2.iet.unipi.it>
References:  <20090909010137.GA74897@onelab2.iet.unipi.it>

On Wed, Sep 09, 2009 at 03:01:37AM +0200, Luigi Rizzo wrote:
> RELENG_8/amd64 (cannot test on i386) has the following problem:
> 
> 	callout_reset(..., t, ...)
> 
> processes the callout after t+1 ticks instead of the t ticks
> that one sees on RELENG_7 and earlier.
> 
> I found it by pure chance, noticing that dummynet on RELENG_8
> has a jitter of two ticks instead of one tick.
> Other subsystems that rely on frequent callouts might see
> problems as well.
> 
> An indirect way to see the problem is the following:
> 
> 	kldload dummynet
> 
> 	sysctl net.inet.ip.dummynet.tick_adjustment; \
> 	sleep 1; sysctl net.inet.ip.dummynet.tick_adjustment
> 
> on a working system, the variable should remain mostly unchanged;
> on a non-working system you will see it grow at a rate of HZ/2.
> 
> I suspect the bug was introduced by the change in kern_timeout.c rev. 1.114
> 
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_timeout.c.diff?r1=1.113;r2=1.114
> 
> which changes softclock() to stop one tick before the current 'ticks',
> so it processes everything one tick late.
> 
> I understand the race described in the cvs log, but this does not
> seem like a proper fix: it violates POLA by changing the semantics
> of callout_reset(), and it does not really fix the race, it just adds
> an extra tick of uncertainty about when a given callout will be run.
> 
> If the problem is a race between hardclock() which updates 'ticks',
> and the various hardclock_cpu() instances which might arrive early,
> I would suggest two alternative options:
> 
> 1. create a per-CPU 'ticks' (say, a field cc_ticks in struct
>    callout_cpu), increment it at the start of hardclock_cpu(), and
>    use cc->cc_ticks instead of the global 'ticks' in the callout
>    functions that manipulate the callwheel: callout_tick(),
>    softclock() and callout_reset_on() (sketched below);
> 
> 2. start softclock() at cc->cc_softticks - 1, i.e.
> 
> 	...
> 	CC_LOCK(cc)
>    -	while (cc->cc_softticks != ticks) {
>    +	while (cc->cc_softticks-1 != ticks) {
>  	...
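
To make option #1 concrete, here is a rough, untested sketch (the
context lines are from memory and may not match kern_timeout.c
exactly; the field name cc_ticks is just a suggestion):

	struct callout_cpu {
		struct mtx	cc_lock;
		...
	+	int		cc_ticks;	/* per-CPU copy of 'ticks' */
		int		cc_softticks;
		...
	};

In callout_tick(), advance cc_ticks under cc_lock and scan against
it instead of the global 'ticks':

	     mtx_lock_spin_flags(&cc->cc_lock, MTX_QUIET);
	  +  cc->cc_ticks++;
	  -  for (; (cc->cc_softticks - ticks) < 0; cc->cc_softticks++) {
	  +  for (; (cc->cc_softticks - cc->cc_ticks) < 0; cc->cc_softticks++) {
	             bucket = cc->cc_softticks & callwheelmask;

and similarly in softclock() and callout_reset_on():

	  -  while (cc->cc_softticks != ticks) {
	  +  while (cc->cc_softticks != cc->cc_ticks) {

	  -  c->c_time = ticks + to_ticks;
	  +  c->c_time = cc->cc_ticks + to_ticks;

Since cc_ticks only ever changes under cc_lock, softclock() and
callout_reset_on() always see a value consistent with the callwheel
state, no matter when hardclock() updates the global 'ticks'.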

#2 also needs this change in callout_tick():

        mtx_lock_spin_flags(&cc->cc_lock, MTX_QUIET);
     -  for (; (cc->cc_softticks - ticks) < 0; cc->cc_softticks++) {
     +  for (; (cc->cc_softticks - ticks) <= 0; cc->cc_softticks++) {
                bucket = cc->cc_softticks & callwheelmask;

Just tested it, and it seems to fix the problem.
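
If you want to see the off-by-one directly, without dummynet, a
minimal test module (hypothetical sketch, not something I ran) can
just time a callout in ticks:

	#include <sys/param.h>
	#include <sys/systm.h>
	#include <sys/kernel.h>
	#include <sys/callout.h>

	static struct callout test_c;
	static int test_start;

	static void
	test_cb(void *arg)
	{
		/* expect 10 on RELENG_7, 11 on an affected RELENG_8 */
		printf("callout fired after %d ticks\n", ticks - test_start);
	}

	/* from the module load handler: */
	callout_init(&test_c, CALLOUT_MPSAFE);
	test_start = ticks;
	callout_reset(&test_c, 10, test_cb, NULL);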

cheers
luigi


