Date: Wed, 9 Sep 2009 03:01:37 +0200
From: Luigi Rizzo <rizzo@iet.unipi.it>
To: current@freebsd.org, jeff@freebsd.org, re@freebsd.org
Subject: clock error: callouts are run one tick late
Message-ID: <20090909010137.GA74897@onelab2.iet.unipi.it>
RELENG_8/amd64 (cannot test on i386) has the following problem:
callout_reset(..., t, ...) processes the callout after t+1 ticks instead
of the t ticks that one sees on RELENG_7 and earlier.

I found it by pure chance, noticing that dummynet on RELENG_8 has a
jitter of two ticks instead of one tick. Other systems that rely on
frequent callouts might see problems as well.

An indirect way to see the problem is the following:

    kldload dummynet
    sysctl net.inet.ip.dummynet.tick_adjustment; \
        sleep 1; sysctl net.inet.ip.dummynet.tick_adjustment

On a working system, the variable should remain mostly unchanged; on a
non-working system you see it growing at a rate of HZ/2.

I suspect the bug was introduced by the change in kern_timeout.c rev. 1.114

    http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_timeout.c.diff?r1=1.113;r2=1.114

which changes softclock() to stop before the current 'ticks', so it
processes everything one tick late. I understand the race described in
the cvs log, but this does not seem a proper fix -- it violates POLA by
changing the semantics of callout_reset(), and it does not really fix
the race, but just adds an extra uncertainty of one tick on when a given
callout will be run.

If the problem is a race between hardclock(), which updates 'ticks', and
the various hardclock_cpu() instances, which might arrive early, I would
suggest two alternative options:

1. Create a per-cpu 'ticks' (say a field cc_ticks in struct callout_cpu),
   increment it at the start of hardclock_cpu(), and use cc->cc_ticks
   instead of 'ticks' in the various callout functions involved with
   manipulation of the callwheel: callout_tick(), softclock(),
   callout_reset_on().

2. Start softclock() at cc->cc_softticks - 1, i.e.

        ...
        CC_LOCK(cc);
    -   while (cc->cc_softticks != ticks) {
    +   while (cc->cc_softticks-1 != ticks) {
        ...

   so we can catch up on any work left over due to the race.
I think option #1 is preferable, as it should completely remove the race
condition, though #2 might be reasonable as a quick fix should we decide
to put it into 8.0.

cheers
luigi