From owner-freebsd-stable@FreeBSD.ORG Mon Sep 7 22:09:01 2009 Return-Path: Delivered-To: stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2FA85106566B for ; Mon, 7 Sep 2009 22:09:01 +0000 (UTC) (envelope-from cswiger@mac.com) Received: from asmtpout012.mac.com (asmtpout012.mac.com [17.148.16.87]) by mx1.freebsd.org (Postfix) with ESMTP id 1ABE28FC0C for ; Mon, 7 Sep 2009 22:09:00 +0000 (UTC) MIME-version: 1.0 Content-transfer-encoding: 7BIT Content-type: text/plain; charset=US-ASCII; format=flowed; delsp=yes Received: from cswiger1.apple.com ([17.227.140.124]) by asmtp012.mac.com (Sun Java(tm) System Messaging Server 6.3-8.01 (built Dec 16 2008; 32bit)) with ESMTPSA id <0KPM007LMDE4GV90@asmtp012.mac.com> for stable@freebsd.org; Mon, 07 Sep 2009 14:08:29 -0700 (PDT) Message-id: <6F002A04-5CF9-466F-AEFB-6B983C0E1980@mac.com> From: Chuck Swiger To: Luigi Rizzo In-reply-to: <20090907072159.GA18906@onelab2.iet.unipi.it> Date: Mon, 07 Sep 2009 14:08:28 -0700 References: <20090906155154.GA8283@onelab2.iet.unipi.it> <20090907072159.GA18906@onelab2.iet.unipi.it> X-Mailer: Apple Mail (2.936) Cc: stable@freebsd.org, Peter Wemm Subject: Re: incorrect usleep/select delays with HZ > 2500 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Sep 2009 22:09:01 -0000 Hi, all-- On Sep 7, 2009, at 12:21 AM, Luigi Rizzo wrote: >>> 3. ?CAUSE an error in tvtohz(), reported long ago in >>> ? ? ? ?http://www.dragonflybsd.org/presentations/nanosleep/ >>> ? ? ? ?which causes a systematic error of an extra tick in the >>> computation >>> ? ? ? ?of the sleep times. >>> ? ?FIX: the above link also contains a proposed fix (which in fact >>> ? ? ? ?reverts a bug introduced in some old commit on FreeBSD) >>> ? ?PROBLEM: none that i see. >> >> This change, as-is, is extremely dangerous. tsleep/msleep() use a >> value of 0 meaning 'forever'. Changing tvtohz() so that it can now >> return 0 for a non-zero tv is setting land-mines all over the place. >> There's something like 27 callers of tvtohz() in sys/kern alone, some >> of which are used to supply tsleep/msleep() timeouts. Note that the >> dragonflybsd patch above only adds the 'returns 0' check to one >> single >> caller. You either need to patch all callers of tvtohz() since >> you've >> change the semantics, or add a 'if (ticks == 0) ticks = 1' check (or >> checks) in the appropriate places inside tvtohz(). >> >> If you don't do it, then you end up with callers of random functions >> with very small timeouts instead finding themselves sleeping forever. > > You are right, a proper fix for this third issue should be different > (if we want a fix at all -- I'd be almost satisfied by just > removing the drift). A while ago, I took a fairly close look at timing accuracy (ie, trying to sleep until exactly the next second or whatever, and then comparing the actual time you were woken up against the target time), did my own fancy graphs showing sawtooth waves indicating timer aliasing problems, etc. There's no question that the system is allowed to sleep longer than requested, and, if the system is busy doing other tasks, some amount of extra delay is expected and OK, but missing the target time by more than several ticks, especially if the system is idle, is bogus. The Dragonfly patch to tvtohz() definitely improves timer accuracy, but it ends up computing a tick value which is too small by one sometimes (ie, the target time lands *within* that scheduler quantum, rather than at or before the start of that tick), causing negative offsets from the desired time (ie, you woke up just a bit too soon). That's also bogus. :-) From my own testing of various platforms, as well as the results reported elsewhere, MacOS X has the most consistent timer accuracy, which is probably not too surprising given that soft realtime capabilities for audio/visual processing tasks such as iTunes, Quicktime, LogicStudio, etc are pretty important to the Mac userbase. I just compared the tvtohz() implementations in kern/kern_clock.c between FreeBSD and Darwin/XNU (specifically xnu-1228.15.4, the kernel for 10.5.8), and they are effectively identical. However, that's probably moot because Darwin uses tvtoabstime() rather than tvtohz() for the callouts in the setitimer()/getitimer() implementation from kern/kern_time.c: /* FreeBSD */ int kern_setitimer(struct thread *td, u_int which, struct itimerval *aitv, struct itimerval *oitv) { [ ... ] if (which == ITIMER_REAL) { PROC_LOCK(p); if (timevalisset(&p->p_realtimer.it_value)) callout_stop(&p->p_itcallout); getmicrouptime(&ctv); if (timevalisset(&aitv->it_value)) { callout_reset(&p->p_itcallout, tvtohz(&aitv- >it_value), realitexpire, p); timevaladd(&aitv->it_value, &ctv); } *oitv = p->p_realtimer; p->p_realtimer = *aitv; PROC_UNLOCK(p); if (timevalisset(&oitv->it_value)) { if (timevalcmp(&oitv->it_value, &ctv, <)) timevalclear(&oitv->it_value); else timevalsub(&oitv->it_value, &ctv); } [ ... ] /* Darwin */ int setitimer(struct proc *p, struct setitimer_args *uap, register_t *retval) { [ ... ] switch (uap->which) { case ITIMER_REAL: proc_spinlock(p); if (timerisset(&aitv.it_value)) { microuptime(&p->p_rtime); timevaladd(&p->p_rtime, &aitv.it_value); p->p_realtimer = aitv; if (!thread_call_enter_delayed(p->p_rcall, tvtoabstime(&p->p_rtime))) // [1] p->p_ractive++; } else { timerclear(&p->p_rtime); p->p_realtimer = aitv; if (thread_call_cancel(p->p_rcall)) p->p_ractive--; } proc_spinunlock(p); break; [ ... ] > The simplest option is perhaps to compute a custom value for > nanosleep, select and poll. This would remove the risk of side > effects to other parts of the system, and we could also use the > chance to compensate for the errors that arise when hz*tick != > 1000000 or when we know that hardclock does not run exactly every > 'tick' (an integer) microseconds. I think it would be relatively simple to adapt the DragonFly change but ensure that tztohz() returns at least 1, and that would be helpful improvement. However, in the long run, a callout mechanism based upon scheduler ticks rather than actual time values suffers an inherent loss of accuracy (you're performing an integer division of time intervals by HZ and rounding up to use the API, after all) and makes assumptions that the scheduler's clock is always running at a constant frequency such that tick * HZ is exactly a million microseconds. Regards, -- -Chuck [1]: See http://www.opensource.apple.com/source/xnu/xnu-1228.15.4/osfmk/kern/thread_call.c