Date: Mon, 07 Sep 2009 14:08:28 -0700 From: Chuck Swiger <cswiger@mac.com> To: Luigi Rizzo <rizzo@iet.unipi.it> Cc: stable@freebsd.org, Peter Wemm <peter@wemm.org> Subject: Re: incorrect usleep/select delays with HZ > 2500 Message-ID: <6F002A04-5CF9-466F-AEFB-6B983C0E1980@mac.com> In-Reply-To: <20090907072159.GA18906@onelab2.iet.unipi.it> References: <20090906155154.GA8283@onelab2.iet.unipi.it> <e7db6d980909061736p4affc054k3fa5070214adc2f8@mail.gmail.com> <20090907072159.GA18906@onelab2.iet.unipi.it>
next in thread | previous in thread | raw e-mail | index | archive | help
Hi, all-- On Sep 7, 2009, at 12:21 AM, Luigi Rizzo wrote: >>> 3. ?CAUSE an error in tvtohz(), reported long ago in >>> ? ? ? ?http://www.dragonflybsd.org/presentations/nanosleep/ >>> ? ? ? ?which causes a systematic error of an extra tick in the >>> computation >>> ? ? ? ?of the sleep times. >>> ? ?FIX: the above link also contains a proposed fix (which in fact >>> ? ? ? ?reverts a bug introduced in some old commit on FreeBSD) >>> ? ?PROBLEM: none that i see. >> >> This change, as-is, is extremely dangerous. tsleep/msleep() use a >> value of 0 meaning 'forever'. Changing tvtohz() so that it can now >> return 0 for a non-zero tv is setting land-mines all over the place. >> There's something like 27 callers of tvtohz() in sys/kern alone, some >> of which are used to supply tsleep/msleep() timeouts. Note that the >> dragonflybsd patch above only adds the 'returns 0' check to one >> single >> caller. You either need to patch all callers of tvtohz() since >> you've >> change the semantics, or add a 'if (ticks == 0) ticks = 1' check (or >> checks) in the appropriate places inside tvtohz(). >> >> If you don't do it, then you end up with callers of random functions >> with very small timeouts instead finding themselves sleeping forever. > > You are right, a proper fix for this third issue should be different > (if we want a fix at all -- I'd be almost satisfied by just > removing the drift). A while ago, I took a fairly close look at timing accuracy (ie, trying to sleep until exactly the next second or whatever, and then comparing the actual time you were woken up against the target time), did my own fancy graphs showing sawtooth waves indicating timer aliasing problems, etc. There's no question that the system is allowed to sleep longer than requested, and, if the system is busy doing other tasks, some amount of extra delay is expected and OK, but missing the target time by more than several ticks, especially if the system is idle, is bogus. The Dragonfly patch to tvtohz() definitely improves timer accuracy, but it ends up computing a tick value which is too small by one sometimes (ie, the target time lands *within* that scheduler quantum, rather than at or before the start of that tick), causing negative offsets from the desired time (ie, you woke up just a bit too soon). That's also bogus. :-) From my own testing of various platforms, as well as the results reported elsewhere, MacOS X has the most consistent timer accuracy, which is probably not too surprising given that soft realtime capabilities for audio/visual processing tasks such as iTunes, Quicktime, LogicStudio, etc are pretty important to the Mac userbase. I just compared the tvtohz() implementations in kern/kern_clock.c between FreeBSD and Darwin/XNU (specifically xnu-1228.15.4, the kernel for 10.5.8), and they are effectively identical. However, that's probably moot because Darwin uses tvtoabstime() rather than tvtohz() for the callouts in the setitimer()/getitimer() implementation from kern/kern_time.c: /* FreeBSD */ int kern_setitimer(struct thread *td, u_int which, struct itimerval *aitv, struct itimerval *oitv) { [ ... ] if (which == ITIMER_REAL) { PROC_LOCK(p); if (timevalisset(&p->p_realtimer.it_value)) callout_stop(&p->p_itcallout); getmicrouptime(&ctv); if (timevalisset(&aitv->it_value)) { callout_reset(&p->p_itcallout, tvtohz(&aitv- >it_value), realitexpire, p); timevaladd(&aitv->it_value, &ctv); } *oitv = p->p_realtimer; p->p_realtimer = *aitv; PROC_UNLOCK(p); if (timevalisset(&oitv->it_value)) { if (timevalcmp(&oitv->it_value, &ctv, <)) timevalclear(&oitv->it_value); else timevalsub(&oitv->it_value, &ctv); } [ ... ] /* Darwin */ int setitimer(struct proc *p, struct setitimer_args *uap, register_t *retval) { [ ... ] switch (uap->which) { case ITIMER_REAL: proc_spinlock(p); if (timerisset(&aitv.it_value)) { microuptime(&p->p_rtime); timevaladd(&p->p_rtime, &aitv.it_value); p->p_realtimer = aitv; if (!thread_call_enter_delayed(p->p_rcall, tvtoabstime(&p->p_rtime))) // [1] p->p_ractive++; } else { timerclear(&p->p_rtime); p->p_realtimer = aitv; if (thread_call_cancel(p->p_rcall)) p->p_ractive--; } proc_spinunlock(p); break; [ ... ] > The simplest option is perhaps to compute a custom value for > nanosleep, select and poll. This would remove the risk of side > effects to other parts of the system, and we could also use the > chance to compensate for the errors that arise when hz*tick != > 1000000 or when we know that hardclock does not run exactly every > 'tick' (an integer) microseconds. I think it would be relatively simple to adapt the DragonFly change but ensure that tztohz() returns at least 1, and that would be helpful improvement. However, in the long run, a callout mechanism based upon scheduler ticks rather than actual time values suffers an inherent loss of accuracy (you're performing an integer division of time intervals by HZ and rounding up to use the API, after all) and makes assumptions that the scheduler's clock is always running at a constant frequency such that tick * HZ is exactly a million microseconds. Regards, -- -Chuck [1]: See http://www.opensource.apple.com/source/xnu/xnu-1228.15.4/osfmk/kern/thread_call.c
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?6F002A04-5CF9-466F-AEFB-6B983C0E1980>