Date: Wed, 21 Jul 2004 23:01:20 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
To: Brian Fundakowski Feldman <green@freebsd.org>
Cc: current@freebsd.org
Subject: Re: nanosleep returning early
Message-ID: <20040721220405.Y2346@epsplex.bde.org>
In-Reply-To: <20040721102620.GF1009@green.homeunix.org>
References: <20040721081310.GJ22160@freebsd3.cimlogic.com.au> <20040721102620.GF1009@green.homeunix.org>

On Wed, 21 Jul 2004, Brian Fundakowski Feldman wrote:

> On Wed, Jul 21, 2004 at 06:13:10PM +1000, John Birrell wrote:
> >
> > Today I increased HZ in a current kernel to 1000 when adding dummynet.
> > Now I find that nanosleep regularly comes back a little early.
> > Can anyone explain why?

The most obvious bug is that nanosleep() uses the low-accuracy interface
getnanouptime().  I can't see why the problem is more obvious with
large HZ or why it affects short sleeps.  From kern_time.c 1.170:

% static int
% nanosleep1(struct thread *td, struct timespec *rqt, struct timespec *rmt)
% {
% 	struct timespec ts, ts2, ts3;
% 	struct timeval tv;
% 	int error;
%
% 	if (rqt->tv_nsec < 0 || rqt->tv_nsec >= 1000000000)
% 		return (EINVAL);
% 	if (rqt->tv_sec < 0 || (rqt->tv_sec == 0 && rqt->tv_nsec == 0))
% 		return (0);
% 	getnanouptime(&ts);

This may lag the actual (up)time by 1/HZ seconds.

% 	timespecadd(&ts, rqt);

So we get a final time that may be 1/HZ seconds too small.

% 	TIMESPEC_TO_TIMEVAL(&tv, rqt);

Rounding to microseconds doesn't make much difference since things take
on the order of 1 us.

% 	for (;;) {
% 		error = tsleep(&nanowait, PWAIT | PCATCH, "nanslp",
% 		    tvtohz(&tv));

We only converted to a timeval so that we could use tvtohz() here.
tvtohz() rounds up to the tick boundary after the next one to allow for
the 1/HZ resolution of tsleep().  This should also mask the inaccuracy
of getnanouptime() unless the sleep returns early due to a signal -- in
the usual case of not very long sleeps that are not killed by a signal,
the tsleep() guarantees sleeping long enough and sleeps (1/2*1/HZ) extra
on average.

% 		getnanouptime(&ts2);
% 		if (error != EWOULDBLOCK) {
% 			if (error == ERESTART)
% 				error = EINTR;
% 			if (rmt != NULL) {
% 				timespecsub(&ts, &ts2);
% 				if (ts.tv_sec < 0)
% 					timespecclear(&ts);
% 				*rmt = ts;

This handles the case of being killed by a signal.  Then we always
return early, and the bug is just that the returned time-not-slept is
inaccurate.

% 			}
% 			return (error);
% 		}
% 		if (timespeccmp(&ts2, &ts, >=))
% 			return (0);

This handles the case where the timeout expires.  We check that the
specified sleep time has expired, not just that some number of ticks
expired, since the latter may be too short for long sleeps even after
rounding it up.

% 		ts3 = ts;
% 		timespecsub(&ts3, &ts2);
% 		TIMESPEC_TO_TIMEVAL(&tv, &ts3);

Errors may accumulate (or cancel?) for the next iteration.

% 	}
% }

> > I would have expected that the *overrun* beyond the required time to vary,
> > but never that it would come back early.
>
> Is this a difference from clock_gettime(CLOCK_MONOTONIC)?  You really
> shouldn't be using gettimeofday() for internal timing since the
> system clock can be adjusted by NTP.

The monotonic clock can also be adjusted by NTP, and normally is if
there are any NTP adjustments at all (the uptime and the time use the
same timecounter which is adjusted by NTP).  NTP's adjustments are only
limited to CLOCK_REALTIME when NTP steps the clock for initialization.
Stepping the clock causes other time warps and should never be used.

Bruce