Date: Mon, 21 Sep 2009 11:59:59 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: stable@freebsd.org, Peter Wemm <peter@wemm.org>
Subject: Re: incorrect usleep/select delays with HZ > 2500
Message-ID: <200909211859.n8LIxxZv028784@apollo.backplane.com>
References: <20090906155154.GA8283@onelab2.iet.unipi.it> <e7db6d980909061736p4affc054k3fa5070214adc2f8@mail.gmail.com> <20090907072159.GA18906@onelab2.iet.unipi.it> <6F002A04-5CF9-466F-AEFB-6B983C0E1980@mac.com>
What we wound up doing was splitting tvtohz() into two functions.

    tvtohz_high(tv)

        Returned value meets or exceeds requested time. A minimum
        value of 1 is returned (really only for {0,0}.. else minimum
        value is 2).

    tvtohz_low(tv)

        Returned value might be shorter than requested time, and 0
        can be returned.

Most kernel functions use the tvtohz_high() function. Only a few use tvtohz_low().

I have not found any 'good' solution to the problem. For example, average-up errors can mount up when using the results to control a callout timer, resulting in much longer delays than originally intended, and similarly same-tick interrupts (e.g. a value of 1) can create much shorter delays than expected. Sometimes one cares more about the average interval being correct, other times the time must not be allowed to be too short. You lose no matter what you choose.

    http://fxr.watson.org/fxr/source/kern/kern_clock.c?v=DFBSD

If you look at tvtohz_high() you will note that the minimum value of 1 is only returned if the passed tv is essentially {0,0}, i.e. 0uS. 1uS == 2 ticks (((us + (tick - 1)) / tick) + 1). The 'tick' global here is the number of uS per tick (not to be confused with 'ticks').

Because of all of that I decided to split the function to make the requirements more apparent.

--

The nanosleep() work is a different issue... that's for userland calls (primarily the libc usleep() function). We found that some linux programs assumed that nanosleep() was far more fine-grained than (hz) and, anyway, the calls are named 'nanosleep' and 'usleep', which kind of implies a fine-grained sleep, so we turned it into one when small time intervals were being requested.

    http://fxr.watson.org/fxr/source/kern/kern_time.c?v=DFBSD

The way I figure it, if a userland program wants to make system calls with fine-grained sleeps that are too small, it's really no different from treating that program as being cpu-bound anyway, so why not try to accommodate it?
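To make the tvtohz_high()/tvtohz_low() rounding rules from the start of this message concrete, here is a minimal userland sketch. It is not the DragonFly kernel source: the names struct tv_model, tvtohz_high_model() and tvtohz_low_model() are hypothetical, 'tick' is passed in as a parameter rather than read from a global, and overflow handling for very large intervals is left out. The high variant uses the ((us + (tick - 1)) / tick) + 1 expression quoted above; the low variant simply rounds down, which is an assumption consistent with "might be shorter than requested, and 0 can be returned".

```c
#include <assert.h>

/* Hypothetical stand-in for struct timeval (seconds + microseconds). */
struct tv_model {
    long sec;
    long usec;
};

/*
 * Model of the "high" conversion: the result meets or exceeds the
 * requested time. Only a zero interval yields 1; anything else is
 * rounded up to whole ticks and then padded by one more tick.
 * tick_us is the number of microseconds per tick (1000000 / hz).
 */
static long tvtohz_high_model(struct tv_model tv, long tick_us)
{
    assert(tick_us > 0);
    long us = tv.sec * 1000000L + tv.usec;  /* overflow ignored (assumption) */
    if (us <= 0)
        return 1;
    return (us + (tick_us - 1)) / tick_us + 1;
}

/*
 * Model of the "low" conversion: plain round-down (an assumption),
 * so the result may be shorter than requested and may be 0.
 */
static long tvtohz_low_model(struct tv_model tv, long tick_us)
{
    assert(tick_us > 0);
    long us = tv.sec * 1000000L + tv.usec;  /* overflow ignored (assumption) */
    if (us <= 0)
        return 0;
    return us / tick_us;
}
```

With tick_us = 1000 (hz = 1000), a 1uS request gives 2 ticks from the high model and 0 from the low model, which is the trade-off described above: the first can only overshoot, the second can only undershoot.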
--

The 8254 issue is more one of a lack of interest in fixing it. Basically, using the 8254 as a measure of realtime when the reload value is set too small (i.e. high hz) will always lead to serious timing problems. The reason there is such a lack of interest in fixing it is that most machines have other timers available (lapic, acpi, hpet, tsc, etc).

A secondary issue might be tying real-time functions to 'ticks', which could still be driven by the 8254 interrupt.... those have to be divorced from ticks. I'm not sure if FreeBSD has any of those left (does date still skip quickly if hz is set ultra-high? Even when other timers are available?).

I will note that tying real-time functions to the hz-based tick function (which is also the 8254-driven problem when other timers are not available) leads to serious problems, particularly with ntpd, even if you only lose track of the full cycle of the timer occasionally.

However, neither do you want to 'skip' the ticks value to catch up to a lost interrupt. That will mess up tsleep() and other hz-based timeouts that assume that values of '2' will not instantly time out.

So actual realtime operations really do have to be completely divorced from the hz-based ticks counter, and it must only be used for looser timing needs such as protocol timeouts and sleeps.

					-Matt