From: Bruce Evans <brde@optusnet.com.au>
Date: Thu, 1 Mar 2012 14:14:14 +1100 (EST)
To: Luigi Rizzo
Cc: arch@freebsd.org
Subject: Re: select/poll/usleep precision on FreeBSD vs Linux vs OSX
Message-ID: <20120301132806.O2255@besplex.bde.org>
In-Reply-To: <20120301012315.GB14508@onelab2.iet.unipi.it>
References: <20120229194042.GA10921@onelab2.iet.unipi.it> <20120301071145.O879@besplex.bde.org> <20120301012315.GB14508@onelab2.iet.unipi.it>

On Thu, 1 Mar 2012, Luigi Rizzo wrote:

> On Thu, Mar 01, 2012 at 11:33:46AM +1100, Bruce Evans wrote:
>> On Wed, 29 Feb 2012, Luigi Rizzo wrote:
>>>         |                Actual timeout                 |
>>>         |        select         | poll  | usleep|
>>> timeout | FBSD  | Linux |  OSX  | FBSD  | FBSD  |
>>>  usec   |  9.0  | Vbox  | 10.6  |  9.0  |  9.0  |
>>> --------+-------+-------+-------+-------+-------+
>>>       1    2000      99       6       0    2000
>>>      10    2000     109      15       0    2000
>>>      50    2000     149      66       0    2000
>>>     100    2000     196     133       0    2000
>>>     500    2000     597     617       0    2000
>>>    1000    2000    1103    1136    2000    2000
>>>    1001    3000    1103    1136    2000    3000  <---
>>>    1500    3000    1608    1631    2000    3000  <---
>>>    2000    3000    2096    2127    3000    3000
>>>    2001    4000                    3000    4000  <---
>>>    3001    5000                    4000    5000  <---
>>>
>>> Note how the rounding (poll has the timeout in milliseconds) affects
>>
>> You must have synced with timer interrupts to get the above.  Timeouts
>
> yes i have -- the test code does almost nothing after returning from
> a select; on a system that does some amount of work, the times could
> be up to 1000 us shorter.

Still a huge error on short timeouts.  I get the sync but not the
rounded timeouts, on my ~5.2 kernel with HZ = 100.  The times are
typically 19900-19993 us for rounding up 1 us to 2 ticks.

> I should also comment that these are average values on an otherwise
> idle system -- I will try to post a histogram of the actual values.
> It might well be that OSX and Linux have quantized values very
> different from the average (though this would violate the specs,
> so I suspect instead that they have some cheap one-shot timers).
>
> For FreeBSD I have also rounded the values (actual averages are
> -1/+3 us over 1-second experiments).

Oh.  The jitter is of minor interest, and rounding to usec should
show an average of slightly less than the timeout rounded up to ticks
(on an unloaded system).

Bakul Shah confirmed that Linux now reprograms the timer.  It has to,
for a tickless kernel.  FreeBSD reprograms timers too.  I think you
can set HZ large and only get timeout interrupts at that frequency
when there are active timeouts that need them.  Timeout granularity
is still 1/HZ.
Hmm, this may explain why you are getting exact n000's: every time
you ask for a timeout, you get one n000 us later (on a near-idle
machine where nothing else is asking for many timeouts).  Old kernels
instead give timeouts on perfectly periodic n000(+error) boundaries,
so when the syscall is made just after a boundary, the boundary for
the timeout is never a full n000 away.  There may be a lot of jitter
in both cases, but if the reprogramming of the timer when you ask for
a new timeout is too smart, then the jitter averages out to 0, giving
perfect n000's.  Try running multiple sources of new timeouts.

I think a periodic itimer should produce perfectly periodic timeouts
with little overhead.  Other timeouts should then not change the
periodicity or even reprogram the timer.  Reprogramming on demand
seems to give unwanted aperiodicity: you ask for a delay of 1 and get
2000.  Suppose you actually want 2000, and actually get it relative
to the request time.  Then the timer must be interrupting
aperiodically, with an average period of 2000 plus overhead time (say
2), possibly with large jitter.  So 500 of these take 1 second plus
1000 us, plus any jitter (the jitter may be negative, but is most
likely positive, since when the process setting up the timeouts is
preempted and nothing else is setting them up, there may be a large
additional delay).

I try to avoid this problem in my version of ping.  I try to send a
packet on every 1-second boundary.  Normal ping tries to send one 1
second after the previous one, but it can't do this exactly, since it
has overheads and gets preempted.  With HZ = 100 and rounding up and
adding 1, the drift is likely to be 20 msec every second, or 2%.
This is quite a lot.  My version tries to schedule a timeout that
expires exactly 1 second after the previous packet was sent, not 1
second after the current time.
It takes a simple subtraction to determine the timeout to reach the
next second boundary, but determining the times to subtract seems to
require an extra gettimeofday() call.  I should use a periodic itimer
and depend on it actually being periodic.

The kernel must do similar things to keep periodic itimers actually
periodic after it reprograms timers.  There may be a lot of jitter on
each reprogramming, but this can be compensated for on average.
OTOH, as for skewing clocks, the compensation shouldn't go too fast
in either direction.  This could get complicated.  I don't know what
-current actually does.

Bruce