From: Bruce Evans
Date: Thu, 1 Mar 2012 15:45:11 +1100 (EST)
To: Bruce Evans
Cc: arch@freebsd.org
Subject: Re: select/poll/usleep precision on FreeBSD vs Linux vs OSX

On Thu, 1 Mar 2012, Bruce Evans wrote:

> ...
> Bakul Shah confirmed that Linux now reprograms the timer. It has to,
> for a tickless kernel. FreeBSD reprograms timers too. I think you
> can set HZ large and only get timeout interrupts at that frequency if
> there are active timeouts that need them. Timeout granularity is still
> 1/HZ.
I tried this in -current and in a 2008 -current with hz=10000. It worked mediocrely:

- the 2008 version gave "lapic cpuN: timer" interrupts on all CPUs at a frequency of almost exactly 10 kHz. This is the behaviour from before FreeBSD reprogrammed timers (except that the frequency is often off by as much as 10% due to calibration bugs). There were many anomalies in the results from the test program (like select() adding 199 usec and usleep() adding 999 usec).
- -current gives "cpu0: timer" interrupts at a frequency of almost exactly 10115 Hz, but only when I watch it using systat over the network (10000 is for hz and the other 115 is presumably for reprogramming). The other CPU gets many fewer interrupts. When I stop watching, the rates drop towards 9900 for cpu0 and 120 for cpu1. I hoped that there would be only about 50 timer interrupts on the mostly-idle machine.
- timeout granularity according to the test program was better than expected. In almost all cases, the timeout was xx99 us. E.g., 1 us becomes 200 us after rounding up and adding 1 tick, and the result is 199 (since there was 1 us of overhead and no jitter). 1000 us became 1099 since rounding up didn't increase it. This is almost better than the OtherOS results (since it has no jitter). I can probably easily beat OtherOS by setting hz to 100000. But I think no jitter is too good to be good.

This makes a design bug in poll() very clear. poll() has a timeout granularity of 1 ms, so you can't even ask for timeouts of less than that. Above 1 ms, the extra 99 or 199 us is good enough, and the default of an extra 999 or 1999 us is not too bad.

A tickless kernel should have the equivalent of HZ = 0 on idle machines and the equivalent of HZ = huge when something uses lots of timeouts. The latter gives some security problems. You don't want to reprogram timers every 500 nsec when some untrusted application asks for timeouts of 1000 nsec, even if the system can support it.
When APIs are fixed to catch up with 1988's timespecs, it will be possible to ask for timeouts of 1 nsec and never get them, but waste a lot of cycles. Scheduling is not good enough to disfavour CPU hogs that do things on the nanosecond scale.

I just remembered that precise timeouts are just what is needed for hiding from schedulers. stathz was supposed to be significantly aperiodic and larger than hz so that CPU hogs couldn't use timeouts (based on hz) to hide from schedulers (based on stathz). This was never fully implemented in FreeBSD, and was broken many years ago. In FreeBSD, stathz was normally 128 and periodic, and just a little larger than hz, which was normally 100. But someone broke hz to default to 1000. CPU hogs can now hide from schedulers, though not trivially, by getting timeouts every millisecond and running for about 6 or 7 milliseconds, then sleeping for 2 or 1 milliseconds to miss scheduler ticks. With larger hz, the hogs get more control. E.g., HZ = 10000 lets them sleep for only 200 or 100 usec every 7.81 msec (one stathz period) to miss scheduler ticks.

Reprogramming of timers in -current probably gives significant jitter to timeout boundaries. This can be handled by sleeping for a slightly wider interval. Also, fine-grained timeouts allow simpler implementations of this: just wake up every tick, and if you are close to a scheduler tick (which you can predict since they are periodic), go back to sleep for 1 timeout tick. Since timeout ticks are short relative to scheduler ticks, you get control again soon and then don't have to sleep again for many timeout ticks. No one cares about this because CPUs are now free :-).

-current has related fixes and complications in the new timer code. Even without malicious CPU hogs, basing statclock and hardclock on the same lapic timer made them too synchronous with each other. The quick fix was to use the i8254 again. This gave a small amount of asynchronicity, which was apparently enough to fix the non-malicious case.
I didn't like this, and tried to generate some fake asynchronicity from a single lapic timer. I think it is possible to fake it well enough for the non-malicious case. No one followed up on this. I haven't followed later developments.

Bruce