From owner-freebsd-stable@FreeBSD.ORG Mon Sep 7 19:52:11 2009 Return-Path: Delivered-To: stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D560B106566B for ; Mon, 7 Sep 2009 19:52:11 +0000 (UTC) (envelope-from cliftonr@lava.net) Received: from outgoing01.lava.net (cake.lava.net [IPv6:2001:1888:0:1:230:48ff:fe5b:3b50]) by mx1.freebsd.org (Postfix) with ESMTP id 6D52D8FC0C for ; Mon, 7 Sep 2009 19:52:11 +0000 (UTC) Received: from malasada.lava.net (malasada.lava.net [64.65.64.17]) by outgoing01.lava.net (Postfix) with ESMTP id 7CEC4D259B; Mon, 7 Sep 2009 09:52:10 -1000 (HST) Received: by malasada.lava.net (Postfix, from userid 102) id CDAEF153882; Mon, 7 Sep 2009 09:52:07 -1000 (HST) Date: Mon, 7 Sep 2009 09:52:07 -1000 From: Clifton Royston To: Luigi Rizzo Message-ID: <20090907195207.GD10384@lava.net> References: <20090906155154.GA8283@onelab2.iet.unipi.it> <20090907072159.GA18906@onelab2.iet.unipi.it> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20090907072159.GA18906@onelab2.iet.unipi.it> User-Agent: Mutt/1.4.2.2i Cc: stable@freebsd.org, Peter Wemm Subject: Re: incorrect usleep/select delays with HZ > 2500 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Sep 2009 19:52:11 -0000 On Mon, Sep 07, 2009 at 09:21:59AM +0200, Luigi Rizzo wrote: > On Sun, Sep 06, 2009 at 05:36:29PM -0700, Peter Wemm wrote: > > On Sun, Sep 6, 2009 at 8:51 AM, Luigi Rizzo wrote: > > > (this problem seems to affect both current and -stable, > > > so let's see if here i have better luck) > > > > > > I just noticed [Note 1,2] that when setting HZ > 2500 (even if it is > > > an exact divisor of the APIC/CPU clock) there is a significant > > > drift between the delays generated by usleep()/select() and those > > > computed by gettimeofday(). ?In other words, the error grows with > > > the amount of delay requested. > > > > > > To show the problem, try this function > > > > > > ? ? ? ?int f(int wait_time) { ?// wait_time in usec > > > ? ? ? ? ? ? ? ?struct timeval start, end; > > > ? ? ? ? ? ? ? ?gettimeofday(&start); > > > ? ? ? ? ? ? ? ?usleep(w); ? ? ?// or try select > > > ? ? ? ? ? ? ? ?gettimeofday(&end) > > > ? ? ? ? ? ? ? ?timersub(&end, &start, &x); > > > ? ? ? ? ? ? ? ?return = x.tv_usec + 1000000*x.tv_sec - wait_time; > > > ? ? ? ?} > > > > > > for various HZ (kern.hz=NNNN in /boot/loader.conf) and wait times. > > > Ideally, we would expect the timings to be in error by something > > > between 0 and 1 (or 2) ticks, irrespective of the value of wait_time. > > > In fact, this is what you see with HZ=1000, 2000 and 2500. > > > But larger values of HZ (e.g. 4000, 5000, 10k, 40k) create > > > a drift of 0.5% and above (i.e. with HZ=5000, a 1-second delay > > > lasts 1.0064s and a 10s delay lasts 10.062s; with HZ=10k the > > > error becomes 1% and at HZ=40k the error becomes even bigger. > > > > Technically, it isn't even an error because the sleeps are defined as > > 'at least' the value specified. If you're looking for real-time-OS > > level performance, you probably need to look at one. > > I know a non RTOS has limitations on the guarantees it can give. > But with this in mind (e.g. be happy with something that works as > expected just 'most of the times'), on this specific issue FreeBSD > could behave much better while still remaining technically correct. I agree. The best is the enemy of the good; while it will never be an RTOS, there is no reason not to make the behavior as good as possible, especially where a small investment of effort is required. > > > 3. ?CAUSE an error in tvtohz(), reported long ago in > > > ? ? ? ?http://www.dragonflybsd.org/presentations/nanosleep/ > > > ? ? ? ?which causes a systematic error of an extra tick in the computation > > > ? ? ? ?of the sleep times. > > > ? ?FIX: the above link also contains a proposed fix (which in fact > > > ? ? ? ?reverts a bug introduced in some old commit on FreeBSD) > > > ? ?PROBLEM: none that i see. > > > > This change, as-is, is extremely dangerous. tsleep/msleep() use a > > value of 0 meaning 'forever'. Changing tvtohz() so that it can now > > return 0 for a non-zero tv is setting land-mines all over the place. > > There's something like 27 callers of tvtohz() in sys/kern alone, some > > of which are used to supply tsleep/msleep() timeouts. Note that the > > dragonflybsd patch above only adds the 'returns 0' check to one single > > caller. You either need to patch all callers of tvtohz() since you've > > change the semantics, or add a 'if (ticks == 0) ticks = 1' check (or > > checks) in the appropriate places inside tvtohz(). > > > > If you don't do it, then you end up with callers of random functions > > with very small timeouts instead finding themselves sleeping forever. > > You are right, a proper fix for this third issue should be different > (if we want a fix at all -- I'd be almost satisfied by just > removing the drift). Simply returning max(1, ticks) from tvtohz, as suggested above, would eliminate this concern and still improve the performance for a great number of cases. > The simplest option is perhaps to compute a custom value for > nanosleep, select and poll. This would remove the risk of side > effects to other parts of the system, and we could also use the > chance to compensate for the errors that arise when hz*tick != > 1000000 or when we know that hardclock does not run exactly every > 'tick' (an integer) microseconds. This would also be a good solution. It would be perfectly possible to implement it in two stages - first by adapting tvothz() as above, as an immediate and hopefully non-controversial improvement, and second by refactoring the latter proposed change so that the non-sleep users (nanosleep etc.) call a separate routine which is allowed to return 0; tvtohz() could then be replaced with the degenerate function { return (ticks = trueticks(tv) ? ticks : 1; } All IMHO, of course. -- Clifton -- Clifton Royston -- cliftonr@iandicomputing.com / cliftonr@lava.net President - I and I Computing * http://www.iandicomputing.com/ Custom programming, network design, systems and network consulting services