From: Matthew Dillon
Date: Mon, 21 Sep 2009 11:59:59 -0700 (PDT)
To: stable@freebsd.org, Peter Wemm
Subject: Re: incorrect usleep/select delays with HZ > 2500

    What we wound up doing was splitting tvtohz() into two functions.

    tvtohz_high(tv)

	Returned value meets or exceeds requested time.  A minimum
	value of 1 is returned (really only for {0,0}; otherwise the
	minimum value is 2).

    tvtohz_low(tv)

	Returned value might be shorter than requested time, and 0 can
	be returned.

    Most kernel functions use the tvtohz_high() function.  Only a few
    use tvtohz_low().

    I have not found any 'good' solution to the problem.
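
    As a rough illustration of the split described above, the two
    conversions might be sketched in C as follows.  This is not the
    DragonFly source: tick_us is a hypothetical stand-in for the
    kernel's 'tick' global (hz = 1000 is assumed), the sketch_ names
    are made up for this example, and plain truncation in the low
    variant is an assumption about how "might be shorter" is achieved.

```c
#include <limits.h>
#include <sys/time.h>

/* Hypothetical stand-in for the kernel 'tick' global: uS per clock tick.
 * 1000 uS per tick corresponds to an assumed hz of 1000. */
static int tick_us = 1000;

/* Total microseconds in tv, widened to avoid overflow on large tv_sec. */
static long long
sketch_tv_to_us(const struct timeval *tv)
{
	return (long long)tv->tv_sec * 1000000LL + (long long)tv->tv_usec;
}

/* "High" variant: the result never undershoots the requested time.
 * Rounds up to whole ticks, then adds one tick to cover the partially
 * elapsed current tick.  Only {0,0} (or less) yields the minimum of 1. */
static int
sketch_tvtohz_high(const struct timeval *tv)
{
	long long us = sketch_tv_to_us(tv);
	long long t;

	if (us <= 0)
		return 1;
	t = (us + (tick_us - 1)) / tick_us + 1;
	return (t > INT_MAX) ? INT_MAX : (int)t;
}

/* "Low" variant: truncates (an assumption), so the result may be
 * shorter than the requested time and can be 0. */
static int
sketch_tvtohz_low(const struct timeval *tv)
{
	long long us = sketch_tv_to_us(tv);
	long long t;

	if (us <= 0)
		return 0;
	t = us / tick_us;
	return (t > INT_MAX) ? INT_MAX : (int)t;
}
```

    With tick_us = 1000, a request of 1 uS gives 2 ticks from the high
    variant, while a request of 999 uS gives 0 ticks from the low
    variant, which is the trade-off between the two in miniature.
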
    For example, round-up errors can mount up when using the results
    to control a callout timer, resulting in much longer delays than
    originally intended, and similarly same-tick interrupts (e.g. a
    value of 1) can create much shorter delays than expected.
    Sometimes one cares more about the average interval being correct,
    other times the time must not be allowed to be too short.  You
    lose no matter what you choose.

    http://fxr.watson.org/fxr/source/kern/kern_clock.c?v=DFBSD

    If you look at tvtohz_high() you will note that the minimum value
    of 1 is only returned if the passed tv is essentially {0,0}, i.e.
    0uS.  1uS == 2 ticks (((us + (tick - 1)) / tick) + 1).  The 'tick'
    global here is the number of uS per tick (not to be confused with
    'ticks').

    Because of all of that I decided to split the function to make
    the requirements more apparent.

					--

    The nanosleep() work is a different issue... that's for userland
    calls (primarily the libc usleep() function).  We found that some
    linux programs assumed that nanosleep() was far more fine-grained
    than (hz) and, anyway, the calls are named 'nanosleep' and
    'usleep', which kind of implies a fine-grained sleep, so we turned
    it into one when small time intervals were being requested.

    http://fxr.watson.org/fxr/source/kern/kern_time.c?v=DFBSD

    The way I figure it, if a userland program wants to make system
    calls with fine-grained sleeps that are too small, it's really no
    different from treating that program as being cpu-bound anyway, so
    why not try to accommodate it?

					--

    The 8254 issue is more one of a lack of interest in fixing it.
    Basically, using the 8254 as a measure of realtime when the reload
    value is set too small (i.e. high hz) will always lead to serious
    timing problems.  The reason there is such a lack of interest in
    fixing it is that most machines have other timers available
    (lapic, acpi, hpet, tsc, etc).

    A secondary issue might be tying real-time functions to 'ticks',
    which could still be driven by the 8254 interrupt...
    those have to be divorced from ticks.  I'm not sure if FreeBSD has
    any of those left (does date still skip quickly if hz is set
    ultra-high, even when other timers are available?).

    I will note that tying real-time functions to the hz-based tick
    function (which is also the 8254-driven problem when other timers
    are not available) leads to serious problems, particularly with
    ntpd, even if you only lose track of the full cycle of the timer
    occasionally.  However, neither do you want to 'skip' the ticks
    value to catch up to a lost interrupt.  That will mess up tsleep()
    and other hz-based timeouts that assume that values of '2' will
    not instantly time out.

    So actual realtime operations really do have to be completely
    divorced from the hz-based ticks counter, and it must only be used
    for looser timing needs such as protocol timeouts and sleeps.

						-Matt