From owner-cvs-src@FreeBSD.ORG Mon Jul 31 09:11:09 2006
Date: Mon, 31 Jul 2006 19:11:02 +1000 (EST)
From: Bruce Evans
To: Jung-uk Kim
Cc: src-committers@FreeBSD.org, cvs-src@FreeBSD.org,
    "Christian S.J. Peron", cvs-all@FreeBSD.org, David Malone,
    Sam Leffler
Subject: Re: cvs commit: src/sys/net bpf.c
Message-ID: <20060731172935.O923@epsplex.bde.org>
In-Reply-To: <200607251525.11623.jkim@FreeBSD.org>
References: <200607252001.aa18647@salmon.maths.tcd.ie>
    <200607251525.11623.jkim@FreeBSD.org>

On Tue, 25 Jul 2006, Jung-uk Kim wrote:

> On Tuesday 25 July 2006 03:01 pm, David Malone wrote:
>>
>> It sounds to me like a reasonable thing to do would be to pass up
>> a raw version of the timestamp (as returned by the hardware). We'd
>> also pass up the regular microtime() timestamp.
>> You can then do any postprocessing to synchronise timestamps later
>> in userland?
>
> Nope. In that case, you actually need to export a few more things,
> i.e., current hardware timecounter value, clock frequency, size of
> the timecounter, etc. Even then, it's going to be hard to get a
> correct timeval without exposing a few kernel internals.

Synchronization is so hard to do that it is not done even in the kernel
where all the variables are directly accessible (modulo locking), even
for cases that are much more important. E.g.:

- synchronization of TSCs across CPUs. This might require IPIs so it
  might be very inefficient. If all CPUs are driven by the same
  hardware clock then they might stay in sync even when the clock is
  throttled. Then IPIs would not be needed and the synchronization
  problems reduce to the next one. Otherwise it is difficult to keep
  the TSCs perfectly in sync even with IPIs and the next problem might
  need to be solved anyway (to keep the TSCs in sync with something).

- synchronization of TSCs (or other efficient but possibly unstable
  timecounters) with "higher" quality timecounters (ones that are
  inefficient but possibly more stable). Before timecounters or SMP or
  much CPU throttling, the i386 TSC was synced with the i8254 on every
  clock tick. This worked OK, but was missing recalibration of the TSC
  and smoothing of jumps at sync points, and with CPU throttling
  recalibration is necessary, else the jumps could be very large and
  remain large. Now there is some synchronization of "cpu ticks" with
  the active timecounter. This is missing almost the opposite things --
  it has recalibration, and it doesn't need smoothing of jumps, but
  only because it doesn't have the jumps necessary for synchronization.

- synchronization of timecounters with themselves. The get*time()
  functions are not properly synchronized with the non-get versions,
  although this breaks the "get" versions, because proper
  synchronization would be less efficient and/or more complicated.
  Synchronization only occurs every few msec in tc_windup(), but this
  is not enough for proper synchronization. E.g., timestamps made
  using time_second (as most file systems do) can be more than 1 second
  in the past relative to the current time, since updates of
  time_second are normally delayed by several msec. Userland can see
  this bug using code like "now = time(NULL); utimes(file, NULL);
  stat(file, &sb); assert(sb.st_mtime >= now);" -- time(3) uses
  microtime(9) and correctly rounds to seconds, while utimes(2)
  normally uses time_second, which is the current time incorrectly
  rounded to seconds. I used to fix this in the non-SMP case by
  syncing time_second and other offsets in every call to a non-get
  function, using hackish locking that only works in the non-SMP case.

>>> Okay. But I am worried about timecounter <-> timeval conversion
>>> because I want to know timeval delta from system time, not just
>>> some timer value.

To get the delta, you would have to read the system time (not using a
"get" function), so things might be slower than just reading the system
time for everything. I think only cases where the hardware writes
timestamps using DMA are interesting (if the timestamps involve bus
accesses then they are likely to be slower than ACPI-"fast" ones, which
are hundreds of times slower than TSC accesses on most systems). Then
the timestamps would have been made a relatively long time in the past
and you would prefer to know the system time at which they were made,
but it is impossible to know that time precisely. It is only possible
to compare with the current time. The comparison might not need to be
very precise but it should avoid obvious bugs like the ones for file
times:

    now = time(NULL);
    assert(now >= packettime.tv_sec);

Hardware could easily make incoherent timestamps here and then the
system shouldn't just blindly convert them into negative deltas, etc.

Bruce