From owner-cvs-src@FreeBSD.ORG  Mon Jul 31 09:11:09 2006
Return-Path: <owner-cvs-src@FreeBSD.ORG>
X-Original-To: cvs-src@FreeBSD.org
Delivered-To: cvs-src@FreeBSD.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 8B85816A4DD;
	Mon, 31 Jul 2006 09:11:09 +0000 (UTC) (envelope-from bde@zeta.org.au)
Received: from mailout2.pacific.net.au (mailout2.pacific.net.au [61.8.0.115])
	by mx1.FreeBSD.org (Postfix) with ESMTP id DB04743D46;
	Mon, 31 Jul 2006 09:11:08 +0000 (GMT) (envelope-from bde@zeta.org.au)
Received: from mailproxy2.pacific.net.au (mailproxy2.pacific.net.au
	[61.8.2.163])
	by mailout2.pacific.net.au (Postfix) with ESMTP id B60A1189049;
	Mon, 31 Jul 2006 19:11:07 +1000 (EST)
Received: from epsplex.bde.org (katana.zip.com.au [61.8.7.246])
	by mailproxy2.pacific.net.au (8.13.4/8.13.4/Debian-3sarge1) with ESMTP
	id k6V9B3bn021139; Mon, 31 Jul 2006 19:11:04 +1000
Date: Mon, 31 Jul 2006 19:11:02 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@epsplex.bde.org
To: Jung-uk Kim <jkim@FreeBSD.org>
In-Reply-To: <200607251525.11623.jkim@FreeBSD.org>
Message-ID: <20060731172935.O923@epsplex.bde.org>
References: <200607252001.aa18647@salmon.maths.tcd.ie>
	<200607251525.11623.jkim@FreeBSD.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: src-committers@FreeBSD.org, cvs-src@FreeBSD.org,
	"Christian S.J. Peron" <csjp@FreeBSD.org>, cvs-all@FreeBSD.org,
	David Malone <dwmalone@maths.tcd.ie>, Sam Leffler <sam@errno.com>
Subject: Re: cvs commit: src/sys/net bpf.c
X-BeenThere: cvs-src@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: CVS commit messages for the src tree <cvs-src.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/cvs-src>,
	<mailto:cvs-src-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/cvs-src>
List-Post: <mailto:cvs-src@freebsd.org>
List-Help: <mailto:cvs-src-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/cvs-src>,
	<mailto:cvs-src-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 31 Jul 2006 09:11:09 -0000

On Tue, 25 Jul 2006, Jung-uk Kim wrote:

> On Tuesday 25 July 2006 03:01 pm, David Malone wrote:
>>
>> It sounds to me like a reasonable thing to do would be to pass up
>> a raw version of the timestamp (as returned by the hardware). We'd
>> also pass up the regular microtime() timestamp. You can then do any
>> postprocessing to synchronise timestamps later in userland?
>
> Nope.  In that case, you actually need to export a few more things,
> i.e., current hardware timecounter value, clock frequency, size of
> the timecounter, etc.  Even then, it's going to be hard to get a
> correct timeval without exposing a few kernel internals.

Synchronization is so hard to do that it is not done even in the kernel
where all the variables are directly accessible (modulo locking), even
for cases that are much more important.  E.g.:
- synchronization of TSCs across CPUs.  This might require IPIs so it
   might be very inefficient.  If all CPUs are driven by the same
   hardware clock then they might stay in sync even when the clock is
   throttled.  Then IPIs would not be needed and the synchronization
   problems reduce to the next one.  Otherwise it is difficult to
   keep the TSCs perfectly in sync even with IPIs and the next problem
   might need to be solved anyway (to keep the TSCs in sync with
   something).
- synchronization of TSCs (or other efficient but possibly unstable
   timecounters) with "higher" quality timecounters (ones that are
   inefficient but possibly more stable).  Before timecounters or SMP
   or much CPU throttling, the i386 TSC was synced with the i8254 on
   every clock tick.  This worked OK, but was missing recalibration of
   the TSC and smoothing of jumps at sync points, and with CPU throttling
   recalibration is necessary else the jumps could be very large and
   remain large.  Now there is some synchronization of "cpu ticks" with
   the active timecounter.  This is missing almost the opposite things
   -- it has recalibration, and it doesn't need smoothing of jumps, but
   only because it doesn't have the jumps necessary for synchronization.
- synchronization of timecounters with themselves.  The get*time()
   functions are not properly synchronized with the non-get versions,
   although this breaks the "get" versions, because proper synchronization
   would be less efficient and/or more complicated.  Synchronization only
   occurs every few msec in tc_windup(), but this is not enough for
   proper synchronization.  E.g., timestamps made using time_second (as
   most file systems do) can be more than 1 second in the past relative
   to the current time, since updates of time_second are normally delayed
   by several msec.  Userland can see this bug using code like
   "now = time(NULL); utimes(file, NULL); stat(file, &sb);
   assert(sb.st_mtime >= now);" -- time(3) uses microtime(9) and correctly
   rounds to seconds, while utimes(2) normally uses time_second which
   is the current time incorrectly rounded to seconds.  I used to fix
   this in the non-SMP case by syncing time_second and other offsets
   in every call to a non-get function, using hackish locking that
   only works in the non-SMP case.
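
The userland check quoted in the last item above can be expanded into a
small, compilable sketch.  It assumes a POSIX system with utimes(2) and
mkstemp(3); the helper name mtime_lag() and the scratch path under /tmp
are hypothetical, chosen only for illustration.  Instead of asserting,
it reports the difference so that a negative value shows the file time
lagging the time read just before the update.

```c
#include <sys/stat.h>
#include <sys/time.h>

#include <assert.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: touch a scratch file and store
 * (st_mtime - now) in *lag, where "now" was read with time(3)
 * immediately before the utimes(2) call.
 * Returns 0 on success, -1 if any system call failed.
 */
int
mtime_lag(long *lag)
{
	char path[] = "/tmp/mtime_lag.XXXXXX";	/* assumed scratch location */
	struct stat sb;
	time_t now;
	int error, fd;

	assert(lag != NULL);
	fd = mkstemp(path);
	if (fd < 0)
		return (-1);
	now = time(NULL);
	error = utimes(path, NULL);	/* NULL: set times to "now" */
	if (error == 0)
		error = stat(path, &sb);
	close(fd);
	unlink(path);
	if (error != 0)
		return (-1);
	*lag = (long)(sb.st_mtime - now);
	return (0);
}
```

A negative lag would only show up when the call happens shortly after a
seconds boundary, so a test would normally call mtime_lag() in a loop
and look for any negative result.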

>>> Okay.  But I am worried about timecounter <-> timeval conversion
>>> because I want to know timeval delta from system time, not just
>>> some timer value.

To get the delta, you would have to read the system time (not using a
"get" function) so things might be slower than just reading the system
time for everything.  I think only cases where the hardware writes
timestamps using DMA are interesting (if the timestamps involve bus
accesses then they are likely to be slower than ACPI-"fast" ones which
are hundreds of times slower than TSC accesses on most systems).  Then
the timestamps would have been made a relatively long time in the past
and you would prefer to know the system time at which they were made,
but it is impossible to know that time precisely. It is only possible
to compare with the current time.  The comparison might not need to
be very precise but it should avoid obvious bugs like the ones for
file times:

     now = time(NULL); assert(now >= packettime.tv_sec);

Hardware could easily make incoherent timestamps here and then the
system shouldn't just blindly convert them into negative deltas, etc.

Bruce