From owner-freebsd-arch@FreeBSD.ORG  Fri Jun  8 08:03:52 2012
Date: Fri, 8 Jun 2012 18:03:42 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Konstantin Belousov <kostikbel@gmail.com>
In-Reply-To: <20120607091243.GV85127@deviant.kiev.zoral.com.ua>
Message-ID: <20120608174919.S1594@besplex.bde.org>
References: <20120606165115.GQ85127@deviant.kiev.zoral.com.ua>
	<201206061423.53179.jhb@freebsd.org>
	<20120606205938.GS85127@deviant.kiev.zoral.com.ua>
	<20120607130029.K1962@besplex.bde.org>
	<20120607091243.GV85127@deviant.kiev.zoral.com.ua>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-arch@freebsd.org
Subject: Re: Fast gettimeofday(2) and clock_gettime(2)

On Thu, 7 Jun 2012, Konstantin Belousov wrote:

> On Thu, Jun 07, 2012 at 01:00:34PM +1000, Bruce Evans wrote:
>>
>> tc_windup() calls in close succession are bugs, since they cycle the timehands
>> faster than they were designed to be.  We already have too many of these
>> bugs (where tc_setclock() calls tc_windup().  I didn't notice this
>> particular problem with it before).  Now I will point out that version
>> 2 of your patch adds more of these calls, apparently to get changes to
>> happen sooner.  But in sysctl_kern_timecounter_hardware(), such a call
>> was intentionally left out since it is not needed.  Note that tc_tick
>> prevents calls to tc_windup() more often than about once per msec if
>> hz > 1000.
> No, I did not add more tc_windup calls. I added a recalculation
> of the shared page content on the timecounter change, which is not
> the same as a tc_windup() call. This is exactly to handle disabling
> usermode rdtsc use when the kernel timecounter hardware changes.

Oops.  I saw a parameter named tc_windup and didn't look too closely
at the event handler for this.  Please use a slightly different name.

Frequent updates of the shared page may cause the same too-fast cycling
as frequent calls to tc_windup().  Are event handlers rate-limited?
If not, then someone changing the timecounter hardware from a loop
in userland could cause similar problems to a settimeofday() loop.
Both are privileged operations so this is not a large problem, but it
is a stress test that should pass.
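
To make the rate-limiting question concrete, here is a minimal sketch of
what a limiter in front of the shared page update could look like.  All
names (struct update_limiter, update_allowed()) are hypothetical and not
from the tree, and the caller is assumed to supply a monotonic time in
nanoseconds:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter state; none of these names exist in the tree. */
struct update_limiter {
	uint64_t min_interval_ns;	/* minimum spacing between accepted updates */
	uint64_t last_ns;		/* time of the last accepted update */
	bool	 primed;		/* false until the first update is accepted */
};

/*
 * Return true and record the time if an update at now_ns may proceed.
 * Return false if it arrives too soon after the last accepted update.
 */
static bool
update_allowed(struct update_limiter *ul, uint64_t now_ns)
{
	if (ul->primed && now_ns - ul->last_ns < ul->min_interval_ns)
		return (false);
	ul->last_ns = now_ns;
	ul->primed = true;
	return (true);
}
```

Simply dropping a rejected update would leave the shared page stale, so a
real handler would presumably have to defer the update to the next
permitted time instead of discarding it.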

>>  [jhb wrote]
>>> There was apparently another issue with version 2. The bcopy() is not
>>> atomic, so potentially libc could read wrong tk_current. I redid
>>> the interface to write to the shared page to allow use of real atomics.
>>
>> Timecounter code is supposed to be lock-free except for some time-domain
>> locking.  I only see 1 problem with this: where tc_windup() writes the
>> generation count and other things without asking for these writes to
>> be ordered.  In most cases, the time-domain locking prevents problems.
> In fact, on x86 the ordering is strong enough that no barriers are needed,
> this is why the problem has gone unnoticed so far.

Only the x86 write ordering is clearly strong enough (see another reply).
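
For reference, a minimal sketch of a generation-count protocol with the
ordering asked for explicitly.  It uses C11 atomics in place of the
kernel's own primitives, and struct snap, snap_write() and snap_read()
are hypothetical names for illustration only.  It assumes a single
writer and keeps the convention that generation 0 means "update in
progress":

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a timehands-like snapshot. */
struct snap {
	_Atomic unsigned gen;	/* 0 means an update is in progress */
	_Atomic uint64_t sec;
	_Atomic uint64_t frac;
};

/* Single writer: invalidate, publish the data, then publish a new gen. */
static void
snap_write(struct snap *s, uint64_t sec, uint64_t frac)
{
	unsigned ogen = atomic_load_explicit(&s->gen, memory_order_relaxed);

	atomic_store_explicit(&s->gen, 0, memory_order_relaxed);
	/* Order the gen = 0 store before the data stores. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&s->sec, sec, memory_order_relaxed);
	atomic_store_explicit(&s->frac, frac, memory_order_relaxed);
	if (++ogen == 0)	/* skip 0 on wraparound */
		ogen = 1;
	/* Order the data stores before the new generation becomes visible. */
	atomic_store_explicit(&s->gen, ogen, memory_order_release);
}

/* Reader: retry until the same nonzero gen is seen before and after. */
static void
snap_read(struct snap *s, uint64_t *sec, uint64_t *frac)
{
	unsigned gen;

	do {
		gen = atomic_load_explicit(&s->gen, memory_order_acquire);
		*sec = atomic_load_explicit(&s->sec, memory_order_relaxed);
		*frac = atomic_load_explicit(&s->frac, memory_order_relaxed);
		/* Order the data loads before the gen recheck. */
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&s->gen, memory_order_relaxed));
}
```

The sketch uses one slot, so a reader racing with the writer simply
retries.  The timecounter code cycles through several timehands instead,
which is where the time-domain locking comes from.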

Bruce