From owner-svn-src-all@FreeBSD.ORG  Tue Mar 15 16:32:57 2011
Return-Path: <owner-svn-src-all@FreeBSD.ORG>
Delivered-To: svn-src-all@freebsd.org
Received: from [127.0.0.1] (freefall.freebsd.org [IPv6:2001:4f8:fff6::28])
	by hub.freebsd.org (Postfix) with ESMTP id 66527106564A;
	Tue, 15 Mar 2011 16:32:52 +0000 (UTC)
	(envelope-from jkim@FreeBSD.org)
From: Jung-uk Kim <jkim@FreeBSD.org>
To: Bruce Evans <brde@optusnet.com.au>
Date: Tue, 15 Mar 2011 12:32:34 -0400
User-Agent: KMail/1.6.2
References: <201103142205.p2EM5x6E012664@svn.freebsd.org>
	<20110315123819.X920@besplex.bde.org>
In-Reply-To: <20110315123819.X920@besplex.bde.org>
MIME-Version: 1.0
Content-Disposition: inline
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201103151232.35904.jkim@FreeBSD.org>
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org,
	src-committers@freebsd.org
Subject: Re: svn commit: r219646 - head/sys/x86/isa
X-BeenThere: svn-src-all@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "SVN commit messages for the entire src tree \(except for &quot;
	user&quot; and &quot; projects&quot; \)" <svn-src-all.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-all>,
	<mailto:svn-src-all-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-all>
List-Post: <mailto:svn-src-all@freebsd.org>
List-Help: <mailto:svn-src-all-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-all>,
	<mailto:svn-src-all-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 15 Mar 2011 16:32:57 -0000

On Monday 14 March 2011 10:31 pm, Bruce Evans wrote:
> On Mon, 14 Mar 2011, Jung-uk Kim wrote:
> > Log:
> >  When TSC is unavailable, broken or disabled and the current
> > timecounter has better quality than i8254 timer, use it for
> > DELAY(9).
>
> You cannot use a random timecounter for DELAY().  DELAY() is
> carefully written to not use any locks, so that it doesn't deadlock
> when called in the "any" context from a bad console driver like
> syscons when syscons is called from ddb.  (Bad console drivers may
> have their own locks which deadlock in ddb, but that is another
> problem.)  Even the i8254 timer, which is the only timer that
> should be used by DELAY(), normally use locks that used to cause
> deadlock, but DELAY() was unbroken to bypass the locks when it is
> called from ddb.

I disagree.  I think we should keep away from i8254 as much as 
possible.  Actually, i8254 is the only timer that requires locking in 
the first place because of its braindead design.  All other x86 
timers do not require any locking at all.  We even sacrificed upper 
32 bits of HPET in favor of i386. :-(

> Cpufreq and other calibration code should use a normal timecounter
> via nanouptime() in all cases and never go near low level
> timecounters or DELAY().  There are (hopefully minor) problems
> getting the nanotime() initialized before it used.  The dummy
> timecounter is just what you don't want to use.

That's exactly the problem.

> Even in this patch, it isn't clear that the low level timecounters
> are initialized before they are used.

The timecounter is always initialized first, then set.

> > Modified: head/sys/x86/isa/clock.c
> > =================================================================
> >============= --- head/sys/x86/isa/clock.c	Mon Mar 14 19:31:43
> > 2011	(r219645) +++ head/sys/x86/isa/clock.c	Mon Mar 14 22:05:59
> > 2011	(r219646) @@ -245,6 +245,42 @@ getit(void)
> > 	return ((high << 8) | low);
> > }
> >
> > +static __inline void
> > +delay_tsc(int n)
> > +{
> > +	uint64_t start, end, now;
> > +
> > +	sched_pin();
> > +	start = rdtsc();
> > +	end = start + (tsc_freq * n) / 1000000;
> > +	do {
> > +		cpu_spinwait();
> > +		now = rdtsc();
> > +	} while (now < end || (now > start && end < start));
> > +	sched_unpin();
> > +}
>
> You cannot call random scheduling code from DELAY(), since the
> scheduling code is not designed to be called in the "any" context. 
> As it happens, sched_pin() and sched_unpin() are safe, and were
> already used in the TSC case.

Actually, I really like to get rid of this code.  It isn't safe for 
many reasons, e.g., its frequency may change while in the loop and 
end result is unpredictable if TSC is not invariant.  We may limit 
its use for invariant case, though.

> > +
> > +static __inline void
> > +delay_timecounter(struct timecounter *tc, int n)
> > +{
> > +	uint64_t end, now;
> > +	u_int last, mask, u;
> > +
> > +	mask = tc->tc_counter_mask;
> > +	last = tc->tc_get_timecount(tc) & mask;
> > +	end = tc->tc_frequency * n / 1000000;
>
> This depends on the delicate timecounter locking to be safe.  I
> think it it is safe except possibly in tc_get_timecount(), for the
> same reasons that nanouptime() can run safely with no visible
> locking.  tc must be the current timecounter/timehands, which is
> guaranteed to not be in an in-between state or become so while we
> are using it, despite there being no visible locking.  Actually
> there are some minor races:
> - in nanouptime(), the timecounter generation is checked to avoid
> using a new generation if necessary, but updates of the generation
> count and the fileds that it protects are not properly atomic.
> - here, the check of the generation is missing.  So this works
> without races when called from ddb (since ddb stops other CPUs),
> but has minor races in general.

Window of the races is very narrow here and it only happens on i386 
where reading uint64_t is not atomic if my understanding is correct.  
It cannot be any worse than tsc_freq anyway. ;-)

> > +	now = 0;
> > +	do {
> > +		cpu_spinwait();
> > +		u = tc->tc_get_timecount(tc) & mask;
> > +		if (u < last)
> > +			now += mask - last + u + 1;
> > +		else
> > +			now += u - last;
> > +		last = u;
> > +	} while (now < end);
> > +}
> > +
> > /*
> >  * Wait "n" microseconds.
> >  * Relies on timer 1 counting down from (i8254_freq / hz)
> > @@ -253,6 +289,7 @@ getit(void)
> > void
> > DELAY(int n)
> > {
> > +	struct timecounter *tc;
> > 	int delta, prev_tick, tick, ticks_left;
> >
> > #ifdef DELAYDEBUG
> > @@ -262,16 +299,12 @@ DELAY(int n)
> > #endif
> >
> > 	if (tsc_freq != 0) {
> > -		uint64_t start, end, now;
> > -
> > -		sched_pin();
> > -		start = rdtsc();
> > -		end = start + (tsc_freq * n) / 1000000;
> > -		do {
> > -			cpu_spinwait();
> > -			now = rdtsc();
> > -		} while (now < end || (now > start && end < start));
> > -		sched_unpin();
> > +		delay_tsc(n);
> > +		return;
> > +	}
> > +	tc = timecounter;
> > +	if (tc->tc_quality > 0) {
> > +		delay_timecounter(tc, n);
> > 		return;
> > 	}
>
> Optimizing DELAY() was bogus.  Now the loop for this is
> duplicatied.

"Bogus" optimization is not exactly duplicated here if you meant "(now 
< end || (now > start && end < start))" is bogus.  Its bogusness 
comes from the fact that TSC is 64-bit.  However, timecounter is 
effectively 32-bit because of its mask.  Now overflow is carefully 
accounted for and it is (almost) correct.

> > #ifdef DELAYDEBUG
>
> Stuff keeps accumulating or mutating above the code for testing it.

Yeah, one more reason to get rid of delay_tsc(). ;-)

Jung-uk Kim