From owner-svn-src-head@FreeBSD.ORG  Mon Jun 15 22:16:08 2009
Return-Path: <owner-svn-src-head@FreeBSD.ORG>
Delivered-To: svn-src-head@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C4EC2106566B;
	Mon, 15 Jun 2009 22:16:08 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail01.syd.optusnet.com.au (mail01.syd.optusnet.com.au
	[211.29.132.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 598B98FC17;
	Mon, 15 Jun 2009 22:16:07 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from c122-106-159-184.carlnfd1.nsw.optusnet.com.au
	(c122-106-159-184.carlnfd1.nsw.optusnet.com.au [122.106.159.184])
	by mail01.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	n5FMG3XV022515
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Tue, 16 Jun 2009 08:16:05 +1000
Date: Tue, 16 Jun 2009 08:16:03 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@delplex.bde.org
To: Bruce Evans <brde@optusnet.com.au>
In-Reply-To: <20090615153040.R1080@besplex.bde.org>
Message-ID: <20090616070732.Q25544@delplex.bde.org>
References: <200906141801.n5EI1Zti056239@svn.freebsd.org>
	<4A356A0F.3050800@freebsd.org>
	<20090615075134.K24645@delplex.bde.org> <4A359AA6.7010101@freebsd.org>
	<20090615114142.B775@besplex.bde.org>
	<20090615153040.R1080@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: src-committers@FreeBSD.org, jhb@FreeBSD.org, svn-src-all@FreeBSD.org,
	Ed Schouten <ed@FreeBSD.org>, Sam Leffler <sam@FreeBSD.org>,
	svn-src-head@FreeBSD.org
Subject: Re: svn commit: r194204 - in head/sys: amd64/conf i386/conf
X-BeenThere: svn-src-head@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SVN commit messages for the src tree for head/-current
	<svn-src-head.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-head>,
	<mailto:svn-src-head-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-head>
List-Post: <mailto:svn-src-head@freebsd.org>
List-Help: <mailto:svn-src-head-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-head>,
	<mailto:svn-src-head-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Jun 2009 22:16:09 -0000

On Mon, 15 Jun 2009, Bruce Evans wrote:

> ...
> This version for RELENG_7 seems to work as intended.  jhb should look at
> its atomic ops.

No reply yet.

I thought of some problems with this version, mainly with the delaying code:

- Some versions of DELAY() need to lock the hardware, so calling DELAY()
   can deadlock.  E.g., when kdb is not active, DELAY() calls getit() on
   amd64 before the TSC is initialized and on i386 with no TSC and/or
   before the TSC is initialized, and getit() locks the clock hardware
   unconditionally using a non-recursive spin mutex.
   Contrary to what I said in a previous mail, detection of erroneous
   recursion isn't broken in the usual case.  The usual case is probably
   INVARIANTS, and then recursion is detected.  The handling of erroneous
   recursion then causes endless recursion on printf(): the failing
   KASSERT() calls panic(), which calls printf(), which reaches the
   failing KASSERT() again.  The non-recursive spinlock in cnputs() has
   the same bug (deadlock --> recursive deadlock).  This problem in
   DELAY() is well known, so it is worked around when kdb is active by
   not calling getit() then.

   Nearby bugs in DELAY(): if DELAY() is first called after the TSC is
   initialized, then its debugging code is never reached.  Its debugging
   code is a non-NOTEd non-option and could have been removed after the
   getit() version of DELAY() was verified to give reasonably accurate
   timing, but it is more useful for the TSC version, since the TSC version
   has not been verified to give reasonably accurate timing.  The TSC version
   must fail any reasonable verification (except probably for P-state
   invariant TSCs), since the TSC rate varies and DELAY() makes no attempt
   to compensate for its variance.  If DELAY() is first called before the
   TSC is initialized, then the debugging code still works for the i8254,
   but its placement is confusing, and when the counter is changed to the
   TSC there is no code to debug the change.

- Timecounters are no better than DELAY() for implementing the delaying.
   Apart from possibly not working on cold and/or deadlocked systems,
   they can need locks too: although the upper-level timecounter code is
   lock-free, the timecounter hardware code might need to use a lock.
   Again, the i8254 timecounter hardware code uses the clock spinlock.

- KTR uses plain printf(), and KTR can produce a lot of output, so the
   delay should be as short as possible, as for mcount_trylock(), and
   1ms is too long.  Recursion is a relatively unimportant part of the
   problem here.  Too-long delays are possible in normal operation,
   when one CPU is in a normal printf() and other CPUs want to do KTR
   printfs.  Serialization of the printf()s is especially important for
   voluminous concurrent KTR output, but so is printing such output fast.

   jhb should look at this too.  I use KTR less than once a year.

> % Index: subr_prf.c
> % ===================================================================
> % RCS file: /home/ncvs/src/sys/kern/subr_prf.c,v
> % retrieving revision 1.130.2.1
> % diff -u -2 -r1.130.2.1 subr_prf.c
> % --- subr_prf.c	21 Jan 2009 00:26:45 -0000	1.130.2.1
> % +++ subr_prf.c	15 Jun 2009 05:32:03 -0000
> % @@ -112,4 +112,27 @@
> %      &always_console_output, 0, "Always output to console despite TIOCCONS.");
> % 
> % +static int	printf_lockcount;
> % +
> % +static void
> % +printf_lock(void)
> % +{
> % +	int timeout;
> % +
> % +	timeout = 1000;
> % +	do {
> % +		if (atomic_cmpset_acq_int(&printf_lockcount, 0, 1))
> % +			return;
> % +		DELAY(1000);
> % +	} while (--timeout != 0);
> % +	atomic_add_acq_int(&printf_lockcount, 1);
> % +}

If the DELAY() is removed, the initial value of `timeout' would need to
be (possibly dynamically) calibrated.

The timeouts for the panics for spinlocks and threadlocks in kern_mutex.c
have similar problems and could benefit from calibration.  First they
do up to 10 million cpu_spinwait()s.  10 million might be too small
or too large.  Then they do up to 60 million DELAY(1)s.  DELAY() can
deadlock as above.  60 million is supposed to give a delay of 60
seconds, but short delays can be very inaccurate (the minimum DELAY()
is about 5 us with i8254 hardware on fast CPUs and about 30 us with
i8254 hardware on 1990's CPUs), so the timeouts can be more like 5
minutes than 1 minute.

A non-dynamically calibrated loop using uncached memory or device
accesses has a better chance of working accurately than the non-dynamically
calibrated TSC loop in the amd64 and i386 DELAY()s, since throttling
of the TSC is more common than throttling of memory systems.

Bruce