From owner-svn-src-all@FreeBSD.ORG  Wed Jun  1 10:07:38 2011
Return-Path: <owner-svn-src-all@FreeBSD.ORG>
Delivered-To: svn-src-all@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4867A106566B;
	Wed,  1 Jun 2011 10:07:38 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail06.syd.optusnet.com.au (mail06.syd.optusnet.com.au
	[211.29.132.187])
	by mx1.freebsd.org (Postfix) with ESMTP id 9A8228FC14;
	Wed,  1 Jun 2011 10:07:37 +0000 (UTC)
Received: from c122-106-165-191.carlnfd1.nsw.optusnet.com.au
	(c122-106-165-191.carlnfd1.nsw.optusnet.com.au [122.106.165.191])
	by mail06.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	p51A7VN3009677
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Wed, 1 Jun 2011 20:07:33 +1000
Date: Wed, 1 Jun 2011 20:07:31 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Andriy Gapon <avg@FreeBSD.org>
In-Reply-To: <4DE5D72A.1020405@FreeBSD.org>
Message-ID: <20110601193030.A1275@besplex.bde.org>
References: <201105311729.p4VHTwrZ033296@svn.freebsd.org>
	<4DE5D72A.1020405@FreeBSD.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org,
	src-committers@FreeBSD.org, "Kenneth D. Merry" <ken@FreeBSD.org>
Subject: Re: svn commit: r222537 - in head/sys: kern sys
X-BeenThere: svn-src-all@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "SVN commit messages for the entire src tree \(except for &quot;
	user&quot; and &quot; projects&quot; \)" <svn-src-all.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-all>,
	<mailto:svn-src-all-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-all>
List-Post: <mailto:svn-src-all@freebsd.org>
List-Help: <mailto:svn-src-all-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-all>,
	<mailto:svn-src-all-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 01 Jun 2011 10:07:38 -0000

On Wed, 1 Jun 2011, Andriy Gapon wrote:

> on 31/05/2011 20:29 Kenneth D. Merry said the following:
>> +	mtx_init(&mbp->msg_lock, "msgbuf", NULL, MTX_SPIN);
>
> Sorry that I didn't gather myself together for a review before this change got
> actually committed.
> Do you see any reason not to make this spinlock recursive?

I see reasons why it must not exist.  The message buffer code cannot use
any normal locking, and was carefully written to avoid doing so.

> I am a little bit worried about "exotic" situations like receiving an NMI in the
> middle of printing and wanting to print in the NMI context, or similar things
> that penetrate contexts with disabled interrupts - e.g. Machine Check Exception.
> Also it's not clear to me if there won't any bigger damage in the situations
> like those described above.
>
> P.S. I have been thinking about fixing the problem in a different fashion, via
> reserving portions of dmesg buffer for a whole message using CAS:
> http://lists.freebsd.org/pipermail/freebsd-hackers/2010-April/031535.html

Something like this might work.

PRINTF_BUFR size should not exist either, especially now that it is
much more complicated and broken.  Here is some of my old mail about
adding (necessarily not normal) locking to to printf.  No one replied,
so I never finished this :-(.

%%%
>From bde@optusnet.com.au Mon Jun 15 16:18:03 2009 +1000
Date: Mon, 15 Jun 2009 16:17:59 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Bruce Evans <brde@optusnet.com.au>
cc: Sam Leffler <sam@FreeBSD.org>, Ed Schouten <ed@FreeBSD.org>,
     src-committers@FreeBSD.org, svn-src-all@FreeBSD.org,
     svn-src-head@FreeBSD.org
Subject: Re: svn commit: r194204 - in head/sys: amd64/conf i386/conf
In-Reply-To: <20090615114142.B775@besplex.bde.org>
Message-ID: <20090615153040.R1080@besplex.bde.org>
References: <200906141801.n5EI1Zti056239@svn.freebsd.org> <4A356A0F.3050800@freebsd.org>
  <20090615075134.K24645@delplex.bde.org> <4A359AA6.7010101@freebsd.org>
  <20090615114142.B775@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Status: RO
X-Status: A
X-Keywords: 
X-UID: 913

On Mon, 15 Jun 2009, Bruce Evans wrote:

> On Sun, 14 Jun 2009, Sam Leffler wrote:

>> I'd love to see someone properly lock printf/console output instead of 
>> kludges like PRINTF_BUFR_SIZE.
>
> Almost done (for FreeBSD-~5.2): non-production version with lots of debugging
> cruft:

This version for RELENG_7 seems to work as intended.  jhb should look at
its atomic ops.

I only have 1 normal source of interleaved messages -- the messages about
waiting for system processes on shutdown on SMP systems -- and this seems
to be fixed.

The patch applies even without removing PRINTF_BUFR_SIZE or cnputs(), and
works, at least if PRINTF_BUFR_SIZE and the per-cpu buffer are not enabled.

Note that it serializes accesses to the message buffer too.  The locks
automatically cover the message buffer if they are applied, and are
applied to all cases that set TOLOG, though this might be too dangerous
for tprintf() since printing to a user's console shouldn't block other
parts of the kernel.  printf() to a redirected console is more careful
about this -- it extracts messages from the message buffer and can handle
long delays provided the message buffer doesn't wrap around.  This
might be the best method for all cases.  Accesses to the message buffer
need to be serialized for this to work properly, and serializing ony
accesses to the message buffer is easier since there are no hardware
delays to worry about -- strict locking with no timeouts will work
provided the halting problem is solvable for kvprintf() to the message
buffer alone.

No attempt is made to buffer messages across printf()s.  Neither does
PRINTF_BUFR_SIZE.  PRINTF_BUFR_SIZE uses an auto buffer so it obviously
has to flush at the end of each printf.  Always extracting from the
message buffer might be the best way to handle this too.  Normally you
would extract complete lines, but there must be a timeout to handle
partial lines.

The 1 second timeout here isn't best if the timeout is needed for
avoiding deadlock, as may happen when a printf() is interrupted by
panic().  panic could reasonably() blow open all locks including
printf_lock(), but this is not easy to do without letting other CPUs
proceed and/or complicating the usual case.

Very slow terminals, say 50 bps, need a timeout of say 16 seconds so
that the can print an 80-character line without being interrupted.

% Index: subr_prf.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/kern/subr_prf.c,v
% retrieving revision 1.130.2.1
% diff -u -2 -r1.130.2.1 subr_prf.c
% --- subr_prf.c	21 Jan 2009 00:26:45 -0000	1.130.2.1
% +++ subr_prf.c	15 Jun 2009 05:32:03 -0000
% @@ -112,4 +112,27 @@
%      &always_console_output, 0, "Always output to console despite TIOCCONS.");
% 
% +static int	printf_lockcount;
% +
% +static void
% +printf_lock(void)
% +{
% +	int timeout;
% +
% +	timeout = 1000;
% +	do {
% +		if (atomic_cmpset_acq_int(&printf_lockcount, 0, 1))
% +			return;
% +		DELAY(1000);
% +	} while (--timeout != 0);
% +	atomic_add_acq_int(&printf_lockcount, 1);
% +}
% +
% +static void
% +printf_unlock(void)
% +{
% +
% +	atomic_add_rel_int(&printf_lockcount, -1);
% +}
% +
%  /*
%   * Warn that a system table is full.
% @@ -198,5 +221,9 @@
%  	pca.flags = flags;
%  	va_start(ap, fmt);
% +	if (pri != -1)
% +		printf_lock();	/* To serialize msgbuf.  XXX: long-held lock? */
%  	kvprintf(fmt, putchar, &pca, 10, ap);
% +	if (pri != -1)
% +		printf_unlock();
%  	va_end(ap);
%  	if (sess != NULL)
% @@ -243,5 +270,7 @@
% 
%  	va_start(ap, fmt);
% +	printf_lock();		/* Even for TOLOG, for serializing msgbuf. */
%  	kvprintf(fmt, putchar, &pca, 10, ap);
% +	printf_unlock();
%  	va_end(ap);
% 
% @@ -312,5 +341,7 @@
%  #endif
% 
% +	printf_lock();
%  	retval = kvprintf(fmt, putchar, &pca, 10, ap);
% +	printf_unlock();
%  	va_end(ap);
% 
% @@ -350,5 +381,7 @@
%  #endif
% 
% +	printf_lock();
%  	retval = kvprintf(fmt, putchar, &pca, 10, ap);
% +	printf_unlock();
% 
%  #ifdef PRINTF_BUFR_SIZE

Bruce

>From bde@optusnet.com.au Tue Jun 16 08:16:06 2009 +1000
Date: Tue, 16 Jun 2009 08:16:03 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@delplex.bde.org
To: Bruce Evans <brde@optusnet.com.au>
cc: Sam Leffler <sam@FreeBSD.org>, Ed Schouten <ed@FreeBSD.org>,
     jhb@freebsd.org, src-committers@FreeBSD.org, svn-src-all@FreeBSD.org,
     svn-src-head@FreeBSD.org
Subject: Re: svn commit: r194204 - in head/sys: amd64/conf i386/conf
In-Reply-To: <20090615153040.R1080@besplex.bde.org>
Message-ID: <20090616070732.Q25544@delplex.bde.org>
References: <200906141801.n5EI1Zti056239@svn.freebsd.org> <4A356A0F.3050800@freebsd.org>
  <20090615075134.K24645@delplex.bde.org> <4A359AA6.7010101@freebsd.org>
  <20090615114142.B775@besplex.bde.org> <20090615153040.R1080@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Status: O
X-Status: 
X-Keywords: 
X-UID: 914

On Mon, 15 Jun 2009, Bruce Evans wrote:

> ...
> This version for RELENG_7 seems to work as intended.  jhb should look at
> its atomic ops.

No reply yet.

I thought of some problems with this version.  Mainly with the delaying code:

- Some versions of DELAY() need to lock the hardware, so calling DELAY()
    can deadlock.  E.g., on amd64 before the TSC is initialized, and on
    i386 with no TSC and/or before the TSC is initialized, and when kdb
    is not active on both, DELAY() calls getit(), and getit() locks the
    clock hardware unconditionally using a non-recursive spin mutex.
    Contrary to what I said in previous mail, detection of erroneous
    recursion isn't broken in the usual case.  The usual case is probably
    INVARIANTS, and then recursion is detected.  The handling of erroneous
    version then causes endless recursion on printf(): it is a failing
    KASSERT() which will call panic(), which will call printf(), which
    will reach the failing KASSERT() again.  The non-recursive spinlock
    in cnputs() has the same bug (deadlock --> recursive deadlock).  This
    problem in DELAY() is well known, so it is worked around when kdb
    is active by not calling getit() then.

    Nearby bugs in DELAY(): if DELAY() is first called after the TSC is
    initialized, then its debugging code is never reached.  Its debugging
    code is a non-NOTEd non-option and could have been removed after the
    getit() version of DELAY() was verified to give reasonably accurate
    timing, but it is more useful for the TSC version since the TSC version
    has not been verified to give reasonably accurate timing.  The TSC version
    must fail any reasonable verifiications except probably for P-state
    invariant TSCs since the TSC varies and DELAY() makes no attempt to
    compensate for its variance).  If DELAY() is first called before the
    TSC is initialized, then the debugging code still works for the i8254
    but its placement is confusing, and when the counter is changed to the
    TSC there is no code to debug the change.

- timecounters are no better than DELAY() for implementing the delaying,
    since apart from them possibly not working on cold and/or deadlocked
    systems, although the upper-level timecounter code is lock-free, the
    timecounter hardware code might need to use a lock.  Again, the i8254
    timecounter hardware code uses the clock spinlock.

- KTR uses plain printf(), and KTR can produce a lot of output, so the
    delay should be as short as possible, as for mcount_trylock(), and
    1ms is too long.  Recursion is a relatively unimportant part of the
    problem here.  Too-long delays are possible in normal operation,
    when one CPU is in a normal printf() and other CPUs want to do KTR
    printfs.  Serialization of the printf()s is especially important for
    voluminous concurrent KTR output, but so is printing such output fast.

    jhb should look at this too.  I use KTR less than once a year.

> % Index: subr_prf.c
> % ===================================================================
> % RCS file: /home/ncvs/src/sys/kern/subr_prf.c,v
> % retrieving revision 1.130.2.1
> % diff -u -2 -r1.130.2.1 subr_prf.c
> % --- subr_prf.c	21 Jan 2009 00:26:45 -0000	1.130.2.1
> % +++ subr_prf.c	15 Jun 2009 05:32:03 -0000
> % @@ -112,4 +112,27 @@
> %      &always_console_output, 0, "Always output to console despite 
> TIOCCONS.");
> % % +static int	printf_lockcount;
> % +
> % +static void
> % +printf_lock(void)
> % +{
> % +	int timeout;
> % +
> % +	timeout = 1000;
> % +	do {
> % +		if (atomic_cmpset_acq_int(&printf_lockcount, 0, 1))
> % +			return;
> % +		DELAY(1000);
> % +	} while (--timeout != 0);
> % +	atomic_add_acq_int(&printf_lockcount, 1);
> % +}

If the DELAY() is removed, the initial value of `timeout' would need to
be (possibly dynamically) calibrated.

The timeouts for the panics for spinlocks and threadlocks in kern_mutex.c
have similar problems and could benefit from calibration.  First they
do up to 10 million cpu_spinwait()s.  10 million might be too small
or too large.  Then they do up to 60 million DELAY(1)s.  DELAY() can
deadlock as above.  60 million is supposed to give a delay of 60
seconds, but short delays can be very inaccurate (the minimum DELAY()
is about 5 us with i8254 hardware on fast CPUs and about 30 us with
i8254 hardware on 1990's CPUs), so the timeouts can be more like 5
minutes than 1 minute.

A non-dynamically calibrated loop using uncached memory or device
accesses has a better chance of working accurately than the non-dynamically
calibrated TSC loop in the amd64 and i8254 DELAY()s, since throttling
of the TSC is more common than throttling of memory systems.

Bruce
%%%

-current still has the broken cnputs():

% void
% cnputs(char *p)

This function was only used by PRINTF_BUR_SIZE.  I is under the ifdef
for one call, but the larger code in putcons() that calls it is not
ifdefed although it was only used if ap->p_bufr != NULL and only
PRINTF_BUFR_SIZE sets p_bufr.

% {
% 	int c;
% 	int unlock_reqd = 0;
% 
% 	if (use_cnputs_mtx) {
% 		mtx_lock_spin(&cnputs_mtx);

Ordinary locking cannot be used here.  cnputs_mtx is not even recursive,
so any recursive call here gives deadlock.  Recursive calls can easily
be arranged using ddb, perhaps just by tracing through this code, or
possibly by NMIs.

A recursive mutex would cause other problems (not so bad) when recursion
occurse.

The new msg_lock seems to have the same problems and more.  It isn't
recursive.  cnputs_mtx is at least NOWITNESS.  If msg_lock were recursive,
then problems from this would probably be quite bad, unlike for cnputs_mtx
-- with cnputs_mtx, you should just get interleaved output, while with
msg_lock you would probably get a corrupt message buffer.

% 		unlock_reqd = 1;
% 	}
% 
% 	while ((c = *p++) != '\0')
% 		cnputc(c);
% 
% 	if (unlock_reqd)
% 		mtx_unlock_spin(&cnputs_mtx);
% }

Bruce