From owner-svn-src-head@FreeBSD.ORG  Mon Nov 17 11:22:53 2008
Return-Path: <owner-svn-src-head@FreeBSD.ORG>
Delivered-To: svn-src-head@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 44814106564A;
	Mon, 17 Nov 2008 11:22:53 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail10.syd.optusnet.com.au (mail10.syd.optusnet.com.au
	[211.29.132.191])
	by mx1.freebsd.org (Postfix) with ESMTP id BF61A8FC14;
	Mon, 17 Nov 2008 11:22:52 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from c122-107-115-33.carlnfd1.nsw.optusnet.com.au
	(c122-107-115-33.carlnfd1.nsw.optusnet.com.au [122.107.115.33])
	by mail10.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	mAHBMb91021883
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Mon, 17 Nov 2008 22:22:42 +1100
Date: Mon, 17 Nov 2008 22:22:36 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@delplex.bde.org
To: Maksim Yevmenkin <emax@FreeBSD.org>
In-Reply-To: <200811162239.mAGMd4mh066533@svn.freebsd.org>
Message-ID: <20081117222219.I95669@delplex.bde.org>
References: <200811162239.mAGMd4mh066533@svn.freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org,
	src-committers@FreeBSD.org
Subject: Re: svn commit: r185013 - head/sys/dev/syscons
X-BeenThere: svn-src-head@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SVN commit messages for the src tree for head/-current
	<svn-src-head.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-head>,
	<mailto:svn-src-head-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-head>
List-Post: <mailto:svn-src-head@freebsd.org>
List-Help: <mailto:svn-src-head-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-head>,
	<mailto:svn-src-head-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 Nov 2008 11:22:53 -0000

On Sun, 16 Nov 2008, Maksim Yevmenkin wrote:

> Author: emax
> Date: Sun Nov 16 22:39:04 2008
> New Revision: 185013
> URL: http://svn.freebsd.org/changeset/base/185013
>
> Log:
>  More locking for syscons(4). This should prevent races with sckbdevent().

This cannot be the correct fix.  It should cause panics due to failure
of the (missing) assertion that no locks may be accessed in debugger mode.
Locks may not be accessed in debugger mode because:
(1) locks may be in an inconsistent state (when locking code is being
      debugged) (unless they are made to work virtually -- switch all locks
      to a safe state while in debugger mode, but display the unswitched
      state in all debugger commands)
(2) deadlock is possible (would also be avoided by virtualization).

> Modified:
>  head/sys/dev/syscons/syscons.c
>
> Modified: head/sys/dev/syscons/syscons.c
> ==============================================================================
> --- head/sys/dev/syscons/syscons.c	Sun Nov 16 21:57:54 2008	(r185012)
> +++ head/sys/dev/syscons/syscons.c	Sun Nov 16 22:39:04 2008	(r185013)
> @@ -1572,6 +1572,7 @@ sccngetch(int flags)
>     int s = spltty();	/* block sckbdevent and scrn_timer while we poll */

The comment still says that this is what blocks sckbdevent().

The comment was wrong too for debugger mode -- in debugger mode,
interrupts are masked in hardware and other CPUs are stopped, so nothing
except possibly this functions internals can call sckbdevent().

Internal calls to scrn_timer() used to be prevented by sccndbctl()
setting syscons' `debugger' flag and this function and others checking
this flag.  This has been lost.  I think the flag isn't needed and
wasn't used to protect sckbdevent(), and your problem has nothing to
do with debugger mode except for breaking it.  Instead your problem
is a layering one (continued below (*)).

>     int c;
>
> +    mtx_lock(&Giant);
>     /* assert(sc_console != NULL) */
>
>     /*

Acquiring Giant in debugger mode is especially invalid (except deadlock
is not so likely for a recursive lock).

> @@ -1583,11 +1584,13 @@ sccngetch(int flags)
>     sccnupdate(scp);
>
>     if (fkeycp < fkey.len) {
> +	mtx_unlock(&Giant);
> 	splx(s);
> 	return fkey.str[fkeycp++];
>     }
>
>     if (scp->sc->kbd == NULL) {
> +	mtx_unlock(&Giant);
> 	splx(s);
> 	return -1;
>     }
> @@ -1610,6 +1613,7 @@ sccngetch(int flags)
>     scp->kbd_mode = cur_mode;
>     kbdd_ioctl(scp->sc->kbd, KDSKBMODE, (caddr_t)&scp->kbd_mode);
>     kbdd_disable(scp->sc->kbd);
> +    mtx_unlock(&Giant);
>     splx(s);
>
>     switch (KEYFLAGS(c)) {
>

BTW, sccngetch() shouldn't exist.  It existed to demultiplex the 2
console driver entry points sc_cngetch() and sc_cncheckc(), but
sc_cncheckc() has gone away.

(*) syscons has several getc routines.  The layering and locking for
these should be something like:

      sc_cngetc(): must not access any locks; currently implemented by
         calling scgetc(); therefore:
      scgetc(): must not access any locks
      sccngetch(): bogus layering -- see above
      sckbdevent(): I think this is the entry point for all normal (non-
         low-level-console) syscons input.  It is currently implemented
         by calling scgetc(), which must not access any locks.  Therefore,
         any locking must be in this function.  I think it currently uses
         Giant locking.

There is still a problem for calls to sc_cngetc() from the low-console
driver.  These can race with sckbdevent().  In debugger mode, no locking
is permitted, so syscons' getc functions must be carefully written to
do only harmless things when they lose races.  They are not carefully
written (e.g., almost the first thing they do, shown in the above, may
give a buffer overrun:

     if (fkeycp < fkey.len) {
  	mtx_unlock(&Giant);
  	splx(s);
  	return fkey.str[fkeycp++];
     }

since the increment is not atomic with the bounds check), but they
mostly work.  (Syscons' putc routines have a much larger number of
races like this, but they mostly work too.  It used to be easy to cause
panics by racing normal console output with printfs from an interrupt
handler (use a high frequency timeout interrupt handler that prints
something), but almost any locking would fix that and I think Giant-locking
everything fixed it accidentally (**).  So there is only a problem
modes like debugger mode where locking is not permitted.)

The races for sc_cngetc() were normally limited:
- most calls were in debugger mode, and debugger mode limits problems
- the only other calls were for things like gets() for non-auto
    mountroot and cngetc() for the "hit any key to reboot".  This case
    might even supply the needed and permitted Giant locking accidentally.

However, you seem to have made the races very common by (ab)using cngetc()
for keyboard multiplexing.  cngetc() is not designed for this.  You need
to know syscons' internals and do the necessary Giant locking in the caller.

(**) Giant locking for printf: I can't find any.  tprintf() and uprintf()
used to acquire Giant to fix related problems, and even that went away
with the MPSAFE tty changes.  Ah, the related problem is very closely
related: ttys used to require Giant locking for everything *except* low-
level consoles.  tprintf() and uprintf() use normal i/o so they had to
supply the Giant locking.  OTOH, low-level console i/o uses (broken for
some console drivers including syscons) null locking.  The keyboard
multiplexor needs to work mainly for normal i/o (it would be a nightmare
to make it work for low-level i/o, with correct null locking, since it
touches a lot of code and data).  Thus it needs to supply the Giant
locking if syscons still needs it.  It still seems to use Giant locking
after the MPSAFE tty changes.  I don't completely understand those.
They make it even harder supply correct locking -- now you might need
Giant or the tty lock, or both...  In fact, the main change to
sckbdevent() to make syscons MPSAFE was to use Giant locking in
sckbdevent().  That seems sort of backwards (the low levels which benefit
most from low latency keep high latency), but might be needed because
keyboard drivers aren't MPSAFE yet, and wouldn't want to know about
the tty lock even if they were.  So supplying Giant locking seems to
be the correct way to interface with sckbdevent().

Bruce