Date: Thu, 27 Aug 2009 08:07:35 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Marcel Moolenaar
Cc: Ed Schouten, FreeBSD Arch
Subject: Re: mtx_lock_do_what_i_mean()
Message-ID: <20090827044818.D826@besplex.bde.org>

On Wed, 26 Aug 2009, Marcel Moolenaar wrote:

> On Aug 26, 2009, at 5:06 AM, Bruce Evans wrote:
>
>>>> Everything is in place to remove 0.1% of the coupling.  Debugger
>>>> i/o still normally goes to the same device as user and kernel
>>>> i/o, so it is strongly coupled.
>>>
>>> That's a non sequitur.  Sharing travel destinations does not mean
>>> that you travel together, for example.
>>
>> The coupling here occurs at the destination.
>
> Exactly: that's why I said everything is in place to change the
> destination of printf().

Except for printf()s in panic().  These can only be redirected to
driver(s) similar to working console drivers, and any such driver
would need similar complications to work.  One thing that could be
done now is reducing from multiple active broken console drivers to a
single least-broken one for panics and debugging, or at least as a
last resort for recursive panics.  However, this doesn't help in the
usual case where there is only 1 active console driver, or in the
fairly usual case where the user has only 1 set of console hardware.

>>> Having printf() not even write to the console does not mean that
>>> the debugger cannot keep using the low-level console interfaces...
>>
>> It just means that printf() would be slightly broken (no longer
>> synchronous, and unusable in panic()...).
>
> printf not being synchronous is actually what solves all the
> complexity.  We don't need synchronous output in the normal case.
> Only for the "broken" case (i.e. kernel panic, no root FS) do we
> need synchronous output.  It's the exceptional case.
>
> I believe common sense tells us to optimize for the common case.

We're not optimizing, but making the uncommon case actually work.

>> Note that strong coupling is simplest here.
>
> I disagree.  We've had various threads on this topic and they had
> the same theme: "we have this interlock and it's a problem.  Help!"

There is remarkably little understanding of the problem.  The
interlock is not a problem.  It is the state being protected by the
interlock that is a problem!  A working console driver cannot just
blow away the interrupted state, either hardware or software, and it
cannot rely on the interrupted state being usable.

> I believe that trying to solve the problem within the existing
> framework is not the solution, because I believe the framework
> itself is the problem.
> Rethinking it from the bottom-up helps to detangle and come up with
> a good implementation.

I've outlined a good solution many times and implemented part of it
in FreeBSD and all of it in my debugger's i/o routines:
- switch to and from console i/o mode when directed to by cn_dbctl().
  Natural switch points for ddb are on entry to and exit from
  interactive mode.  Natural switch points for other i/o are not so
  clear -- switching for every printf() is too much.
- make the switches nest as deeply as necessary (3 or 4 levels is
  enough)
- switches involve:
  on entry:
  - save all state (hardware and software) not already saved.  Here
    it helps for upper layers to not have much state in the air.
    Here it is essential for upper layers to have recorded their
    state in some atomic way.
  - initialize all state (hardware and software) used by the console
    driver.  Here it helps for the upper layers to have initialized
    most of the state.  You can always reinitialize everything, but
    then restoring the interrupted state will be hard.
  on exit:
  - restore as much state (hardware and software) as possible.
    Except, if the console i/o routines wrote to the same physical
    console that non-console routines write to, and this physical
    console is something like a frame buffer, then the
    something-like-a-frame-buffer state must not be restored, so
    that you can actually read the output.  Debugger frame buffer
    output should go to a separate virtual console, possibly in a
    different video mode.  If the video mode is not switched, then
    the hardware part of the switch involves mainly copying the
    frame buffer out and in.  If the video mode is switched, then
    the switch involves reinitializing lots of hardware.  Switching
    to a fixed mode is probably still simpler than supporting
    console output in all possible interrupted modes.  Writing to a
    separate virtual console is not so good for non-debugger output,
    since the output needs to be made visible at some point and
    switching the console on every printf() is no good.
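The nested enter/exit protocol above can be sketched roughly as
follows.  Everything here is hypothetical illustration, not a real
interface: cn_enter_dbmode()/cn_exit_dbmode() and struct cn_state are
made-up names, and plain ints stand in for the hardware and software
state that a real driver would save and restore.

```c
/*
 * Sketch of the nested console-mode switch (hypothetical names).
 * Plain ints stand in for hardware state (video mode) and software
 * state (cursor position); real code would save device registers
 * and driver variables.
 */
#define CN_MAXNEST 4			/* 3 or 4 levels is enough */

struct cn_state {
	int	video_mode;
	int	cursor_pos;
};

static struct cn_state cn_cur = { 3, 42 };	/* interrupted state */
static struct cn_state cn_saved[CN_MAXNEST];
static int cn_depth;

/*
 * On entry: save all state not already saved, then initialize the
 * state used by the console driver (switch to a fixed, known mode).
 */
static int
cn_enter_dbmode(void)
{
	if (cn_depth >= CN_MAXNEST)
		return (-1);		/* nested too deeply; refuse */
	cn_saved[cn_depth++] = cn_cur;
	cn_cur.video_mode = 0;
	cn_cur.cursor_pos = 0;
	return (0);
}

/* On exit: restore as much of the interrupted state as possible. */
static int
cn_exit_dbmode(void)
{
	if (cn_depth == 0)
		return (-1);		/* unbalanced exit */
	cn_cur = cn_saved[--cn_depth];
	return (0);
}
```

The depth check is what makes recursive entries survivable: a switch
that cannot save its state refuses instead of clobbering the state
saved by the outer switch.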
    Switching back and forth on every printf() is worst -- it makes
    the screen flicker, and the output is still invisible most of
    the time.  IIRC, old console drivers (codrv and pcvt) wrote to
    ttyv0 and forced a switch there.  This was inconvenient.  It is
    probably best to always printf() to a separate virtual console
    (different from the debugger one), and also to the current
    console if possible and implemented.
- after switching to console mode, the actual i/o is trivial, at
  least if you always switch the interrupted mode to a fixed one if
  the interrupted mode is different.

>> If debugger i/o is in a separate module then it has a hard time
>> even knowing the interrupted state.  One impossibly difficult
>> weakly-coupled case is when normal i/o is done by a proprietary X
>> driver using undocumented hardware features from userland, with
>> some undocumented features active at the time of the interrupt.
>
> The question is: why try so hard to solve a problem that's specific
> to a case we all try our best to avoid?  Isn't it much easier to
> say that debugger output and console are not the same, so that you
> can run X on syscons and DDB over a serial interface, and if all
> else fails: dump a kernel core and analyze the state offline.

Some users and/or developers don't have, or don't want to set up,
multiple i/o devices just for consoles.

> Having an in-kernel debugger is great, but it should be kept at
> "arm's length" as much as possible.  The moment you start sharing
> interfaces or mixing functionality you're setting yourself up for
> failure: either the debugger does not work in certain cases
> (running X is a perfect example of how the in-kernel debugger is
> totally useless) or you complicate the kernel unnecessarily.

Well, once you have working console i/o for debuggers, you almost
have it for panic(), and you don't have to worry about printf() from
interrupt handlers that interrupt an inadequately locked upper-level
i/o routine corrupting the upper level.
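One robust (but lossy) way to keep such a printf() from corrupting an
interrupted output routine is a try-lock.  A sketch, with made-up
names -- cnputs_robust() is not a real routine, and a C11 atomic flag
stands in here for trying a real spin mutex:

```c
#include <stdatomic.h>
#include <string.h>

/*
 * Hypothetical sketch: console output guarded by a try-lock, so a
 * printf() from a context that interrupted the lock holder drops
 * its output instead of deadlocking or corrupting the holder's
 * half-written state.  An atomic flag stands in for a spin mutex.
 */
static atomic_flag cn_mtx = ATOMIC_FLAG_INIT;
static char cn_buf[128];		/* stand-in for the device */

static int
cnputs_robust(const char *s)
{
	if (atomic_flag_test_and_set(&cn_mtx))
		return (-1);		/* lock busy: drop the output */
	strncat(cn_buf, s, sizeof(cn_buf) - strlen(cn_buf) - 1);
	atomic_flag_clear(&cn_mtx);
	return (0);
}
```

This is robust but not working, in the sense that nothing guarantees
the dropped output ever appears anywhere.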
Syscons' spltty() (and missing) locking was always inadequate for
this (it needed to be splhigh() for much the same reasons that it is
now a spin mutex, but this is ugly).  Working console i/o forces you
to find most of the inadequate locking and helps find some of it
(single-step through the driver until it crashes).

>> Non-debugger console i/o is also impossibly difficult in this
>> case.  FreeBSD doesn't attempt to support it, and uses cn*avail*
>> interfaces to give null i/o and less than null ddb support.  With
>> all the i/o in the kernel, it is possible to guess the hardware
>> and driver state by peeking at driver variables and hardware
>> registers.  With strong coupling, it is possible to do this
>> robustly.
>
> That's not true.  There's no robust way for the kernel debugger to
> use hardware that is under the control of process space.  If
> anything: output is always interrupted and disrupted by the
> debugger, so even if the hardware is left in a consistent state,
> the actual content on the screen may be garbled.

The amount of disruption is hardware-dependent.  Losing a couple of
characters is unimportant, and is also avoidable for frame buffer
type hardware.  Switching the console and its mode to avoid
disrupting output has been routine in at least userland debuggers
for at least 20 years.  Userland debuggers have fewer problems doing
this since they can't interrupt hardware mode switches, but their
switch is still a heavyweight operation.  OTOH, kernel debuggers can
avoid the mode switch and possibly the console switch in most cases
since they are close to the driver (if the driver is entirely in the
kernel).

>> Upper layers must cooperate by recording enough of their state in
>> an atomic way.  The coupling in lower layers then consists of
>> using the records and knowing that they are sufficient.
>
> Upper layers include user space in some cases.
> The state of the 3D graphics accelerator is not something you want
> to have to worry about in the kernel.  Though, you do want to know
> the "mode" if you want to write to the frame buffer.  Graphical
> displays are our weakest point and given that there's no interest
> in fixing it,

Yes, I don't want to support any userland graphics driver.

> I can say that no matter what we do in the existing framework we
> will never have robust behaviour.

Actually, it can be made robust (but not working) fairly easily:
- clear the device's cn_avail_mask bit while in critical regions.
  This could be implemented fairly easily using the macro for
  locking the critical regions.
- fix the placement of cnunavailable() in ddb, so that breakpoints
  in the instruction stream aren't fatal when they are hit while the
  console is unavailable.
- the cn_avail_mask bit is managed by calling cnavail().  It is not
  very efficient, despite having no locking whatsoever.  Fix its
  locking and try to optimize it.  The whole of kern_cons.c has no
  locking except for broken locking in cnputs().  Normal mutex
  locking cannot be used, but cnputs() uses it.  mtx_trylock() would
  be robust (but not working) in cnputs().  Accesses to
  cn_avail_mask need a bit more than atomicity, since there is
  another avail bit in cn_flags.
- fix many other races involving accesses to cn_avail_mask.  Oops,
  there are some that cannot be made robust fairly easily.  There is
  1 involving checking cnunavail() and acting on its result.  This
  one is mostly dormant.  Active ones include cnremove() called from
  the console control sysctl racing with cnavail() called from upper
  layers of device drivers that support consoles.

Bruce