From owner-freebsd-smp  Fri Jun 23 10:43:37 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by hub.freebsd.org (Postfix) with ESMTP id 6F9F937C3CE
	for <freebsd-smp@FreeBSD.ORG>; Fri, 23 Jun 2000 10:43:07 -0700 (PDT)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.9.3/8.9.1) id KAA11163;
	Fri, 23 Jun 2000 10:43:01 -0700 (PDT)
	(envelope-from dillon)
Date: Fri, 23 Jun 2000 10:43:01 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200006231743.KAA11163@apollo.backplane.com>
To: Bruce Evans <bde@zeta.org.au>
Cc: Jason Evans <jasone@canonware.com>, Greg Lehey <grog@lemis.com>,
	Warner Losh <imp@village.org>, The Hermit Hacker <scrappy@hub.org>,
	freebsd-smp@FreeBSD.ORG
Subject: Re: SP Patchset #1 up
References:  <Pine.BSF.4.21.0006240215530.295-100000@besplex.bde.org>
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

:>     SchedMutex is already held then getting it again isn't going to hurt.
:>     If it isn't held then getting it the first time isn't going to hurt.
:
:In that case, it doesn't protect against unsupported recursion in the syscons
:console i/o routines any better than spltty().

    It protects against:

	* an interrupt occuring during the critical section on the same cpu
	* entry into those routines from another cpu

    Which is as good as you can get, really, since the only other option is
    to lockup.  

    If you get a trap in the middle of the debugging section, with SchedMutex
    held, I'd wager you would rather see another debug prompt then see a 
    complete lockup.

						-Matt

:>     SchedMutex is never invalid.  No mutex is ever 'invalid'... they are
:>     locked in an atomic cmpexg.  Either SchedMutex is held when the break
:>     point occurs, or it isn't.
:
:>From your patches for exception.s:
:
:| +	/*
:| +	 * We have to protect the SchedMutex and curproc fixup from 
:| +	 * interrupts on this cpu (which check for SchedMutex being
:| +	 * held on the 'current' cpu).
:| +	 */
:| +	cli
:| +

    You mean i386/i386/swtch.s

    The issue here is that if an interrupt occurs on the cpu holding
    SchedMutex, the interrupt must be defered.

    When the switch code switches to a new process it must change curproc,
    which breaks the detection in the interrupt code that determines that
    the current cpu is holding SchedMutex.  Thus interrupts (on the current
    cpu only) must be disabled to allow the switch code to update curproc
    AND SchedMutex.  

    The SchedMutex is held throughout this period, but there is a small window
    where it is held by the 'wrong' process (curproc has been changed,
    but SchedMutex's mtx_lock has not), which I am currently using cli
    to protect.

    If a debugger trap were to occur on the current cpu at just point, it
    would indeed lead to a lockup.  You are absolutely correct.  So maybe
    the comment should read: don't create any break points in between the
    setting of curproc and the fixup of SchedMutex!

    We can solve this without the CLI/STI by releasing the SchedMutex 
    entirely, changing curproc, and then regaining SchedMutex.  I personally
    do not like that idea but it would be more 'correct', and it would not
    have the lockup problem if a DDB trap or NMI occured on the current cpu
    at just that point.
    

:"cli" and other forms of disabling interrupts provide no protection against
:exceptions which aren't interrupts, in particular against debugger traps.

    Very true.  On the otherhand, I don't think it's possible for anything
    but a set break point to trap the code at that point so I don't consider
    it a big problem.  Oh, ok.. NMI could do it.

:>     I don't see the issue here.  interrupts cannot nest SchedMutex - look
:>     at the interrupt code.  At the moment interrupt can nest other interrupts
:>     only insofar as the new interrupt occuring before the old interrupt has
:>     obtained the giant mutex.
:
:Traps are not interrupts.  Debugger traps are the main problem.  NMIs
:would be a bigger problem if they occurred in normal operation (note
:that our NMI handling is broken; it does dangerous things like calling
:printf).

    We could support NMIs by releasing the SchedMutex in that critical
    swtch.s section and then regaining it after adjusting curproc.   
    The NMI would be able to support the kernel printf() in that case.

    It is possible to release SchedMutex simply with a non-locked
    andl $MTF_CONTESTED,_SchedMutex+MTX_LOCK (because MTF_CONTESTED cannot
    be set by the 'other' cpu for SchedMutex), but regaining it would
    require a call to mtx_enter_sched_quick.  On the otherhand, we could
    get rid of the cli/sti pair and that might make up enough time to
    be close to break-even on the performance.

:> :If switching from a high priority interrupt task to a low priority one is
:> :allowed, then the first few levels don't need to be supported.
:> 
:>     I don't follow.  The SchedMutex recursion count is saved and restored
:>     when a switch occurs.
:
:This will only matter when (if) prioritized interrupts are supported again.

    Prioritized interrupts will follow naturally when Greg 
    puts in the heavy-weight interrupt threads.  The interrupt threads will
    be prioritized just like normal processes, and the highest priority
    thread is the one that is going to get to run first (even interrupting
    and being switched into from a lower priority interrupt thread).

:> 
:>     If you place a debugger trap at a point where SchedMutex is held,
:>     guess which cpu the debugger takes the trap on?  The one holding
:>     SchedMutex.  Thus no problem... the debug trap will simply bump the
:>     recursion count of SchedMutex temporarily.
:
:Are you saying that this code is never reached for debugger traps?
:
:I'm not even thinking about complications for SMP.  The problems I'm
:talking about affect for UP.  The SMP case shouldn't be significantly
:different for debugger traps, since the first thing the debugger trap
:handler should do is stop the other CPUs.

    I think if we fix the one case you pointed out in swtch.s, that we
    could take a debugger trap anywhere except in the debugger trap
    dispatch code itself.  SP or MP.

:>     The only thing we have to worry about insofar as debugger traps go
:>     is to be sure to clear the debug point prior to entering the SchedMutex
:>     within the debugger trap.  I think this is trivial.
:
:This is the problem that I mentioned above.  It is not completely trivial
:to fix, because the debugger is entered for all types of fatal traps.
:Fatal traps aren't necessarily fatal for people who can use ddb.
:
:Bruce

    I meant 'clear the break point'.  Assuming a set break point.

	(break point trap)	(make sure no printf's in the code path)
	debugger entry		(make sure no printf's in the code path)
	clear break point
				... now printf's are safe.
				... now scheduler mutex calls are safe

	I have not audited the debugger breakpoint code.  It's probably
	'wrong', but it wouldn't take much to fix it if it is.

    There are a few recursive situations that must be carefully protected
    for debugger access.  The spl*() code in isa/ipl_funcs.c for example
    has code to issue certain panics one-time in order to prevent panic
    recursion when the system drops into the debugger.

    I will point out that there are several cases *already* (without
    my patch) where an attempt to enter the debugger leads to a recursive
    panic.  We aren't going to be making things any worse then they already
    are.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message