From owner-freebsd-smp Fri Jun 23 10:15: 2 2000 Delivered-To: freebsd-smp@freebsd.org Received: from gidora.zeta.org.au (gidora.zeta.org.au [203.26.10.25]) by hub.freebsd.org (Postfix) with SMTP id 24F9137C3E3 for ; Fri, 23 Jun 2000 10:14:51 -0700 (PDT) (envelope-from bde@zeta.org.au) Received: (qmail 6652 invoked from network); 23 Jun 2000 17:14:43 -0000 Received: from unknown (HELO bde.zeta.org.au) (203.2.228.102) by gidora.zeta.org.au with SMTP; 23 Jun 2000 17:14:43 -0000 Date: Sat, 24 Jun 2000 03:14:40 +1000 (EST) From: Bruce Evans X-Sender: bde@besplex.bde.org To: Matthew Dillon Cc: Jason Evans , Greg Lehey , Warner Losh , The Hermit Hacker , freebsd-smp@FreeBSD.ORG Subject: Re: SP Patchset #1 up In-Reply-To: <200006231556.IAA10465@apollo.backplane.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Fri, 23 Jun 2000, Matthew Dillon wrote: > :> I don't follow. Why do I have a deadlock? This is SchedMutex I'm > :> talking about here, not GiantMutex. > : > :mtx_enter_sched_quick() is mtx_enter(&SchedMutex, MTX_SPIN). > : > :| /* > :| * mtx_enter() > :| * ... > :| * WARNING: DO NOT PUT KERNEL PRINTFS IN THE SPIN CODE!, you may > :| * create a recursion. > :| */ > : > :This warning gives another reason why ddb can't call locking functions. > :Putting a breakpoint at a locking functions or wandering into a locking > :function while single stepping will cause console i/o. > > A break point is a one-time deal. That'l work just fine. If the Except for a breakpoint in the debugger's breakpoint handling code or in the debugger's i/o functions. Setting a breakpoint in the debugger's breakpoint handling code is user error and very difficult to render harmless. Setting a breakpoint in debugger's i/o functions is not necessarily a user error, since parts of the functions are used by the kernel generally. E.g., setting a breakpoint at cnputc() currently causes a recursive panic in some cases, because trap_fatal() calls printf() before ddb can remove the breakpoint and kdb_trap() calls db_printf() before ddb removes the breakpoint. > SchedMutex is already held then getting it again isn't going to hurt. > If it isn't held then getting it the first time isn't going to hurt. In that case, it doesn't protect against unsupported recursion in the syscons console i/o routines any better than spltty(). > :| static __inline void > :| mtx_enter(struct mtx *mtp, int type) > :| { > :| u_int32_t newv; > :| > :| newv = (u_int32_t)curproc | MTF_HELD; > :| if ((mtp->mtx_lock & ~MTF_CONTESTED) == newv) { > : > :Here is another reason why ddb can't call locking functions. curproc > :may be in the middle of being switched when a debugger trap occurs, so > :ddb can't call anything that depends on it being valid. SchedMutex may > :be invalid for the same reason. > > SchedMutex is never invalid. No mutex is ever 'invalid'... they are > locked in an atomic cmpexg. Either SchedMutex is held when the break > point occurs, or it isn't. From your patches for exception.s: | + /* | + * We have to protect the SchedMutex and curproc fixup from | + * interrupts on this cpu (which check for SchedMutex being | + * held on the 'current' cpu). | + */ | + cli | + | #ifdef SMP | #ifdef GRAB_LOPRIO /* hold LOPRIO for INTs */ | #ifdef CHEAP_TPR | @@ -455,12 +297,22 @@ | movl %edx, _curpcb | movl %ecx, _curproc /* into next process */ | | -#ifdef SMP | - movl _cpu_lockid, %eax | - orl PCB_MPNEST(%edx), %eax /* add next count from PROC */ | - movl %eax, _mp_lock /* load the mp_lock */ | + /* | + * Restore SchedMutex, be careful not to loose MTF_CONTESTED. | + * Note that MTF_HELD *MUST* be set here (XXX add assertion/panic if | + * not). | + * | + * The curproc portion of SchedMutex will temporarily cycle through | + * NULL, but since interrupts are disabled and curproc can never | + * be NULL, this will not result in improper operation. | + */ | + movl PCB_SCHEDNEST(%edx), %eax /* add next count from PROC */ | + MPLOCKED | + andl $(MTF_HELD|MTF_CONTESTED),_SchedMutex+MTX_LOCK | + MPLOCKED | + orl %ecx, _SchedMutex+MTX_LOCK | + movl %eax, _SchedMutex+MTX_RECURSE /* load the sched mutex */ | /* XXX FIXME: we should be restoring the local APIC TPR */ | -#endif /* SMP */ "cli" and other forms of disabling interrupts provide no protection against exceptions which aren't interrupts, in particular against debugger traps. > :Many levels of recursion are possible: > : > : printf in process context > : --> interrupt ... printf in interrupt context > : --> higher priority interrupt ... printf in interrupt context > : ... even higher priority interrupts > : --> debugger trap ... debugger printf > : --> fatal trap for bug in printf ... bogus printfs (should longjmp) > : --> double fault for bug in fatal trap handler ... final printfs > > I don't see the issue here. interrupts cannot nest SchedMutex - look > at the interrupt code. At the moment interrupt can nest other interrupts > only insofar as the new interrupt occuring before the old interrupt has > obtained the giant mutex. Traps are not interrupts. Debugger traps are the main problem. NMIs would be a bigger problem if they occurred in normal operation (note that our NMI handling is broken; it does dangerous things like calling printf). > :If switching from a high priority interrupt task to a low priority one is > :allowed, then the first few levels don't need to be supported. > > I don't follow. The SchedMutex recursion count is saved and restored > when a switch occurs. This will only matter when (if) prioritized interrupts are supported again. > :| } else { > :| if (atomic_cmpexg_int(&mtp->mtx_lock, 0, newv) != 0) { > :| if (type & MTX_SPIN) { > :| _mtx_enter_spin(mtp, newv); > : > :If this code is reached, then it should spin for ever, since other tasks > :should not run, even if there is another CPU for it to run on. > > If you place a debugger trap at a point where SchedMutex is held, > guess which cpu the debugger takes the trap on? The one holding > SchedMutex. Thus no problem... the debug trap will simply bump the > recursion count of SchedMutex temporarily. Are you saying that this code is never reached for debugger traps? I'm not even thinking about complications for SMP. The problems I'm talking about affect for UP. The SMP case shouldn't be significantly different for debugger traps, since the first thing the debugger trap handler should do is stop the other CPUs. > The only thing we have to worry about insofar as debugger traps go > is to be sure to clear the debug point prior to entering the SchedMutex > within the debugger trap. I think this is trivial. This is the problem that I mentioned above. It is not completely trivial to fix, because the debugger is entered for all types of fatal traps. Fatal traps aren't necessarily fatal for people who can use ddb. Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message