From owner-freebsd-smp Fri Jun 23 10:43:37 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by hub.freebsd.org (Postfix) with ESMTP id 6F9F937C3CE
	for ; Fri, 23 Jun 2000 10:43:07 -0700 (PDT)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.9.3/8.9.1) id KAA11163;
	Fri, 23 Jun 2000 10:43:01 -0700 (PDT)
	(envelope-from dillon)
Date: Fri, 23 Jun 2000 10:43:01 -0700 (PDT)
From: Matthew Dillon
Message-Id: <200006231743.KAA11163@apollo.backplane.com>
To: Bruce Evans
Cc: Jason Evans, Greg Lehey, Warner Losh, The Hermit Hacker, freebsd-smp@FreeBSD.ORG
Subject: Re: SP Patchset #1 up
References:
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

:>    SchedMutex is already held then getting it again isn't going to hurt.
:>    If it isn't held then getting it the first time isn't going to hurt.
:
:In that case, it doesn't protect against unsupported recursion in the syscons
:console i/o routines any better than spltty().

It protects against:

    * an interrupt occurring during the critical section on the same cpu
    * entry into those routines from another cpu

Which is as good as you can get, really, since the only other option
is to lock up.  If you get a trap in the middle of the debugging section,
with SchedMutex held, I'd wager you would rather see another debug prompt
than see a complete lockup.

					-Matt

:>    SchedMutex is never invalid.  No mutex is ever 'invalid'... they are
:>    locked in an atomic cmpexg.  Either SchedMutex is held when the break
:>    point occurs, or it isn't.
:
:From your patches for exception.s:
:
:| +	/*
:| +	 * We have to protect the SchedMutex and curproc fixup from
:| +	 * interrupts on this cpu (which check for SchedMutex being
:| +	 * held on the 'current' cpu).
:| +	 */
:| +	cli
:| +

You mean i386/i386/swtch.s

The issue here is that if an interrupt occurs on the cpu holding
SchedMutex, the interrupt must be deferred.
When the switch code switches to a new process it must change curproc,
which breaks the detection in the interrupt code that determines that
the current cpu is holding SchedMutex.  Thus interrupts (on the current
cpu only) must be disabled to allow the switch code to update curproc
AND SchedMutex.  The SchedMutex is held throughout this period, but
there is a small window where it is held by the 'wrong' process
(curproc has been changed, but SchedMutex's mtx_lock has not), which I
am currently using cli to protect.

If a debugger trap were to occur on the current cpu at just that point,
it would indeed lead to a lockup.  You are absolutely correct.  So maybe
the comment should read: don't create any break points in between the
setting of curproc and the fixup of SchedMutex!

We can solve this without the CLI/STI by releasing the SchedMutex
entirely, changing curproc, and then regaining SchedMutex.  I personally
do not like that idea but it would be more 'correct', and it would not
have the lockup problem if a DDB trap or NMI occurred on the current cpu
at just that point.

:"cli" and other forms of disabling interrupts provide no protection against
:exceptions which aren't interrupts, in particular against debugger traps.

Very true.  On the other hand, I don't think it's possible for anything
but a set break point to trap the code at that point so I don't consider
it a big problem.  Oh, ok.. NMI could do it.

:>    I don't see the issue here.  interrupts cannot nest SchedMutex - look
:>    at the interrupt code.  At the moment interrupt can nest other interrupts
:>    only insofar as the new interrupt occuring before the old interrupt has
:>    obtained the giant mutex.
:
:Traps are not interrupts.  Debugger traps are the main problem.  NMIs
:would be a bigger problem if they occurred in normal operation (note
:that our NMI handling is broken; it does dangerous things like calling
:printf).

We could support NMIs by releasing the SchedMutex in that critical
swtch.s section and then regaining it after adjusting curproc.  The NMI
would be able to support the kernel printf() in that case.  It is
possible to release SchedMutex simply with a non-locked
andl $MTF_CONTESTED,_SchedMutex+MTX_LOCK (because MTF_CONTESTED cannot
be set by the 'other' cpu for SchedMutex), but regaining it would
require a call to mtx_enter_sched_quick.  On the other hand, we could
get rid of the cli/sti pair and that might make up enough time to be
close to break-even on the performance.

:> :If switching from a high priority interrupt task to a low priority one is
:> :allowed, then the first few levels don't need to be supported.
:>
:>    I don't follow.  The SchedMutex recursion count is saved and restored
:>    when a switch occurs.
:
:This will only matter when (if) prioritized interrupts are supported again.

Prioritized interrupts will follow naturally when Greg puts in the
heavy-weight interrupt threads.  The interrupt threads will be
prioritized just like normal processes, and the highest priority thread
is the one that is going to get to run first (even interrupting and
being switched into from a lower priority interrupt thread).

:>
:>    If you place a debugger trap at a point where SchedMutex is held,
:>    guess which cpu the debugger takes the trap on?  The one holding
:>    SchedMutex.  Thus no problem... the debug trap will simply bump the
:>    recursion count of SchedMutex temporarily.
:
:Are you saying that this code is never reached for debugger traps?
:
:I'm not even thinking about complications for SMP.  The problems I'm
:talking about affect for UP.  The SMP case shouldn't be significantly
:different for debugger traps, since the first thing the debugger trap
:handler should do is stop the other CPUs.

I think if we fix the one case you pointed out in swtch.s, we could
take a debugger trap anywhere except in the debugger trap dispatch code
itself.  SP or MP.

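The release / change curproc / regain ordering discussed above can be sketched in userland C with C11 atomics. This is only an illustration under stated assumptions: every name here (sketch_mtx, sched_mtx, cur_id, sketch_try_enter, sketch_exit, sketch_switch) is hypothetical and does not come from the patch set, the real code is i386 assembly in swtch.s, and the sketch assumes the switch runs at recursion depth zero rather than saving and restoring the recursion count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names are from the patch set. */
#define SKETCH_UNOWNED ((uintptr_t)0)

struct sketch_mtx {
    _Atomic uintptr_t owner;  /* id of the holder, or SKETCH_UNOWNED */
    unsigned recurse;         /* extra acquisitions made by the holder */
};

static struct sketch_mtx sched_mtx;  /* models SchedMutex */
static uintptr_t cur_id;             /* models curproc on one cpu */

/*
 * Acquire: bump the recursion count if 'self' already owns the mutex,
 * otherwise try a single atomic compare-and-swap.  Returns 1 on success
 * and 0 if someone else holds it (a real kernel would spin instead).
 */
static int sketch_try_enter(struct sketch_mtx *m, uintptr_t self)
{
    uintptr_t expected = SKETCH_UNOWNED;

    if (atomic_load(&m->owner) == self) {
        m->recurse++;
        return 1;
    }
    return atomic_compare_exchange_strong(&m->owner, &expected, self);
}

/* Release one level; the mutex becomes unowned only at depth zero. */
static void sketch_exit(struct sketch_mtx *m, uintptr_t self)
{
    assert(atomic_load(&m->owner) == self);
    if (m->recurse > 0) {
        m->recurse--;
        return;
    }
    atomic_store(&m->owner, SKETCH_UNOWNED);
}

/* The ownership test a trap or interrupt handler would make. */
static int sketch_held_by_current(void)
{
    return atomic_load(&sched_mtx.owner) == cur_id;
}

/*
 * Switch variant without cli/sti: release, change cur_id, regain.
 * At every step the owner field is either unowned or equal to cur_id,
 * so there is no window where the 'wrong' process appears to hold it.
 * Assumes recursion depth zero on entry (a simplification).
 */
static int sketch_switch(uintptr_t next)
{
    assert(sched_mtx.recurse == 0);
    sketch_exit(&sched_mtx, cur_id);  /* unowned from here ... */
    cur_id = next;
    return sketch_try_enter(&sched_mtx, cur_id);  /* ... to here */
}
```

A caller would set cur_id, take sched_mtx with sketch_try_enter, and then call sketch_switch with the next id. The cost is that the mutex is briefly unowned during the switch, so the regain step can fail or have to wait, which is the extra work the message attributes to mtx_enter_sched_quick.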
:>    The only thing we have to worry about insofar as debugger traps go
:>    is to be sure to clear the debug point prior to entering the SchedMutex
:>    within the debugger trap.  I think this is trivial.
:
:This is the problem that I mentioned above.  It is not completely trivial
:to fix, because the debugger is entered for all types of fatal traps.
:Fatal traps aren't necessarily fatal for people who can use ddb.
:
:Bruce

I meant 'clear the break point'.  Assuming a set break point.

    (break point trap)
    (make sure no printf's in the code path)
    debugger entry
    (make sure no printf's in the code path)
    clear break point
    ... now printf's are safe.
    ... now scheduler mutex calls are safe

I have not audited the debugger breakpoint code.  It's probably 'wrong',
but it wouldn't take much to fix it if it is.

There are a few recursive situations that must be carefully protected
for debugger access.  The spl*() code in isa/ipl_funcs.c, for example,
has code to issue certain panics only once in order to prevent panic
recursion when the system drops into the debugger.

I will point out that there are several cases *already* (without my
patch) where an attempt to enter the debugger leads to a recursive
panic.  We aren't going to be making things any worse than they already
are.

					-Matt
					Matthew Dillon

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message