Date: Thu, 23 Jun 2011 08:51:57 -0400
From: John Baldwin <jhb@freebsd.org>
To: Andriy Gapon <avg@freebsd.org>
Cc: arch@freebsd.org
Subject: Re: stop_cpus*() interface
Message-ID: <201106230851.57885.jhb@freebsd.org>
In-Reply-To: <4E0217A3.7020802@FreeBSD.org>
References: <4E0217A3.7020802@FreeBSD.org>
On Wednesday, June 22, 2011 12:26:11 pm Andriy Gapon wrote:
> I would like to propose narrowing the stop_cpus*() interface:
>
> 1. Remove the cpu mask/set parameter.  The rationale for this is presented
> below in a forwarded message from a private discussion.  You may also see
> that currently the stop_cpus*() functions are always called with either
> (1) the other_cpus mask or (2) other_cpus & ~stopped_cpus, where (2) is
> really equivalent to (1) because of (1).
>
> 2. Change the return type to void.  Currently the return value of
> stop_cpus*() is never checked, and it cannot really be handled
> meaningfully.  A simple boolean or errno return value cannot convey which
> target CPUs were already stopped, which failed to become stopped, and why.
> I think it is better to assume that stop_cpus*() should never fail and to
> add the diagnostics necessary to catch cases where it does fail.
>
> The forwarded message below provides my thoughts on CPU stopping semantics
> and additionally presents my analysis of the CPU stopping code in
> OpenSolaris.
>
> -------- Original Message --------
> on 12/05/2011 21:17 Andriy Gapon said the following:
> > cpu_hard_stop does stop other CPUs in a hard way.  At least on some
> > archs it is really so, e.g. x86 NMI.  This means that stopped CPUs, or
> > rather the threads that were running on them, can be stopped in any kind
> > of context with any kind of lock held, including spinlocks.  Given that
> > fact, it is really unsafe to continue using any locks after even one CPU
> > is hard-stopped.  So any remaining running CPUs should be put into a
> > special non-locking mode.  This is the reason that we invent things like
> > THREAD_PANICED(), use polling mode in kdb context, etc.
> > But having more than one CPU, in fact even more than one thread, running
> > in non-locking mode is unsafe again - if those CPUs continue execution
> > without any synchronization, they would corrupt shared data.
> > Thus, I argue that hard stopping should leave only one CPU and thread
> > running.
>
> Some more thoughts.
>
> I think that the above reasoning even applies, to a certain degree, to the
> current soft stopping.  Soft stopping would not leave any spinlocks held,
> true, but it can still leave other kinds of locks held, e.g. regular
> mutexes and sx locks.  And that also produces a very special environment
> in the end.  So in my opinion the current soft stopping should also always
> stop all other CPUs.
>
> I think that eventually we will need a "really soft" graceful stopping
> mechanism.  That mechanism would rebind all interrupts away from a CPU
> being stopped, migrate all (non-special) threads away from the CPU,
> instruct the scheduler not to run any threads on the CPU, remove it from
> any active CPU sets, etc.  Now, this mechanism should really be of a
> targeted variety, no doubt.
>
> I would also like to share some of my observations of the OpenSolaris
> code.  This is not to lend any support to my proposals - after all we are
> not Solaris, but FreeBSD - but simply to share some ideas.
>
> In OpenSolaris I've noticed three separate CPU stopping mechanisms so far.
> I am sure that they have more :-)
>
> 1. Stopping by the debugger.  This is very similar to our hard stopping
> (in their x86 code[*]).  All other CPUs are always stopped.  One
> difference is that the stopped CPUs run a special command loop while
> spinning.  The master CPU can send a few commands to the slave CPUs.
> Examples: the master can tell a slave, if it's the BSP, to reset the
> system; the master can tell a slave to become the new master (I think that
> this is somewhat equivalent to the "thread N" command in gdb).
> All commands:
> #define KMDB_DPI_CMD_RESUME_ALL      1   /* Resume all CPUs */
> #define KMDB_DPI_CMD_RESUME_MASTER   2   /* Resume only master CPU */
> #define KMDB_DPI_CMD_RESUME_UNLOAD   3   /* Resume for debugger unload */
> #define KMDB_DPI_CMD_SWITCH_CPU      4   /* Switch to another CPU */
> #define KMDB_DPI_CMD_FLUSH_CACHES    5   /* Flush slave caches */
> #define KMDB_DPI_CMD_REBOOT          6   /* Reboot the machine */
>
> 2. Stopping for panic.  This is also very similar to our hard stopping (in
> their x86 code[*]).  All other CPUs are always stopped, but this is done
> via different code than what the debugger uses; I am not sure why, maybe
> some historic legacy.  The difference from our code and from their
> debugger code is that the stopped CPUs run a different stop loop and may
> do some useful panic work.  E.g. my understanding is that they can be used
> for compressing a dump image (yes, they compress their dumps, for disk
> writing speed I guess).
>
> 3. Something remotely similar to our current soft stopping.  The big
> difference is that they have special per-CPU "pause" threads.  This
> mechanism activates those threads; the threads make themselves
> non-preemptible, disable interrupts, and block on some sort of a semaphore
> until they are told to resume.  I am not sure what advantage, if any, this
> mechanism gives them compared to our approach.
> The mechanism is invoked via the pause_cpus() call.  It is used mainly to
> change the state of CPUs (some per-CPU data), e.g. configuring idle hooks
> or power management.
>
> [!] BTW, they also use this mechanism when onlining/offlining CPUs to
> avoid locking in normal paths.  That is, for instance, they stop/pause all
> CPUs, mark a target CPU as offline, and then restart all CPUs.  This way
> they don't need any locking when checking (and changing) CPU status.  Of
> course, they also do all the other reasonable things - unbinding
> interrupts, moving threads away, etc.
> The mechanism is also used by their checkpoint-resume code (which is used
> by suspend/resume) and in their shutdown/reboot path.
> This CPU stopping mechanism also always stops all other CPUs.
>
> [*] Another difference worth noting is that they don't use an NMI for
> their equivalents of our hard stopping.  They still have the notion of
> interrupt levels and the various spl* stuff, so they just use a normal
> interrupt with the highest priority to penetrate protected contexts.
> E.g. in their equivalent of spinlock_enter() they do not outright disable
> interrupts, but set the current level to a special 'LOCK' level which
> inhibits all typical (hardware and IPI) interrupts.  This mechanism adds
> another degree of freedom to their implementation; as such it complicates
> the code and logic, but it also adds some flexibility.
>
> I hope that there is something useful for you and FreeBSD in this lengthy
> overview.

I really like the OpenSolaris model.  It sounds like you could perhaps merge
1) and 2).  The pause thread idea for handling online/offline is quite nice.

On x86 you could have IPI_STOP be non-NMI if we adjusted the TPR (%cr8 on
amd64) instead of using cli/sti for spinlock_enter/exit.  However, older
i386 CPUs do not support this, so I think this is only practical on amd64 if
we were to go that route.  OTOH, I think using an NMI is actually fine
(though we need to do a better job of providing a way to register NMI
handlers instead of the various hacks we currently have).

-- 
John Baldwin