Date: Tue, 20 Feb 2007 18:21:52 -0600
From: Billy Newsom <billy@nlcc.us>
To: stable@freebsd.org
Subject: The code for rebooting an SMP machine doesn't always work (still)
Message-ID: <45DB90A0.4070906@nlcc.us>
When an SMP machine does not have an AT keyboard controller, there needs to be a way to reboot the machine under FreeBSD!

I have another system which fails to reboot under FreeBSD. This time it is a bleeding-edge current system and FreeBSD 6.2-release. From what I can tell, the code to reboot machines has not really changed much in over ten years. There is definitely something wrong with it, however, probably in SMP systems.

Here is what happens. When I do a shutdown and reboot of this machine (which lacks the keyboard controller), I get a notice that the keyboard reset failed:

    Keyboard reset did not work, attempting CPU shutdown

Then there is an attempt to reboot the machine, which results in a Fatal Trap 12. Google for "triple fault reboot"; perhaps see kern/94822.

Now, I have looked at the code a lot to see what is happening. Taking vm_machdep.c to task, it is obvious that the keyboard reboot is the norm, with the alternate method used as a last resort. (This is true even for AMD64, which is what I am using!) I even looked at the locore.s code to see how the reboot code works there (written in assembler, I think), and they do not even try the keyboard reset. The idea, it would seem, is to cause what is known as a triple fault in the CPU, which is supposed to force it to reset. (I cross-referenced other operating systems, like NetBSD.) In this case, I think maybe the CPU is somehow surviving the attempt to be rebooted when certain things happen.

I wonder if someone would like to test this: simply remove the portion of vm_machdep.c that attempts the keyboard reset and see if the remaining C code there works. After all, this bug only shows up on the odd machine which has no KBC. The code is easy to spot because the comment is

    /* "good night, sweet prince .... <THUNK!>" */

and it has been in the code since the 1990s at least.

Examples of affected machines? Well, I am testing a Mac Pro with dual Xeons and four cores.
I believe that blade servers are often without a keyboard controller, too. Many embedded systems have no KBC. The other example is a machine that I still run FreeBSD 5 on. It is a dual Pentium Pro 200. Notice that both of my examples are running SMP, and this could have a lot to do with being able to force a CPU to execute and perform three faults.

See cpu_reset_real() and its comments at
http://fxr.watson.org/fxr/source//amd64/amd64/vm_machdep.c
http://fxr.watson.org/fxr/source/i386/i386/vm_machdep.c

For those who might think I didn't try everything, I tried this in device.hints:

    # Billy removed these six things for Mac Pro
    hint.atkbdc.0.disabled="1"
    hint.atkbd.0.disabled="1"
    hint.sio.0.disabled="1"
    hint.sio.1.disabled="1"
    hint.ppc.0.disabled="1"
    hint.psm.0.disabled="1"
    # Billy removed these six things for Mac Pro

I tried removing those culprits from the kernel, too. Fewer errors at boot, but never would it reboot.

It will do this:

    halt -p (works)
    It will reboot under Windows XP (same machine)
    It will reboot at the boot loader prompt (type reboot, and it does that; see locore.s)

In other words, amd64's vm_machdep.c is the problem, but I must say that I'm pretty confident that the same is true for i386. My dual Pentium Pro stopped rebooting okay when upgraded from FreeBSD 4.x to 5.2 and still won't reboot.

As a footnote, there is a kernel option called BROKEN_KEYBOARD_RESET. Great, right? Well, someone disabled it for amd64, so the kernel wouldn't even build with that option. Shame on us for removing a simple way to troubleshoot a problem. I would recommend adding it back as either a device hint or a kernel option. It's still available for i386. But all it would do for me is skip the keyboard reset attempt, which doesn't freeze or panic this computer; it simply doesn't work.

Thanks for any help. I have collected a lot of data if someone is interested. I may post my dmesg output for this Macintosh anyway, just for someone's reference.
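
For readers who want the fallback order spelled out, here is a minimal userland sketch of it in C. It is not the code in vm_machdep.c and it touches no hardware: every name in it (struct reset_method, try_reset, kbc_absent, fault_resets, fault_survived) is hypothetical, and each "reset method" is just a stub that reports whether the simulated machine reset. The skip flag only mirrors the idea behind BROKEN_KEYBOARD_RESET, namely leaving out the keyboard controller attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a reset method reports whether the machine reset. */
typedef bool (*reset_fn)(void);

struct reset_method {
	const char *name;	/* label for diagnostics only */
	reset_fn attempt;	/* simulated attempt, no real hardware access */
	bool skip;		/* leave this method out, in the spirit of
				 * BROKEN_KEYBOARD_RESET (assumption) */
};

/*
 * Try each method in order, as the message describes: keyboard reset
 * first, the alternate method as a last resort. Returns the index of
 * the first method that worked, or -1 if the machine is still running.
 */
int
try_reset(const struct reset_method *methods, size_t n)
{
	assert(methods != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (methods[i].skip)
			continue;
		if (methods[i].attempt())
			return (int)i;
	}
	return -1;
}

/* Simulated outcomes, made up for illustration. */
bool kbc_absent(void)     { return false; }	/* no KBC: nothing happens */
bool fault_resets(void)   { return true; }	/* triple fault resets the CPU */
bool fault_survived(void) { return false; }	/* CPU survives the attempt */
```

With a table of { kbc_absent, fault_resets } the call returns 1, which is the healthy case on a machine without a KBC. Swapping in fault_survived makes it return -1, which corresponds to the symptom reported here: both attempts are made and the machine keeps running.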