Date: Tue, 16 Jul 2013 18:54:57 +1000 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Konstantin Belousov <kostikbel@gmail.com> Cc: freebsd-acpi@freebsd.org Subject: Re: Revisiting FPU context resume on i386 Message-ID: <20130716164612.P1088@besplex.bde.org> In-Reply-To: <20130716052641.GE91021@kib.kiev.ua> References: <20130716070716.15b7282b9dca2cbc8a767631@tackymt.homeip.net> <20130716052641.GE91021@kib.kiev.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 16 Jul 2013, Konstantin Belousov wrote: > On Tue, Jul 16, 2013 at 07:07:16AM +0900, Taku YAMAMOTO wrote: >> sys/i386/i386/swtch.s have a big FIX ME in resumectx() >> and I have occationally got bitten by it; resulting in SIGFPE disasters. >> >> After cursory looking around FPU context, I think it's the simplest way >> to set CR0_TS on resumectx() and to let npxdna() DTRT lazilly. >> >> Attached is the mods that I'm currently using without a problem at the moment. >> It at least doesn't interact with normal resume operations badly. >> >> Ah, of cource, we have a choice to throw i386 away and get on amd64, >> but at least I won't because I'd miss VOCALOIDs running on wine/i386 :) amd64 uses npx and CR0_TS too. It just misspells npx as fpu and even misspells CR0_TS as TS0_CR in a comment in cpu_switch.S. The problem seems to be just that acpi is not fully supported on i386. The npx state is simpler on i386 (no AVX support), so this should be easy to fix by copying the suspend/resume logic from amd64. >> diff --git a/sys/i386/i386/swtch.s b/sys/i386/i386/swtch.s >> index 80aa6c4..71efae1 100644 >> --- a/sys/i386/i386/swtch.s >> +++ b/sys/i386/i386/swtch.s >> @@ -36,6 +36,7 @@ >> #include "opt_sched.h" >> >> #include <machine/asmacros.h> >> +#include <machine/specialreg.h> >> >> #include "assym.s" >> >> @@ -487,6 +488,10 @@ ENTRY(resumectx) >> movl PCB_CR3(%ecx),%eax >> movl %eax,%cr3 >> movl PCB_CR0(%ecx),%eax >> +#ifdef DEV_NPX >> + /* Let npxdna() restore the FPU context lazily. */ >> + orl $CR0_TS,%eax >> +#endif >> movl %eax,%cr0 >> jmp 1f >> 1: >> @@ -519,10 +524,6 @@ ENTRY(resumectx) >> movl PCB_DR7(%ecx),%eax >> movl %eax,%dr7 >> >> -#ifdef DEV_NPX >> - /* XXX FIX ME */ >> -#endif >> - >> /* Restore other registers */ >> movl PCB_EDI(%ecx),%edi >> movl PCB_ESI(%ecx),%esi > > If this works, fine. But I think that you also should make sure that > invariants which are checked by npxdna() are met. E.g. see the > checks for fpcurthread at the start of the routine. How can this possibly work? If the invariants are satisfied, then suspend would have saved a CR0_TS setting consistent with the invariants, so that there is nothing to fix. Actually, it seems to be necessary to save the state to the pcb and set CR0_TS on suspend, since if the state is not saved then, then nothing saves it before resume, and suspension can apparently lose it. If the state was saved on suspend, then CR0_TS in the pcb was set on suspend, so the patch has no effect. If the state wasn't saved on suspend, then the patch forces restoration of the last saved state, even when suspension didn't lose it. Thus in the latter case, a wrong (out of date) state is always restored, but usually the state is not very out of date so it sort of works. So the fix should be simply to save the state on suspend, or fix this if it is already supposed to be done. > Might be, just clear the pcpu fpcurthread as part of the resume ? Is it > done by suspend counterpart ? The state must be saved together with this to satisfy invariants. The bug is hard to understand by looking at only savectx(). Strangely, it is the i386 savectx() that saves the state. amd64 has a special routine ctx_fpusave() for saving the state. This is called from cpususpend_handler(). Its handling of CR0_TS seems to be worse than i386's too -- where i386's core routine npxsave() handles CR0_TS and returns with it set (as necessary for putting the state away for suspension), amd64's core routine fpusave() requires the caller to handle CR0_TS, and the ctx_fpusave() caller clears CR0_TS before the call but doesn't set it on return. ctx_fpusave() also doesn't maintained the invariant. So suspend/resume on amd64 already works unusually: more details: - the state saving for suspend/resume is completely independent of the pcb and the invariant. Suspendion just saves the current hardware npx^Wfpu state to a special save area. There would still be problems if any part of the state (hardware/pcb/invariant) changes before supend+resume completes. Apparently this doesn't happen. i386 would have similar problems with the state saved in the pcb if it changed during suspend+resume (now since the save area is not special, changes wouldn't invalidate the saved state, but if the changes result in the state becoming active again then suspension may lose the state). - since ctx_fpusave() is left clear by ctx_fpusave(), it will be clear when resume reloads CR0. ISTR disagreeing with jkim on using a special save area. i386 also has cpususpend_handler(). It has a special save area too, and the only significant difference is that the npx part of the save is done by savectx() instead of separately by ctx_fpusave(). The bug is now fairly obvious: savectx() uses npxsave(), and npxsave() breaks the invariant in various ways: % void % npxsave(addr) % union savefpu *addr; % { % % stop_emulating(); Already the invariant is not satisfied, but this is protected by interrupts being disabled. % fpusave(addr); If fpusave() is fxsave(), this leaves the state in the hardware. Otherwise, fpusave() is fnsave() and it initializes the state in the hardware. In the latter case, the invariant is transiently less satisfied than before. % % start_emulating(); This restores the part of the invariant that says that the state is not in the hardware (an npxdna() is needed to reload the state after fnsave(). % PCPU_SET(fpcurthread, NULL); % } This completes restoring the invariant, but only when 'addr' is in the PCB. When 'addr' is in the special save area, the invariant is very broken. We only call here if the state active to begin with (otherwise, it wouldn't be in the hardware so copying it from the hardware to the pcb would be very wrong). We record the state change in the pcb, but only saved the state in the special save area. fpcurthread in the special save area is either nonexistent or not initialized yet. This is not seriously broken provided we initialized fpcurthread in the special save area soon, and never touch the npx state again until resume completes. We must be careful with the CR0_TS part of the invariant too. When npxsave() was called, it should be clear (and the start_emulating() call should have no effect). When npxsave() returns, CR0_TS is set, and it must be saved somewhere soon. Since suspension is using a special save area, it presumably saves CR0 there too. So the pcb is not really used across suspend/resume. But the above write of fpcurthread to it is dangerous. Now for the inconsistencies in i386 savectx()/resumectx() related to the above: % /* % * savectx(pcb) % * Update pcb, saving current processor state. % */ Wrong comment. This saves in the special save area in most or all cases. It is important to understand that the normal pcb is not really used here and most updates to it are accidental. % ENTRY(savectx) % /* Fetch PCB. */ % movl 4(%esp),%ecx Wrong comment. As above, plus it spells pcb inconstently (capitalization is technically more correct, but we often spell it pcb). amd64 has the same wrong comments. I think the special save area consists mainly of a pcb, but it's very confusing when this pcb is not the usual one. % ... % movl %cr0,%eax % movl %eax,PCB_CR0(%ecx) It saves CR0 quite early. % #ifdef DEV_NPX % #ifdef DEV_NPX % /* % * If fpcurthread == NULL, then the npx h/w state is irrelevant and the % * state had better already be in the pcb. This is true for forks % * but not for dumps (the old book-keeping with FP flags in the pcb % * always lost for dumps because the dump pcb has 0 flags). % * % * If fpcurthread != NULL, then we have to save the npx h/w state to % * fpcurthread's pcb and copy it to the requested pcb, or save to the % * requested pcb and reload. Copying is easier because we would % * have to handle h/w bugs for reloading. We used to lose the % * parent's npx state for forks by forgetting to reload. % */ It saves npx as the very last step. The copying stuff avoids some consistency problems, but the comment about it is very out of date, and the copying is now incomplete. % pushfl % CLI % movl PCPU(FPCURTHREAD),%eax % testl %eax,%eax % je 1f The fpcurthread pointer (now in %eax) is not in the pcb. % % pushl %ecx % movl TD_PCB(%eax),%eax % movl PCB_SAVEFPU(%eax),%eax % pushl %eax % pushl %eax % call npxsave % addl $4,%esp % popl %eax % popl %ecx We first save to the pcb. Everything in the pcb is consistent now. fpcurthread is now 0. This is consistent too. % % pushl $PCB_SAVEFPU_SIZE % leal PCB_USERFPU(%ecx),%ecx % pushl %ecx % pushl %eax % call bcopy % addl $12,%esp The copy of the pcb in the special save area is not quite consistent. We only copied the very-npx-specific parts. CR0 wasn't copied, so it still has the old value for CR0_TS (clear when it should be set). fpcurthread doesn't live in the pcb, but that doesn't seem to be a problem, since the special state where the CR0_TS is set means that fpcurthread must be NULL, so we can recover fpcurthread from CR0_TS in this case. % 1: % popfl % #endif /* DEV_NPX */ % % movl $1,%eax % ret % END(savectx) % % /* % * resumectx(pcb) __fastcall % * Resuming processor state from pcb. % */ % ENTRY(resumectx) % /* Restore GDT. */ % lgdt PCB_GDT(%ecx) Wrong comments, as above. % /* Restore CR2, CR4, CR3 and CR0 */ % ... % movl PCB_CR0(%ecx),%eax % movl %eax,%cr0 % jmp 1f % 1: What's this magic null jump? % ... % #ifdef DEV_NPX % /* XXX FIX ME */ % #endif I now think that your patch works fine. It just sets CR0 to the state that should have been saved. The correct fix is even simpler (assuming that there are no races) -- either move the saving of CR0 after the saving of NPX, or save CR0 again after changing it in npxsave(). The copying from the pcb to the save area can probably be safely removed. npxsave() supports storing directly to the save area. The pcb is not really used for suspend/resume or other callers of savectx() (if any). It doesn't help much for the NPX part of it to be consistent with the save area when other parts are inconsistent and parts like fpucurthread don't live in either the pcb or the save area. savectx() seems to only be used by for dumping, stopping CPUs and suspending CPUs. Its save area is always special, and always an ordinary pcb struct (but amd64 seems to have a separate save area for suspendeded FPU state. I can't find where the pointer for this is initialized). Using the primary pcb is especially unimportant for dumping. When I wrote the above long comment about copying the npx state in FreeBSD-1, savectx() was only used for forking and dumping. It had to be careful for forking. In FreeBSD-2+, it wasn't used for forking or for anything else except dumping. Then the copying in the i386 version of it was excessively careful and the comment was out of date. Now savectx() is important again and must be careful for stopping and suspending CPUs. It is still broken for stopping CPUs on amd64, since it doesn't save the fpu state and neither does cpustop_handler() (unlike cpususpend_handler). Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130716164612.P1088>