Date: Thu, 31 Mar 2005 21:40:40 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
To: Peter Jeremy <PeterJeremy@optushome.com.au>
Cc: bde@freebsd.org
Subject: Re: Fwd: 5-STABLE kernel build with icc broken
Message-ID: <20050331210931.S2670@epsplex.bde.org>
In-Reply-To: <20050331104635.GH71384@cirb503493.alcatel.com.au>
References: <20050327133059.3d68a78c@Magellan.Leidinger.net>
 <20050327162839.2fafa6aa@Magellan.Leidinger.net>
 <5bbfe7d405032823232103d537@mail.gmail.com>
 <424A23A8.5040109@ec.rr.com>
 <20050330130051.GA4416@VARK.MIT.EDU>
 <20050331104635.GH71384@cirb503493.alcatel.com.au>
On Thu, 31 Mar 2005, Peter Jeremy wrote:

> On Thu, 2005-Mar-31 17:17:58 +1000, Bruce Evans wrote:
>> I still
>> think fully lazy switching (c2) is the best general method.
>
> I think it depends on the FP workload.  It's a definite win if there
> is exactly one FP thread - in this case the FPU state never needs to
> be saved (and you could even optimise away the DNA trap by clearing
> the TS and EM bits if the switched-to curthread is fputhread).

I think stopping the trap would be the usual method (not sure what
Linux did), but to collect statistics for determining affinity you
would want to take the trap anyway.

> The worst case is two (or more) FP-intensive threads - in this case,
> lazy switching is of no benefit.  The DNA trap overheads mean that
> the performance is worse than just saving/restoring the FP state
> during a context switch.
>
> My guess is that the current generation workstation is closer to the
> second case - current generation graphical bloatware uses a lot of
> FP for rendering, not to mention that the idle task has a reasonable
> chance of being an FP-intensive distributed computing task (setiathome
> or similar).  It's probably time to do some more measuring (I'm not
> offering just now, I have lots of other things on my TODO list).

Bloatware might be so hoggish that it rarely makes context switches :-).
Context switches for interrupts increase the problem though, as would
using FP more in the kernel.

>> BTW, David and I recently found a bug in the context switching in the
>> fxsr case, at least on Athlon-XP's and AMD64's.
>
> I gather this is not noticeable unless the application is doing its
> own FPU save/restore.  Is there a solution or work-around?

It's most noticeable for debugging, and if you worry about leaking
thread context.  Fortunately, the last-instruction pointers won't have
real user data in them unless the application encodes it there
intentionally.  I can't see any efficient solution or workaround.
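
The lazy versus eager trade-off argued over in the quoted text can be
sketched as a toy cost model.  This is only an illustration, not kernel
code: the struct, the function names and the cost units are all
hypothetical, and it assumes the DNA trap is skipped when the
switched-to thread already owns the FPU (the optimisation Peter
describes).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cost units for illustration; these are not measurements. */
struct fpu_costs {
    unsigned save;     /* saving the FPU state */
    unsigned restore;  /* restoring the FPU state */
    unsigned trap;     /* overhead of taking the DNA trap */
};

/*
 * Eager switching: every context switch pays a save and a restore.
 * sched[i] is the id of the thread that runs in time slice i.
 */
unsigned eager_cost(const int *sched, size_t n, const struct fpu_costs *c)
{
    unsigned total = 0;

    assert(sched != NULL && c != NULL);
    for (size_t i = 1; i < n; i++)
        if (sched[i] != sched[i - 1])
            total += c->save + c->restore;
    return total;
}

/*
 * Lazy switching: nothing is paid at switch time.  A thread that uses FP
 * while another thread's state is live takes the trap, saves the old
 * owner's state (if any) and restores its own.  uses_fp[t] is nonzero if
 * thread t touches the FPU in every slice it runs; thread ids must be
 * valid indices into uses_fp.
 */
unsigned lazy_cost(const int *sched, const int *uses_fp, size_t n,
                   const struct fpu_costs *c)
{
    unsigned total = 0;
    int owner = -1;  /* thread whose state is live in the FPU; -1 = none */

    assert(sched != NULL && uses_fp != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        int t = sched[i];

        if (!uses_fp[t] || owner == t)
            continue;  /* no FP use, or state already live: no trap */
        total += c->trap + c->restore;
        if (owner != -1)
            total += c->save;
        owner = t;
    }
    return total;
}
```

With one FP thread alternating with a non-FP thread, the lazy variant
pays only for the first FP use.  With two FP threads alternating, it
pays trap + save + restore on nearly every slice, which is Peter's worst
case.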
The kernel should do a full save/restore for processes being debugged.
For applications, the bug seems to be larger.  Even if they know about
the AMD behaviour and do a full save/restore because they need it, it
won't work because the kernel doesn't preserve the state across context
switches.  Applications like vmware might care more than most.

I forgot to mention that we couldn't find anything in Intel manuals
about this behaviour, so it might be completely AMD-specific.

Also, the instruction pointers are fundamentally broken for 64-bit
CPUs, since although they are 64 bits, they have the segment selector
encoded in their top 32 bits, so they are not really different from the
32:32 selector:pointer format for the non-fxsr case.  Their format is
specified by SSE2 so 64-bit extensions would have to be elsewhere, but
amd64 doesn't seem to extend them.

Bruce