From: Bruce Evans
Date: Thu, 31 Mar 2005 21:40:40 +1000 (EST)
To: Peter Jeremy
Cc: David Schultz, hackers@freebsd.org, jason henson, bde@freebsd.org
Subject: Re: Fwd: 5-STABLE kernel build with icc broken
Message-ID: <20050331210931.S2670@epsplex.bde.org>
In-Reply-To: <20050331104635.GH71384@cirb503493.alcatel.com.au>

On Thu, 31 Mar 2005, Peter Jeremy wrote:

> On Thu, 2005-Mar-31 17:17:58 +1000, Bruce Evans wrote:
>> I still
>> think fully lazy switching (c2) is the best general method.
>
> I think it depends on the FP workload.
> It's a definite win if there
> is exactly one FP thread - in this case the FPU state never needs to
> be saved (and you could even optimise away the DNA trap by clearing
> the TS and EM bits if the switched-to curthread is fputhread).

I think stopping the trap would be the usual method (not sure what Linux
did), but to collect statistics for determining affinity you would want
to take the trap anyway.

> The worst case is two (or more) FP-intensive threads - in this case,
> lazy switching is of no benefit.  The DNA trap overheads mean that
> the performance is worse than just saving/restoring the FP state
> during a context switch.
>
> My guess is that the current generation workstation is closer to the
> second case - current generation graphical bloatware uses a lot of
> FP for rendering, not to mention that the idle task has a reasonable
> chance of being an FP-intensive distributed computing task (setiathome
> or similar).  It's probably time to do some more measuring (I'm not
> offering just now, I have lots of other things on my TODO list).

Bloatware might be so hoggish that it rarely makes context switches :-).
Context switches for interrupts increase the problem though, as would
using FP more in the kernel.

>> BTW, David and I recently found a bug in the context switching in the
>> fxsr case, at least on Athlon-XP's and AMD64's.
>
> I gather this is not noticeable unless the application is doing its
> own FPU save/restore.  Is there a solution or work-around?

It's most noticeable for debugging, and if you worry about leaking
thread context.  Fortunately, the last-instruction pointers won't have
real user data in them unless the application encodes it there
intentionally.  I can't see any efficient solution or workaround.  The
kernel should do a full save/restore for processes being debugged.  For
applications, the bug seems to be larger.
Even if they know about the AMD behaviour and do a full
save/restore because they need it, it won't work because the kernel
doesn't preserve the state across context switches.  Applications like
vmware might care more than most.

I forgot to mention that we couldn't find anything in Intel manuals about
this behaviour, so it might be completely AMD-specific.

Also, the instruction pointers are fundamentally broken for 64-bit CPUs,
since although they are 64 bits, they have the segment selector encoded
in their top 32 bits, so they are not really different from the 32:32
selector:pointer format for the non-fxsr case.  Their format is specified
by SSE2 so 64-bit extensions would have to be elsewhere, but amd64 doesn't
seem to extend them.

Bruce
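
The lazy-versus-eager trade-off discussed earlier in this message (one FP thread versus two or more FP-intensive threads) can be made concrete with a toy cost model. This is only a sketch and not FreeBSD code: struct fpu_cost, eager_cost() and lazy_cost() are hypothetical names, every FP-using thread is assumed to touch the FPU in each slice it runs, and the lazy model assumes the DNA trap is skipped when the switched-to thread already owns the FPU state.

```c
#include <assert.h>
#include <stddef.h>

/* Toy cost model only; all names below are hypothetical. */
struct fpu_cost {
	unsigned saves;		/* FPU state saves */
	unsigned restores;	/* FPU state loads (restore or first-use init) */
	unsigned traps;		/* device-not-available (DNA) traps */
};

/*
 * sched[i] is the id of the thread that runs in time slice i.
 * Eager switching: save the outgoing state and restore the incoming
 * state on every context switch, whether or not FP is used.
 */
struct fpu_cost
eager_cost(const int *sched, size_t n)
{
	struct fpu_cost c = { 0, 0, 0 };

	assert(sched != NULL || n == 0);
	for (size_t i = 1; i < n; i++) {
		if (sched[i] != sched[i - 1]) {
			c.saves++;
			c.restores++;
		}
	}
	return (c);
}

/*
 * Lazy switching: a context switch only marks the FPU unavailable.
 * The first FP use by a thread that does not own the live FPU state
 * traps; the handler saves the old owner's state and loads the new one.
 * uses_fp[t] is nonzero if thread t uses FP in every slice it runs.
 */
struct fpu_cost
lazy_cost(const int *sched, size_t n, const int *uses_fp)
{
	struct fpu_cost c = { 0, 0, 0 };
	int owner = -1;		/* thread whose state is live in the FPU */

	assert((sched != NULL && uses_fp != NULL) || n == 0);
	for (size_t i = 0; i < n; i++) {
		int t = sched[i];

		if (!uses_fp[t] || t == owner)
			continue;	/* no FP use, or state already live */
		c.traps++;
		if (owner != -1)
			c.saves++;	/* evict the previous owner's state */
		c.restores++;
		owner = t;
	}
	return (c);
}
```

For the schedule 0,1,0,1,0,1 with only thread 0 using FP, the lazy model charges a single trap and no saves, while the eager model charges five saves and five restores. If both threads use FP, the lazy model pays a trap on every slice on top of roughly the same save/restore traffic, which is the worst case Peter describes.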