Date: Thu, 25 Jan 1996 04:00:19 +1100
From: Bruce Evans <bde@zeta.org.au>
To: asami@cs.berkeley.edu, koshy@india.hp.com
Cc: hackers@FreeBSD.ORG
Subject: Re: Pentium bcopy
Message-ID: <199601241700.EAA28142@godzilla.zeta.org.au>

>sa> Does anyone know if there are any `gotchas' concerning the use of fp
>sa> regs in the kernel?

Yes, they must not be used. The FP registers belong to the last process
that used them. This assumption is used to do fast context switching.

>Are the FPU registers saved and restored as part of interrupt handling?

No. This would be very inefficient.

>You would need to ensure this if you are using your FP-enabled bcopy from
>any interrupt routine.

You would also have to change the FP mode to avoid traps, and handle the
i387-motherboard bugs that sometimes cause traps anyway.

>Blindly saving all FP registers when 'bcopy' is invoked has its own
>cost, so you probably need to use the FP registers method only if the

Costs (protected mode) according to "Pentium Processor Optimization Tools":

cpu             387       486    Pentium   pairing
fsave           375-376   143    124       NP
frstor          308       120    70        NP
fldl            25        3      1         FX
fstl            45        8      2         NP
movl (load)     4         1      1         UV
movl (store)    2         1      1         UV

>amount of data to be copied is large. You may need to experiment
>and determine the best size to switch from regular bcopy to the FP version.

There is no such size according to the above, since on the Pentium, FP
loads are the same speed as efficiently (dual) pipelined integer loads,
while FP stores are twice as slow.

>Also you need to be sure that the FP registers are accessible. A machine
>with a 486SX or a plain 386 cannot use this technique. Indeed on a 387
>the technique could even be slower than a rep movsl.

On a 387, FP loads are 25/8 times as slow; stores are 45/4 times as slow.
On a 486, FP loads are 3/2 times as slow; stores are 8/2 times as slow.

Bruce