Date: Wed, 8 May 1996 09:41:48 +1000
From: Bruce Evans <bde@zeta.org.au>
To: asami@cs.berkeley.edu, bde@zeta.org.au
Cc: culler@cs.berkeley.edu, current@freebsd.org, ken@area238.residence.gatech.edu, marc@bowtie.nl, nisha@cs.berkeley.edu, pattrsn@cs.berkeley.edu, wollman@lcs.mit.edu, wscott@ichips.intel.com
Subject: Re: more on fast bcopy
Message-ID: <199605072341.JAA11831@godzilla.zeta.org.au>
> * Why not? :-)  It should be possible to use the fpu after saving and
> * restoring the FP registers reentrantly.
>                              ^^^^^^^^^^^
>Yeah, we were running into problems with this.  Can you tell us how to
>do it? ;)

Something like:

	subl	$108,%esp
	movl	%cr0,%edx
	pushl	%edx		# if used
	clts
	fnsave	(%esp)
	...
	frstor	(%esp)
	popl	%edx		# if used
	movl	%edx,%cr0
	addl	$108,%esp

The stack may need to be larger.  The complications involving IRQ13
don't apply since this method is too slow to use on systems with
external coprocessors.  The commented out code in fpunrolled.s doesn't
preserve CR0_TS.

>I see.  By the way, we tried unrolling the loops even more, and
>actually got up to 80MB/s for FP and 60MB/s for integer registers
>(this is for bcopy).

I don't think more unrolling is good.  It will bust the I-cache and it
should be possible to schedule the loop control instructions to take
essentially zero time compared with the D-cache-missing memory access
instructions.

Bruce
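
For reference, the 108 in the sequence above matches the size of the area that fnsave writes in 32-bit protected mode: a 28-byte environment followed by eight 10-byte registers. A minimal sketch of that arithmetic in C follows. The struct and function names are hypothetical and only illustrate the layout; they are not taken from the FreeBSD sources. Note also that if the optional pushl is used, the reserved area starts 4 bytes above %esp, so the fnsave and frstor operands would presumably need that offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; layout assumed to be the 32-bit protected-mode
 * fnsave format: seven 32-bit environment slots, then ST(0)..ST(7). */
struct x87_env {
    uint32_t control_word;       /* low 16 bits significant */
    uint32_t status_word;        /* low 16 bits significant */
    uint32_t tag_word;           /* low 16 bits significant */
    uint32_t ip_offset;          /* last instruction pointer offset */
    uint32_t ip_selector_opcode; /* CS selector plus opcode bits */
    uint32_t operand_offset;     /* last operand pointer offset */
    uint32_t operand_selector;   /* low 16 bits significant */
};

struct x87_save_area {
    struct x87_env env;          /* 28 bytes */
    uint8_t st[8][10];           /* eight 80-bit registers, 80 bytes */
};

/* Compile-time check that the layout adds up to the 108 bytes reserved
 * by "subl $108,%esp". */
_Static_assert(sizeof(struct x87_save_area) == 108,
               "fnsave area should be 108 bytes");

size_t x87_env_bytes(void)       { return sizeof(struct x87_env); }
size_t x87_regs_bytes(void)      { return sizeof(((struct x87_save_area *)0)->st); }
size_t x87_save_area_bytes(void) { return sizeof(struct x87_save_area); }
```

The static assertion fails the build if the struct ever stops matching the reserved size, which is a cheap way to keep the constant in the assembly and the C view of the save area in agreement.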