From owner-freebsd-current Tue May 7 16:45:47 1996
Return-Path: owner-current
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3) id QAA25188 for current-outgoing; Tue, 7 May 1996 16:45:47 -0700 (PDT)
Received: from godzilla.zeta.org.au (godzilla.zeta.org.au [203.2.228.19]) by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id QAA25178 for ; Tue, 7 May 1996 16:45:41 -0700 (PDT)
Received: (from bde@localhost) by godzilla.zeta.org.au (8.6.12/8.6.9) id JAA11831; Wed, 8 May 1996 09:41:48 +1000
Date: Wed, 8 May 1996 09:41:48 +1000
From: Bruce Evans
Message-Id: <199605072341.JAA11831@godzilla.zeta.org.au>
To: asami@cs.berkeley.edu, bde@zeta.org.au
Subject: Re: more on fast bcopy
Cc: culler@cs.berkeley.edu, current@freebsd.org, ken@area238.residence.gatech.edu, marc@bowtie.nl, nisha@cs.berkeley.edu, pattrsn@cs.berkeley.edu, wollman@lcs.mit.edu, wscott@ichips.intel.com
Sender: owner-current@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

> * Why not? :-) It should be possible to use the fpu after saving and
> * restoring the FP registers reentrantly.
>                              ^^^^^^^^^^^
>Yeah, we were running into problems with this. Can you tell us how to
>do it? ;)

Something like:

    movl    %cr0,%edx       # read CR0, including CR0_TS
    pushl   %edx            # if used (%edx is clobbered below)
    subl    $108,%esp       # room for the 108-byte FPU save area
    clts                    # clear TS so FPU instructions don't trap
    fnsave  (%esp)          # save the FPU state and reinitialize the FPU
    ...
    frstor  (%esp)          # restore the saved FPU state
    addl    $108,%esp
    popl    %edx            # if used
    movl    %edx,%cr0       # restore CR0 with its original TS bit

The saved %cr0 value is pushed before the save area is reserved, so
that fnsave (%esp) doesn't overwrite it. The stack may need to be
larger. The complications involving IRQ13 don't apply, since this
method is too slow to use on systems with external coprocessors. The
commented-out code in fpunrolled.s doesn't preserve CR0_TS.

>I see. By the way, we tried unrolling the loops even more, and
>actually got up to 80MB/s for FP and 60MB/s for integer registers
>(this is for bcopy).

I don't think more unrolling is good. It will bust the I-cache, and it
should be possible to schedule the loop control instructions to take
essentially zero time compared with the D-cache-missing memory access
instructions.

Bruce
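
To make the unrolling trade-off concrete, here is a rough C sketch of a
modestly unrolled copy loop. It is not the code from fpunrolled.s: the
name copy_unrolled4 and the unroll factor of 4 are assumptions made for
illustration only, and it copies through integer words rather than FP
registers. The point is just that the loop control (one pointer and
counter update plus one branch) is paid once per 16 bytes, so a much
larger unroll factor mostly adds code size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not from the FreeBSD tree: copy len bytes from
 * src to dst (the regions must not overlap), moving four 32-bit words
 * per loop iteration.
 */
void
copy_unrolled4(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    uint32_t w0, w1, w2, w3;

    /* Main loop: 16 bytes per iteration, one set of loop control. */
    while (len >= 16) {
        /*
         * Fixed-size 4-byte memcpy calls are used for the word
         * loads and stores so that unaligned buffers are safe.
         * All four loads are issued before the four stores.
         */
        memcpy(&w0, s, 4);
        memcpy(&w1, s + 4, 4);
        memcpy(&w2, s + 8, 4);
        memcpy(&w3, s + 12, 4);
        memcpy(d, &w0, 4);
        memcpy(d + 4, &w1, 4);
        memcpy(d + 8, &w2, 4);
        memcpy(d + 12, &w3, 4);
        s += 16;
        d += 16;
        len -= 16;
    }

    /* Tail: the remaining 0 to 15 bytes, one byte at a time. */
    while (len > 0) {
        *d++ = *s++;
        len--;
    }
}
```

A call such as copy_unrolled4(dst, src, n) should behave like
memcpy(dst, src, n) for non-overlapping buffers. Whether 4 is a good
factor on a given CPU is something only measurement can settle.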