From owner-freebsd-current Tue May 7 17:07:15 1996
Return-Path: owner-current
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3) id RAA27504 for current-outgoing; Tue, 7 May 1996 17:07:15 -0700 (PDT)
Received: from sunrise.cs.berkeley.edu (root@sunrise.CS.Berkeley.EDU [128.32.38.121]) by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id RAA27497 for ; Tue, 7 May 1996 17:07:13 -0700 (PDT)
Received: (from asami@localhost) by sunrise.cs.berkeley.edu (8.6.12/8.6.12) id RAA10390; Tue, 7 May 1996 17:07:22 -0700
Date: Tue, 7 May 1996 17:07:22 -0700
Message-Id: <199605080007.RAA10390@sunrise.cs.berkeley.edu>
To: bde@zeta.org.au
CC: bde@zeta.org.au, culler@cs.berkeley.edu, current@freebsd.org, ken@area238.residence.gatech.edu, marc@bowtie.nl, nisha@cs.berkeley.edu, pattrsn@cs.berkeley.edu, wollman@lcs.mit.edu, wscott@ichips.intel.com
In-reply-to: <199605072341.JAA11831@godzilla.zeta.org.au> (message from Bruce Evans on Wed, 8 May 1996 09:41:48 +1000)
Subject: Re: more on fast bcopy
From: asami@cs.berkeley.edu (Satoshi Asami)
Sender: owner-current@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

 * >Yeah, we were running into problems with this.  Can you tell us how to
 * >do it? ;)
 *
 * Something like:
 *
 *      subl    $108,%esp
 *      movl    %cr0,%edx
 *      pushl   %edx            # if used
 *      clts
        ^^^^
Oops, didn't know about that one. ;)

 *      fnsave  (%esp)
 *      ...
 *
 *      frstor  (%esp)
 *      popl    %edx            # if used
 *      movl    %edx,%cr0
 *      addl    $108,%esp
 *
 * The stack may need to be larger.
 *
 * The complications involving IRQ13 don't apply since this method is too slow
 * to use on systems with external coprocessors.
 *
 * The commented out code in fpunrolled.s doesn't preserve CR0_TS.

Really?
This is what I had:

        movl    %cr0,%edx
        movl    $8, %eax        /* CR0_TS */
        not     %eax
        andl    %eax,%edx       /* clear CR0_TS */
        movl    %edx,%cr0
         :
        andl    $8,%edx
        movl    %cr0,%eax
        orl     %edx, %eax      /* reset CR0_TS to the original value */
        movl    %eax,%cr0

The original value of %cr0 is saved in %edx, and the CR0_TS bit is
extracted and then or'ed back into %cr0 at the end.  I did it this way
because I didn't know if any of the other bits in %cr0 would change
inside the loop.

By the way, the problems we were seeing were random file corruptions,
and I thought it was because FP regs aren't saved as part of the
context switch (and although we are saving/restoring them upon entry
and leaving our function, something else would come along and mess it
up).  Will it explain this?

 * I don't think more unrolling is good.  It will bust the I-cache and it
 * should be possible to schedule the loop control instructions to take
 * essentially zero time compared with the D-cache-missing memory access
 * instructions.

Hmm....

Satoshi
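
The mask arithmetic in the %cr0 sequence above can be checked in user
space by modeling the registers as plain integers.  What follows is only
a sketch under stated assumptions: the elided loop body (the ":" line) is
assumed to leave both %edx and %cr0 untouched, and the function names
restore_as_posted and restore_with_copy are hypothetical, not taken from
fpunrolled.s.  The first function follows the posted instructions one
for one; the second is an illustrative variant that does the masking in
a scratch register so that %edx still holds the unmodified %cr0 value.

```c
#include <stdint.h>

#define CR0_TS 0x8u   /* task-switched flag, bit 3 of %cr0 */

/* Models the posted sequence.  The argument stands in for the value of
 * %cr0 on entry; the return value is what would be written back to
 * %cr0 at the end.  Assumes the elided loop changes neither edx nor cr0. */
static uint32_t restore_as_posted(uint32_t cr0)
{
    uint32_t edx = cr0;       /* movl %cr0,%edx  */
    uint32_t eax = CR0_TS;    /* movl $8,%eax    */
    eax = ~eax;               /* not  %eax       */
    edx &= eax;               /* andl %eax,%edx  (TS now cleared in edx too) */
    cr0 = edx;                /* movl %edx,%cr0  */
    /* ... copy loop ... */
    edx &= CR0_TS;            /* andl $8,%edx    */
    eax = cr0;                /* movl %cr0,%eax  */
    eax |= edx;               /* orl  %edx,%eax  */
    return eax;               /* movl %eax,%cr0  */
}

/* Hypothetical variant: mask in eax only, so edx keeps the entry value. */
static uint32_t restore_with_copy(uint32_t cr0)
{
    uint32_t edx = cr0;            /* entry value, left unmodified */
    uint32_t eax = edx & ~CR0_TS;  /* cleared copy for the loop    */
    cr0 = eax;
    /* ... copy loop ... */
    edx &= CR0_TS;                 /* extract the entry TS bit     */
    eax = cr0;
    eax |= edx;                    /* put the entry TS bit back    */
    return eax;
}
```

Under those assumptions, an entry value with TS set such as 0x8000003b
comes back as 0x80000033 from restore_as_posted and as 0x8000003b from
restore_with_copy; with TS clear on entry the two agree.  This says
nothing about whether the bit handling is related to the file
corruption, only about what the arithmetic does.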