Date: Thu, 9 Nov 95 14:03 IST
From: koshy@blr.novell.com
To: hasty@rah.star-gate.com ()
Cc: freebsd-hackers@freebsd.org
Subject: Load/Store using FPU regs ...
Message-ID: <30a1bccf0.4265@novidc.blr.novell.com>
In-Reply-To: <199511071056.CAA02766@rah.star-gate.com> (hasty@rah.star-gate.com)
>>>>> "Amancio" == "Amancio Hasty Jr " <hasty@rah.star-gate.com> writes:
>>> L20: fldl (%ebx) fstpl (%ecx) ...
>>>
>>> The resulting program copies data at about 60 Megabytes per
>>> second.
Using the FPU registers for memmove/bitblt operations was a technique
I first saw on an i860. We used to do a series of reads into FPU regs
followed by a series of writes. This benefited us because the memory
subsystem had an 11-clock latency for the first read but could
deliver successive quadwords every 3 or so clocks. Latency for the
first write was lower than for a read but still significant.
Thus 16 reads followed by 16 writes ran faster than 16 reads
interleaved with 16 writes.
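
The batching idea above can be sketched in C. This is only an
illustration, not the original i860 code: the function name
copy_batched, the BATCH constant, and the use of a local array as a
stand-in for the FPU registers are all assumptions, and a modern
compiler is free to reorder or vectorize the loops. It also assumes
the source and destination do not overlap.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical batch size, matching the "16 reads, then 16 writes"
 * pattern described in the text. */
enum { BATCH = 16 };

/* Copy n 64-bit words from src to dst (non-overlapping).
 * Each full batch does BATCH consecutive reads into temporaries,
 * then BATCH consecutive writes, instead of alternating them. */
void copy_batched(uint64_t *dst, const uint64_t *src, size_t n)
{
    uint64_t tmp[BATCH];   /* stands in for the register file */
    size_t i = 0;

    for (; i + BATCH <= n; i += BATCH) {
        for (size_t j = 0; j < BATCH; j++)   /* burst of reads */
            tmp[j] = src[i + j];
        for (size_t j = 0; j < BATCH; j++)   /* burst of writes */
            dst[i + j] = tmp[j];
    }

    /* Leftover words that do not fill a whole batch. */
    for (; i < n; i++)
        dst[i] = src[i];
}
```

The sketch uses uint64_t rather than double so that every bit
pattern is copied unchanged, whatever the FPU does with it.
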
Now, I'm not sure if this approach can be used across all processors.
Some FPUs can raise exceptions if illegal bit patterns are loaded
into their registers. The x86 FPU in particular has very few registers
and a LIFO (stack) access pattern for loads and stores, so I don't
know if the same trick would work well for it.
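
To make the bit-pattern concern concrete: on the x86 FPU, as far as
I know, loading a 64-bit signaling NaN with fldl raises the
invalid-operation exception, and with that exception masked the
value is quieted, so the following fstpl would write back different
bits. The C sketch below only classifies which 64-bit words have
that encoding under the usual IEEE 754 double layout. The function
name is_signaling_nan_bits is hypothetical, and the check says
nothing about other FPUs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if the 64-bit word, viewed as an IEEE 754 double,
 * encodes a signaling NaN: exponent all ones, fraction nonzero,
 * and the top fraction bit (the "quiet" bit) clear. */
bool is_signaling_nan_bits(uint64_t bits)
{
    const uint64_t exp_mask  = 0x7FF0000000000000ULL;
    const uint64_t frac_mask = 0x000FFFFFFFFFFFFFULL;
    const uint64_t quiet_bit = 0x0008000000000000ULL;

    return (bits & exp_mask) == exp_mask   /* NaN or infinity   */
        && (bits & frac_mask) != 0         /* rules out infinity */
        && (bits & quiet_bit) == 0;        /* signaling, not quiet */
}
```

Arbitrary data being copied can contain such words, so a copy
routine built on floating-point loads and stores cannot assume it is
bit-exact without checking the processor's documented behavior.
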
Koshy
