Date: Thu, 9 Nov 95 14:03 IST
From: koshy@blr.novell.com
To: hasty@rah.star-gate.com ()
Cc: freebsd-hackers@freebsd.org
Subject: Load/Store using FPU regs ...
Message-ID: <30a1bccf0.4265@novidc.blr.novell.com>
In-Reply-To: <199511071056.CAA02766@rah.star-gate.com> (hasty@rah.star-gate.com)
>>>>> "Amancio" == "Amancio Hasty Jr " <hasty@rah.star-gate.com> writes:

>>> L20: fldl (%ebx) fstpl (%ecx) ...
>>>
>>> The resulting program copies data at about 60 Megabytes per
>>> second.

Using the FPU registers for memmove/bitblt operations was a technique
I first saw on an i860. We used to do a series of reads into FPU regs
followed by a series of writes.

This benefited us because the memory subsystem had an 11-clock latency
for the first read, but could deliver successive quadwords every 3 or
so clocks. Latency for the first write was less than that for a read
but was still significant. Thus 16 reads followed by 16 writes ran
faster than 16 reads alternated with 16 writes.

Now, I'm not sure if this approach can be used across all processors.
Some FPUs could raise exceptions if illegal bit patterns are loaded
into their registers. The x86 FPU in particular has very few registers
and a LIFO access pattern for loads and stores, so I don't know if the
same trick would work well for it.

Koshy
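
For illustration, the "batch of reads, then batch of writes" ordering
described above might be sketched in C roughly as follows. This is only
a sketch under assumptions, not the original i860 code: the function
name copy_batched and the constant BATCH are made up for this example,
and it uses 64-bit integer temporaries instead of FPU registers, which
sidesteps the question of illegal bit patterns. A compiler is free to
reorder the accesses or keep the temporaries in memory, so this shows
the intended access pattern rather than guaranteed machine behaviour.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical batch size: 16 quadwords, as in the 16-read/16-write case. */
#define BATCH 16

/*
 * Copy n 64-bit words from src to dst. The regions must not overlap.
 * Each full batch is read completely before any of it is written, so
 * the first-access latency is paid once per burst instead of once per
 * alternating read/write pair.
 */
void copy_batched(uint64_t *dst, const uint64_t *src, size_t n)
{
    uint64_t tmp[BATCH];   /* stands in for the register file */
    size_t i = 0;

    while (n - i >= BATCH) {
        /* Burst of reads. */
        for (size_t k = 0; k < BATCH; k++)
            tmp[k] = src[i + k];
        /* Burst of writes. */
        for (size_t k = 0; k < BATCH; k++)
            dst[i + k] = tmp[k];
        i += BATCH;
    }

    /* Remaining words (fewer than BATCH) are copied one at a time. */
    for (; i < n; i++)
        dst[i] = src[i];
}
```

Whether this ordering actually wins depends on the memory subsystem; on
a machine without the first-access penalty described above it may make
no measurable difference.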