Date:      Mon, 6 May 1996 22:07:13 +1000
From:      Bruce Evans <bde@zeta.org.au>
To:        asami@cs.berkeley.edu, current@FreeBSD.org
Cc:        nisha@cs.berkeley.edu
Subject:   Re: more on fast bcopy
Message-ID:  <199605061207.WAA04793@godzilla.zeta.org.au>

>The FP thing is still by far the fastest for large copies on all the
>Pentiums we've tried over here, but we can't use that in the kernel.

Why not? :-)  It should be possible to use the fpu after saving and
restoring the FP registers reentrantly.

>We've got 67MB/s on the 133MHz Pentium + Triton here.  Wow.

Same here.  An FP method seems to be the fastest way of bzeroing
uncached memory too.  I get about 150MB/sec for an FP-based bzero and
about 85MB/sec (max) for all reasonable integer-register-based versions.

The key difference seems to be that FP stores to uncached memory are the
same speed as integer register stores but they can be twice as wide.
OTOH, FP loads from uncached memory are almost the same speed as _pairs_
of integer register loads - both cases have to wait for a cache line
fetch, and pairing works right after the line is fetched.

I couldn't get pairs of uncached writes to work at all.  Even `pushal'
apparently does 8 separate writes.  This is on an ASUS P55TP4XE.  Is
this an ASUS or Triton limitation?

>Please send the output of "sh runtests", I would like to hear
>especially from people with 486/P6 or Pentium with slow memory
>systems.

It gave 0.2MB/sec for swapping on a slow IDE disk on a 486/33 with 8MB,
and the Makefile failed because "." isn't in my $PATH :-).

Bruce


