Date: Sun, 7 Apr 1996 06:01:26 +1000
From: Bruce Evans <bde@zeta.org.au>
To: bde@zeta.org.au, tege@matematik.su.se
Cc: asami@cs.berkeley.edu, current@FreeBSD.org, hasty@rah.star-gate.com, mrami@minerva.cis.yale.edu, nisha@cs.berkeley.edu
Subject: Re: optimized bzeros found harmful (was: fast memory copy ...)
Message-ID: <199604062001.GAA08258@godzilla.zeta.org.au>
> This behaviour is consistent with the data being zeroed usually not being
> in the L2 cache. RBW is 33% slower in that case on my system. Other
> cases: if the data is in the L2 cache but not in the L1 cache, then RBW
> is between 0% and 33% faster; if the data is in the L1 cache, then
> RBW is 8.5 times faster (740MB/s!).

>This must be a misunderstanding!

>If the data is really in the L1 cache, the read-before-write is wasted and
>just contributes to the overhead.

It must not be in the L1 cache. (Why not?) `perfmon' in -current shows
much more bus activity for write test 3 than for write test 4. E.g.,
counter 25 (PMC5_WRITE_BACKUP_STALL) is about 117e6 events for test 3 and
only 5e6 for test 4. This is for copying a total amount of 100e6 bytes.
Let's see your output for `./w -5' and your explanation of it.

>The read-before-write is effective if and only if the data is not in the L1
>cache. In that case, it forces allocation of the cache line in the L1
>cache, and thereby allows a 14x peak speedup.

>If other behaviours are observed, the timing framework confuses you.

Let's see your output for `./w -l 65536 -5'. 64K should fit in the L2
cache (512K). Why does read-before-write give only a 25% speedup?

>All other CPUs I know of have caches that do allocate-on-write.

Perhaps the Pentium behaviour is best. It seems to penalize writing to
the same location without reading it, but this is abnormal behaviour.

Bruce
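For readers following the thread, a read-before-write (RBW) zeroing loop can be sketched in C roughly as below. This is a minimal illustration, not the FreeBSD kernel's bzero and not the `w` test program: the name `bzero_rbw`, the `CACHE_LINE` constant and the assumed 32-byte L1 line size (as on the P5 Pentium) are all hypothetical choices made for the example. The idea is the one argued about above: on a cache that does not allocate on a write miss, loading one byte of each line first brings the line into L1, so the stores that follow hit the cache instead of each going out to the bus.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 32-byte L1 data-cache lines (P5 Pentium). Hypothetical name. */
#define CACHE_LINE 32

static_assert((CACHE_LINE & (CACHE_LINE - 1)) == 0,
              "CACHE_LINE must be a power of two");

/*
 * Hypothetical read-before-write zeroing routine (sketch only).
 * Each full line is read once before it is overwritten, so that a
 * non-write-allocate L1 cache gets the line loaded by the read.
 */
static void bzero_rbw(void *buf, size_t len)
{
    unsigned char *p = buf;
    unsigned char *end = p + len;

    /* Zero single bytes until p is aligned to a cache-line boundary. */
    while (p < end && ((uintptr_t)p & (CACHE_LINE - 1)) != 0)
        *p++ = 0;

    /* Whole lines: read one byte of the line, then overwrite the line. */
    while ((size_t)(end - p) >= CACHE_LINE) {
        /*
         * The volatile load must be performed, so it is not optimized
         * away. C does not forbid moving the plain stores below across
         * it, so a production version would more likely be assembly.
         */
        (void)*(volatile unsigned char *)p;
        memset(p, 0, CACHE_LINE);
        p += CACHE_LINE;
    }

    /* Zero the trailing partial line, if any. */
    while (p < end)
        *p++ = 0;
}
```

Whether the extra load pays off depends on where the buffer lives, which is the point under dispute: by the measurements reported in the message, RBW lost about 33% when the data was not in L2 and gained between 0% and 33% when it was in L2 but not L1.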
