Date: Sat, 06 Apr 1996 19:54:25 +0200
From: Torbjorn Granlund <tege@matematik.su.se>
To: Bruce Evans <bde@zeta.org.au>
Cc: asami@cs.berkeley.edu, current@freebsd.org, hasty@rah.star-gate.com, mrami@minerva.cis.yale.edu, nisha@cs.berkeley.edu, tege@matematik.su.se
Subject: Re: optimized bzeros found harmful (was: fast memory copy ...)
Message-ID: <199604061754.TAA17355@insanus.matematik.su.se>
In-Reply-To: Your message of "Sat, 06 Apr 1996 09:13:46 +1000." <199604052313.JAA28956@godzilla.zeta.org.au>

> This behaviour is consistent with the data being zeroed usually not being in the L2 cache. RBW is 33% slower in that case on my system. Other cases: if the data is in the L2 cache but not in the L1 cache, then RBW is between 0% and 33% faster; if the data is in the L1 cache, then RBW is 8.5 times faster (740MB/s!).

This must be a misunderstanding! If the data is really in the L1 cache, the read-before-write is wasted and just contributes to the overhead. The read-before-write is effective if and only if the data is not in the L1 cache. In that case, it forces allocation of the cache line in the L1 cache, and thereby allows a 14x peak speedup. If other behaviours are observed, the timing framework confuses you.

All other CPUs I know of have caches that do allocate-on-write.
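
For readers unfamiliar with the technique, here is a minimal C sketch of a read-before-write (RBW) zeroing loop. It is not the code being benchmarked in this thread. The function name rbw_bzero and the 32-byte line size are assumptions made for illustration, and the destination is assumed to be line-aligned for the read to touch every line. On a cache that does not allocate on write, the read pulls the line into L1 so the following stores hit; if the line is already in L1, the read is pure overhead.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: 32-byte L1 cache lines. Adjust for the target CPU. */
#define CACHE_LINE 32

/*
 * Hypothetical helper: zero n bytes at dst, reading one byte of each
 * cache-line-sized chunk before storing to it. The volatile read cannot
 * be optimized away; its only purpose is to force the line into L1 on a
 * cache that does not allocate on write.
 */
static void rbw_bzero(void *dst, size_t n)
{
    unsigned char *p = dst;
    volatile unsigned char sink = 0;

    while (n >= CACHE_LINE) {
        sink = *(volatile unsigned char *)p;   /* read before write */
        memset(p, 0, CACHE_LINE);              /* stores now hit in L1 */
        p += CACHE_LINE;
        n -= CACHE_LINE;
    }
    if (n > 0) {                               /* partial trailing chunk */
        sink = *(volatile unsigned char *)p;
        memset(p, 0, n);
    }
    (void)sink;
}
```

Whether this beats a plain store loop depends on where the data already lives, so any comparison should control the cache state of the buffer before each timed run.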