From owner-freebsd-hackers  Wed Aug  6 19:41:54 1997
Return-Path: <owner-freebsd-hackers>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id TAA11232
          for hackers-outgoing; Wed, 6 Aug 1997 19:41:54 -0700 (PDT)
Received: from vinyl.quickweb.com (vinyl.quickweb.com [206.222.77.8])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id TAA11227
          for <hackers@FreeBSD.ORG>; Wed, 6 Aug 1997 19:41:51 -0700 (PDT)
Received: (from mark@localhost) by vinyl.quickweb.com (8.8.5/8.6.12) id WAA15602; Wed, 6 Aug 1997 22:37:48 -0400 (EDT)
Message-ID: <19970806223748.63083@vinyl.quickweb.com>
Date: Wed, 6 Aug 1997 22:37:48 -0400
From: Mark Mayo <mark@quickweb.com>
To: Tony Overfield <tony@dell.com>
Cc: Curt Sampson <cjs@portal.ca>, hackers@FreeBSD.ORG
Subject: Re: Pentium II?
References: <3.0.2.32.19970803041915.006a69e4@bugs.us.dell.com> <Pine.NEB.3.93.970803031523.7035A-100000@gnostic.cynic.net> <3.0.2.32.19970806043249.006df3e4@bugs.us.dell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.81e
In-Reply-To: <3.0.2.32.19970806043249.006df3e4@bugs.us.dell.com>; from Tony Overfield on Wed, Aug 06, 1997 at 04:32:49AM -0500
Sender: owner-freebsd-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

Just one question: why are you comparing against the PPro with 256K
cache instead of the PPro 200/512? Everybody I know who builds high-end
PC servers (an oxymoron, I know) uses the 512KB version of the PPro,
and most run it at 233MHz.

-Mark


On Wed, Aug 06, 1997 at 04:32:49AM -0500, Tony Overfield wrote:
> At 03:17 AM 8/3/97 -0700, Curt Sampson wrote:
> >I wasn't interested in what you think as much as which particular
> >benchmarks indicate this. Feel free to provide references.
> 
> I claimed that a larger L1 cache makes the processor faster, which 
> at least partially offsets the effect of the slower L2 cache.  
> This is ordinarily a self-evident truth which needs no references.  
> I have no desire to search for references at the behest of skeptics.
> 
> >> It should be easy to agree that larger L1 caches have higher hit rates.
> >
> >Sure. But the L2 cache in the PPro is running at the same speed as
> >the L1 cache in the PPro and the PII. Thus, I don't think that
> >having twice the L1 cache is going to make a lot of difference.
> >Feel free to show me the actual benchmarks that prove me wrong.
> >
> >cjs
> >
> >Curt Sampson    cjs@portal.ca		Info at http://www.portal.ca/
> >Internet Portal Services, Inc.		`And malt does more than Milton can
> >Vancouver, BC   (604) 257-9400		 To justify God's ways to man.' 
> 
> You're wrong.  The L1 cache in the PPro is faster than its L2 cache.
> 
> Since the size of the L1 cache can't be adjusted on PPro processors, 
> it's not easy to find a ready-made benchmark that proves that a larger 
> L1 cache is beneficial.  One way that this can be shown is to compare 
> the Pentium processors to the Pentium w/ MMX processors.  In 
> comparisons between these, the MMX is invariably faster, due (for 
> non-MMX benchmarks) largely to the larger L1 cache.  However, as 
> you said, this only helps if the L2 cache is slower than the L1 
> cache.  But *that* can be easily proven.
> 
> The performance of a cache depends on more than the clock speed at 
> which it runs.  The L1 cache in the PPro and PII is split between an 
> instruction cache and a dual-ported data cache.  Thus, the L1 cache can 
> transfer up to three sets of data per cycle.  This means the processor 
> can simultaneously read code from the code cache, read data from the 
> data cache, and write data to the data cache.  
> 
> The L2 cache, on the other hand, is a unified instruction and data cache
> with a 64 bit data bus.  This L2 cache is much improved over the Pentium 
> (P5) architecture because it has a dedicated bus.  The dedicated L2 cache 
> bus prevents L2 cache accesses from competing for bandwidth with the 
> external CPU data bus, which may be busy with ordinary CPU traffic,
> traffic from PCI master cycles and traffic from other processors.  
> 
> Even though the built-in L2 cache is very fast, it is not as fast as
> the more tightly integrated L1 cache.
> 
> Some benchmark data is included below.
> 
> First the benchmark pseudocode:
> (If you want the DOS x86 assembly source code, ask me.)
> 
> loop              (for a variety of sizes)
> {
> 	wbinvd     (empty the L1 and L2 caches)
> 	rep movsd  (move, in place, the test memory)
> 	rdtsc      (read time stamp counter -> start time)
> 	rep movsd  (move, in place, the test memory)
> 	rdtsc      (read time stamp counter -> end time)
> }
> 
> This simple little benchmark shows:
> 
> 1. The PPro L1 data cache is 8KB.
> 2. The PII L1 data cache is 16KB.
> 3. The PII L2 cache is half-speed with respect to the PPro.
> 4. My PPro's L2 cache is 256KB.
> 5. My PII's L2 cache is 512KB.
> 6. DRAM is much slower than the L2 cache (of course).
> 7. The PPro's L2 cache is about two times slower than its L1 cache.
> 8. The PII's L2 cache is about 4 or 5 times slower than its L1 cache.
> 
> The results:
> 
> PPro 200/256
> 
> Moving    2KB  -  Clocks: 0x0000023A  Clocks/KB moved:   285
> Moving    4KB  -  Clocks: 0x000003B9  Clocks/KB moved:   238
> Moving    8KB  -  Clocks: 0x000006C9  Clocks/KB moved:   217
> Moving   12KB  -  Clocks: 0x0000186E  Clocks/KB moved:   521
> Moving   16KB  -  Clocks: 0x0000206E  Clocks/KB moved:   518
> Moving   24KB  -  Clocks: 0x0000306E  Clocks/KB moved:   516
> Moving   32KB  -  Clocks: 0x0000406E  Clocks/KB moved:   515
> Moving   64KB  -  Clocks: 0x0000806E  Clocks/KB moved:   513
> Moving  128KB  -  Clocks: 0x0001006E  Clocks/KB moved:   512
> Moving  256KB  -  Clocks: 0x00020127  Clocks/KB moved:   513
> Moving  384KB  -  Clocks: 0x000DBA60  Clocks/KB moved:  2342
> Moving  512KB  -  Clocks: 0x00124D1D  Clocks/KB moved:  2342
> Moving  768KB  -  Clocks: 0x001B72BE  Clocks/KB moved:  2342
> Moving 1024KB  -  Clocks: 0x00249796  Clocks/KB moved:  2341
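
A rough way to read the cache sizes off the PPro table above is to recompute the Clocks/KB column from the hex counts and look for the places where the cost per KB jumps. The sketch below does that. The helper names and the 1.5x jump threshold are my own assumptions for illustration; they are not part of the original benchmark.

```python
# (size in KB, clocks) rows copied from the PPro 200/256 table above.
PPRO_ROWS = [
    (2, 0x0000023A), (4, 0x000003B9), (8, 0x000006C9), (12, 0x0000186E),
    (16, 0x0000206E), (24, 0x0000306E), (32, 0x0000406E), (64, 0x0000806E),
    (128, 0x0001006E), (256, 0x00020127), (384, 0x000DBA60),
    (512, 0x00124D1D), (768, 0x001B72BE), (1024, 0x00249796),
]


def clocks_per_kb(rows):
    """Integer clocks per KB for each (size_kb, clocks) row, rounded down."""
    return [(size_kb, clocks // size_kb) for size_kb, clocks in rows]


def find_jumps(rows, threshold=1.5):
    """Return (last_fast_kb, first_slow_kb) pairs where the cost per KB
    grows by more than `threshold` times between consecutive sizes.
    The threshold is an arbitrary illustrative choice."""
    per_kb = clocks_per_kb(rows)
    jumps = []
    for (prev_kb, prev_cost), (next_kb, next_cost) in zip(per_kb, per_kb[1:]):
        if next_cost > threshold * prev_cost:
            jumps.append((prev_kb, next_kb))
    return jumps


if __name__ == "__main__":
    for size_kb, cost in clocks_per_kb(PPRO_ROWS):
        print(f"{size_kb:5d}KB  {cost:5d} clocks/KB")
    print("jumps:", find_jumps(PPRO_ROWS))
```

With these rows the jumps fall between 8KB and 12KB and between 256KB and 384KB, which is consistent with an 8KB L1 data cache and a 256KB L2 cache.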
> 
> PII 233/512
> 
> Moving    2KB  -  Clocks: 0x0000023A  Clocks/KB moved:   285
> Moving    4KB  -  Clocks: 0x000003BA  Clocks/KB moved:   238
> Moving    8KB  -  Clocks: 0x000006BA  Clocks/KB moved:   215
> Moving   12KB  -  Clocks: 0x000009BA  Clocks/KB moved:   207
> Moving   16KB  -  Clocks: 0x00000CF8  Clocks/KB moved:   207
> Moving   24KB  -  Clocks: 0x0000661B  Clocks/KB moved:  1089
> Moving   32KB  -  Clocks: 0x0000881E  Clocks/KB moved:  1088
> Moving   64KB  -  Clocks: 0x0001101E  Clocks/KB moved:  1088
> Moving  128KB  -  Clocks: 0x00022024  Clocks/KB moved:  1088
> Moving  256KB  -  Clocks: 0x0004401A  Clocks/KB moved:  1088
> Moving  384KB  -  Clocks: 0x00066029  Clocks/KB moved:  1088
> Moving  512KB  -  Clocks: 0x000880C6  Clocks/KB moved:  1088
> Moving  768KB  -  Clocks: 0x0016A600  Clocks/KB moved:  1932
> Moving 1024KB  -  Clocks: 0x0026AC66  Clocks/KB moved:  2475
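
The L2-versus-L1 slowdown figures can be checked against the two tables above by dividing the Clocks/KB plateaus. This is only a sketch: which row stands for each plateau is my own assumption (the largest size that still fits in each cache), and the names below are hypothetical.

```python
# Clocks/KB values read from the tables above. As an assumption, each level
# is represented by the largest test size that still fits in that cache.
PLATEAUS = {
    "PPro 200/256": {"L1": 217, "L2": 513},   # 8KB row, 256KB row
    "PII 233/512": {"L1": 207, "L2": 1088},   # 16KB row, 512KB row
}


def l2_slowdown(cpu):
    """How many times more clocks per KB the L2 plateau costs than the L1."""
    levels = PLATEAUS[cpu]
    return levels["L2"] / levels["L1"]


if __name__ == "__main__":
    for cpu in PLATEAUS:
        print(f"{cpu}: L2 costs about {l2_slowdown(cpu):.1f}x the L1 clocks per KB")
```

Under that assumption the ratios come out near 2.4 for the PPro and 5.3 for the PII, in line with the rough figures in points 7 and 8.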
> 
> Clocks are measured in actual CPU clocks, so these numbers 
> don't change much when the clock speed is changed, except 
> for those which are affected by DRAM accesses, since DRAM 
> speed doesn't scale with CPU speed.
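
To make the point about clock counts concrete, here is a small sketch that converts Clocks/KB into wall-clock time at the nominal 200MHz and 233MHz speeds from the table headings. It also shows what a DRAM-bound row would look like at a faster CPU clock if DRAM time per KB stayed fixed. That fixed-time model is a simplifying assumption on my part, and the function names are hypothetical.

```python
def microseconds_per_kb(clocks_per_kb, cpu_mhz):
    """Convert CPU clocks per KB to microseconds per KB.
    A CPU running at N MHz executes N clocks per microsecond."""
    return clocks_per_kb / cpu_mhz


def dram_clocks_at(clocks_per_kb, old_mhz, new_mhz):
    """Clocks per KB for a DRAM-bound transfer at a different CPU clock,
    assuming (as a simplification) that the time per KB stays constant."""
    return clocks_per_kb * new_mhz / old_mhz


if __name__ == "__main__":
    # L2 plateaus from the tables above, at each CPU's nominal clock.
    print("PPro L2:", microseconds_per_kb(513, 200), "us/KB")
    print("PII  L2:", round(microseconds_per_kb(1088, 233), 2), "us/KB")
    # The PPro DRAM plateau (2342 clocks/KB at 200MHz) rescaled to 233MHz.
    print("DRAM at 233MHz:", round(dram_clocks_at(2342, 200, 233)), "clocks/KB")
```

The cache-bound rows keep the same clock count at any CPU speed, so only their time per KB changes. The DRAM-bound rows go the other way: roughly constant time, so more clocks at a faster CPU.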
> 
> -
> Tony
> 

-- 
----------------------------------------------------------------------------
 Mark Mayo		  				mark@quickweb.com       
 RingZero Comp.  	  		   http://vinyl.quickweb.com/mark 

	 finger mark@quickweb.com for my PGP key and GCS code
----------------------------------------------------------------------------
	University degrees are a bit like adultery: you may not want to 
	get involved with that sort of thing, but you don't want to be 
	thought incapable.	-Sir Peter Imbert