From owner-freebsd-hackers Wed Aug 6 19:41:54 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id TAA11232 for hackers-outgoing; Wed, 6 Aug 1997 19:41:54 -0700 (PDT)
Received: from vinyl.quickweb.com (vinyl.quickweb.com [206.222.77.8]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id TAA11227 for ; Wed, 6 Aug 1997 19:41:51 -0700 (PDT)
Received: (from mark@localhost) by vinyl.quickweb.com (8.8.5/8.6.12) id WAA15602; Wed, 6 Aug 1997 22:37:48 -0400 (EDT)
Message-ID: <19970806223748.63083@vinyl.quickweb.com>
Date: Wed, 6 Aug 1997 22:37:48 -0400
From: Mark Mayo
To: Tony Overfield
Cc: Curt Sampson, hackers@FreeBSD.ORG
Subject: Re: Pentium II?
References: <3.0.2.32.19970803041915.006a69e4@bugs.us.dell.com> <3.0.2.32.19970806043249.006df3e4@bugs.us.dell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.81e
In-Reply-To: <3.0.2.32.19970806043249.006df3e4@bugs.us.dell.com>; from Tony Overfield on Wed, Aug 06, 1997 at 04:32:49AM -0500
Sender: owner-freebsd-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

Just one question: why are you comparing the PPro with 256K cache
instead of the PPro 200/512?? Everybody I know who builds high-end PC
servers (an oxymoron, I know..) uses the 512KB version of the PPro...
And most run it at 233MHz...

-Mark

On Wed, Aug 06, 1997 at 04:32:49AM -0500, Tony Overfield wrote:
> At 03:17 AM 8/3/97 -0700, Curt Sampson wrote:
> >I wasn't interested in what you think as much as which particular
> >benchmarks indicate this. Feel free to provide references.
>
> I claimed that a larger L1 cache makes the processor faster, which
> at least partially offsets the effect of the slower L2 cache.
> This is ordinarily a self-evident truth which needs no references.
> I have no desire to search for references at the behest of skeptics.
>
> >> It should be easy to agree that larger L1 caches have higher hit rates.
> >
> >Sure.
> >But the L2 cache in the PPro is running at the same speed as
> >the L1 cache in the PPro and the PII. Thus, I don't think that
> >having twice the L1 cache is going to make a lot of difference.
> >Feel free to show me the actual benchmarks that prove me wrong.
> >
> >cjs
> >
> >Curt Sampson    cjs@portal.ca     Info at http://www.portal.ca/
> >Internet Portal Services, Inc.    `And malt does more than Milton can
> >Vancouver, BC   (604) 257-9400     To justify God's ways to man.'
>
> You're wrong. The L1 cache in the PPro is faster than its L2 cache.
>
> Since the size of the L1 cache can't be adjusted on PPro processors,
> it's not easy to find a ready-made benchmark that proves that a larger
> L1 cache is beneficial. One way that this can be shown is to compare
> the Pentium processors to the Pentium w/ MMX processors. In
> comparisons between these, the MMX is invariably faster, due (for
> non-MMX benchmarks) entirely to the larger L1 cache. However, as
> you said, this only helps if the L2 cache is slower than the L1
> cache. But *that* can be easily proven.
>
> The performance of a cache depends on more than the clock speed at
> which it runs. The L1 cache in the PPro and PII is split between an
> instruction cache and a dual-ported data cache. Thus, the L1 cache can
> transfer up to three sets of data per cycle. This means the processor
> can simultaneously read code from the code cache, read data from the
> data cache, and write data to the data cache.
>
> The L2 cache, on the other hand, is a unified instruction and data cache
> with a 64 bit data bus. This L2 cache is much improved over the Pentium
> (P5) architecture because it has a dedicated bus. The dedicated L2 cache
> bus prevents L2 cache accesses from competing for bandwidth with the
> external CPU data bus, which may be busy with ordinary CPU traffic,
> traffic from PCI master cycles and traffic from other processors.
>
> Even though the built-in L2 cache is very fast, it is not as fast as
> the more tightly integrated L1 cache.
>
> Some benchmark data is included below.
>
> First the benchmark pseudocode:
> (If you want the DOS x86 assembly source code, ask me.)
>
> loop (for a variety of sizes)
> {
>     wbinvd      (empty the L1 and L2 caches)
>     rep movsd   (move, in place, the test memory)
>     rdtsc       (read time stamp counter -> start time)
>     rep movsd   (move, in place, the test memory)
>     rdtsc       (read time stamp counter -> end time)
> }
>
> This simple little benchmark shows:
>
> 1. The PPro L1 data cache is 8KB.
> 2. The PII L1 data cache is 16KB.
> 3. The PII L2 cache is half-speed with respect to the PPro.
> 4. My PPro's L2 cache is 256KB.
> 5. My PII's L2 cache is 512KB.
> 6. DRAM is much slower than the L2 cache (of course).
> 7. The PPro's L2 cache is about two times slower than its L1 cache.
> 8. The PII's L2 cache is about 4 or 5 times slower than its L1 cache.
>
> The results:
>
> PPro 200/256
>
> Moving    2KB - Clocks: 0x0000023A  Clocks/KB moved:  285
> Moving    4KB - Clocks: 0x000003B9  Clocks/KB moved:  238
> Moving    8KB - Clocks: 0x000006C9  Clocks/KB moved:  217
> Moving   12KB - Clocks: 0x0000186E  Clocks/KB moved:  521
> Moving   16KB - Clocks: 0x0000206E  Clocks/KB moved:  518
> Moving   24KB - Clocks: 0x0000306E  Clocks/KB moved:  516
> Moving   32KB - Clocks: 0x0000406E  Clocks/KB moved:  515
> Moving   64KB - Clocks: 0x0000806E  Clocks/KB moved:  513
> Moving  128KB - Clocks: 0x0001006E  Clocks/KB moved:  512
> Moving  256KB - Clocks: 0x00020127  Clocks/KB moved:  513
> Moving  384KB - Clocks: 0x000DBA60  Clocks/KB moved: 2342
> Moving  512KB - Clocks: 0x00124D1D  Clocks/KB moved: 2342
> Moving  768KB - Clocks: 0x001B72BE  Clocks/KB moved: 2342
> Moving 1024KB - Clocks: 0x00249796  Clocks/KB moved: 2341
>
> PII 233/512
>
> Moving    2KB - Clocks: 0x0000023A  Clocks/KB moved:  285
> Moving    4KB - Clocks: 0x000003BA  Clocks/KB moved:  238
> Moving    8KB - Clocks: 0x000006BA  Clocks/KB moved:  215
> Moving   12KB - Clocks: 0x000009BA  Clocks/KB moved:  207
> Moving   16KB - Clocks: 0x00000CF8  Clocks/KB moved:  207
> Moving   24KB - Clocks: 0x0000661B  Clocks/KB moved: 1089
> Moving   32KB - Clocks: 0x0000881E  Clocks/KB moved: 1088
> Moving   64KB - Clocks: 0x0001101E  Clocks/KB moved: 1088
> Moving  128KB - Clocks: 0x00022024  Clocks/KB moved: 1088
> Moving  256KB - Clocks: 0x0004401A  Clocks/KB moved: 1088
> Moving  384KB - Clocks: 0x00066029  Clocks/KB moved: 1088
> Moving  512KB - Clocks: 0x000880C6  Clocks/KB moved: 1088
> Moving  768KB - Clocks: 0x0016A600  Clocks/KB moved: 1932
> Moving 1024KB - Clocks: 0x0026AC66  Clocks/KB moved: 2475
>
> Clocks are measured in actual CPU clocks, so these numbers
> don't change much when the clock speed is changed, except
> for those which are affected by DRAM accesses, since DRAM
> speed doesn't scale with CPU speed.
>
> -
> Tony
>

--
----------------------------------------------------------------------------
Mark Mayo                       mark@quickweb.com
RingZero Comp.                  http://vinyl.quickweb.com/mark

finger mark@quickweb.com for my PGP key and GCS code
----------------------------------------------------------------------------
University degrees are a bit like adultery: you may not want to get
involved with that sort of thing, but you don't want to be thought
incapable. -Sir Peter Imbert
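
The hex clock counts in the quoted results can be cross-checked with a
few lines of Python. This is only a sketch and not part of the original
message: the dictionary names, the plateau_edges helper and its 1.5x
jump threshold are assumptions chosen for illustration. It recomputes
Clocks/KB by integer division and reports the sizes after which the
per-KB cost jumps, i.e. where the working set seems to outgrow a cache
level.

```python
# Clock counts copied from the quoted results, keyed by block size in KB.
PPRO_200_256 = {
    2: 0x0000023A, 4: 0x000003B9, 8: 0x000006C9, 12: 0x0000186E,
    16: 0x0000206E, 24: 0x0000306E, 32: 0x0000406E, 64: 0x0000806E,
    128: 0x0001006E, 256: 0x00020127, 384: 0x000DBA60, 512: 0x00124D1D,
    768: 0x001B72BE, 1024: 0x00249796,
}
PII_233_512 = {
    2: 0x0000023A, 4: 0x000003BA, 8: 0x000006BA, 12: 0x000009BA,
    16: 0x00000CF8, 24: 0x0000661B, 32: 0x0000881E, 64: 0x0001101E,
    128: 0x00022024, 256: 0x0004401A, 384: 0x00066029, 512: 0x000880C6,
    768: 0x0016A600, 1024: 0x0026AC66,
}


def clocks_per_kb(results):
    """Map each size in KB to clocks per KB, truncated like the quoted table."""
    return {size: clocks // size for size, clocks in results.items()}


def plateau_edges(results, jump=1.5):
    """Return the sizes after which the per-KB cost rises by more than `jump`x.

    The 1.5 threshold is an assumption for this sketch, not a measured value.
    """
    rates = clocks_per_kb(results)
    sizes = sorted(rates)
    return [a for a, b in zip(sizes, sizes[1:]) if rates[b] > jump * rates[a]]


if __name__ == "__main__":
    for name, data in (("PPro 200/256", PPRO_200_256),
                       ("PII 233/512", PII_233_512)):
        print(name, "cost jumps after (KB):", plateau_edges(data))
```

On the quoted data the cost jumps after 8KB and 256KB on the PPro and
after 16KB and 512KB on the PII, which lines up with points 1, 2, 4
and 5 of the list. Taking the 128KB row as the L2 plateau and the
largest L1-resident row as the L1 figure, the L2/L1 cost ratio comes
out near 2.4 for the PPro (512/217) and near 5.3 for the PII
(1088/207).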