Date: Wed, 06 Aug 1997 04:32:49 -0500
To: Curt Sampson
From: Tony Overfield
Subject: Re: Pentium II?
Cc: hackers@FreeBSD.ORG

At 03:17 AM 8/3/97 -0700, Curt Sampson wrote:
>I wasn't interested in what you think as much as which particular
>benchmarks indicate this. Feel free to provide references.

I claimed that a larger L1 cache makes the processor faster, which at least partially offsets the effect of the slower L2 cache. This is ordinarily a self-evident truth which needs no references. I have no desire to search for references at the behest of skeptics.

>> It should be easy to agree that larger L1 caches have higher hit rates.
>
>Sure. But the L2 cache in the PPro is running at the same speed as
>the L1 cache in the PPro and the PII. Thus, I don't think that
>having twice the L1 cache is going to make a lot of difference.
>Feel free to show me the actual benchmarks that prove me wrong.
>
>cjs
>
>Curt Sampson  cjs@portal.ca   Info at http://www.portal.ca/
>Internet Portal Services, Inc. `And malt does more than Milton can
>Vancouver, BC   (604) 257-9400  To justify God's ways to man.'

You're wrong. The L1 cache in the PPro is faster than its L2 cache. Since the size of the L1 cache can't be adjusted on PPro processors, it's not easy to find a ready-made benchmark that proves a larger L1 cache is beneficial. One way to show it is to compare Pentium processors with Pentium w/ MMX processors. In such comparisons the MMX part is invariably faster, and for non-MMX benchmarks that is due entirely to its larger L1 cache (16KB code + 16KB data, versus 8KB + 8KB).

However, as you said, this only helps if the L2 cache is slower than the L1 cache. But *that* can be easily proven.

The performance of a cache depends on more than the clock speed at which it runs. The L1 cache in the PPro and PII is split into an instruction cache and a dual-ported data cache, so it can perform up to three transfers per cycle: the processor can simultaneously read code from the code cache, read data from the data cache, and write data to the data cache.

The L2 cache, on the other hand, is a unified instruction and data cache with a 64-bit data bus. It is a big improvement over the Pentium (P5) design because it has a dedicated bus. That bus keeps L2 cache accesses from competing for bandwidth on the external CPU bus, which may be busy with ordinary CPU traffic, PCI bus-master cycles, and traffic from other processors. Even though the built-in L2 cache is very fast, it is not as fast as the more tightly integrated L1 cache.

Some benchmark data is included below. First, the benchmark pseudocode (if you want the DOS x86 assembly source code, ask me):

    loop (for a variety of sizes) {
        wbinvd     (write back and empty the L1 and L2 caches)
        rep movsd  (move the test memory in place; this untimed pass
                    loads as much of it as fits into the caches)
        rdtsc      (read time stamp counter -> start time)
        rep movsd  (move the test memory in place again; this pass is timed)
        rdtsc      (read time stamp counter -> end time)
    }

This simple little benchmark shows:

1. The PPro L1 data cache is 8KB.
2. The PII L1 data cache is 16KB.
3. The PII L2 cache is half-speed with respect to the PPro's (the PII's L2 runs at half the core clock; the PPro's runs at the full core clock).
4. My PPro's L2 cache is 256KB.
5. My PII's L2 cache is 512KB.
6. DRAM is much slower than the L2 cache (of course).
7. The PPro's L2 cache is about two times slower than its L1 cache.
8. The PII's L2 cache is about 4 or 5 times slower than its L1 cache.

The results:

    PPro 200/256 (200 MHz, 256KB L2)
    Moving    2KB - Clocks: 0x0000023A  Clocks/KB moved:  285
    Moving    4KB - Clocks: 0x000003B9  Clocks/KB moved:  238
    Moving    8KB - Clocks: 0x000006C9  Clocks/KB moved:  217
    Moving   12KB - Clocks: 0x0000186E  Clocks/KB moved:  521
    Moving   16KB - Clocks: 0x0000206E  Clocks/KB moved:  518
    Moving   24KB - Clocks: 0x0000306E  Clocks/KB moved:  516
    Moving   32KB - Clocks: 0x0000406E  Clocks/KB moved:  515
    Moving   64KB - Clocks: 0x0000806E  Clocks/KB moved:  513
    Moving  128KB - Clocks: 0x0001006E  Clocks/KB moved:  512
    Moving  256KB - Clocks: 0x00020127  Clocks/KB moved:  513
    Moving  384KB - Clocks: 0x000DBA60  Clocks/KB moved: 2342
    Moving  512KB - Clocks: 0x00124D1D  Clocks/KB moved: 2342
    Moving  768KB - Clocks: 0x001B72BE  Clocks/KB moved: 2342
    Moving 1024KB - Clocks: 0x00249796  Clocks/KB moved: 2341

    PII 233/512 (233 MHz, 512KB L2)
    Moving    2KB - Clocks: 0x0000023A  Clocks/KB moved:  285
    Moving    4KB - Clocks: 0x000003BA  Clocks/KB moved:  238
    Moving    8KB - Clocks: 0x000006BA  Clocks/KB moved:  215
    Moving   12KB - Clocks: 0x000009BA  Clocks/KB moved:  207
    Moving   16KB - Clocks: 0x00000CF8  Clocks/KB moved:  207
    Moving   24KB - Clocks: 0x0000661B  Clocks/KB moved: 1089
    Moving   32KB - Clocks: 0x0000881E  Clocks/KB moved: 1088
    Moving   64KB - Clocks: 0x0001101E  Clocks/KB moved: 1088
    Moving  128KB - Clocks: 0x00022024  Clocks/KB moved: 1088
    Moving  256KB - Clocks: 0x0004401A  Clocks/KB moved: 1088
    Moving  384KB - Clocks: 0x00066029  Clocks/KB moved: 1088
    Moving  512KB - Clocks: 0x000880C6  Clocks/KB moved: 1088
    Moving  768KB - Clocks: 0x0016A600  Clocks/KB moved: 1932
    Moving 1024KB - Clocks: 0x0026AC66  Clocks/KB moved: 2475

Clocks are measured in actual CPU clocks, so these numbers don't change much when the clock speed is changed, except for those affected by DRAM accesses, since DRAM speed doesn't scale with CPU speed.

- Tony