Date:      Sun, 9 Nov 1997 18:59:42 -0500 (EST)
From:      "John S. Dyson" <toor@dyson.iquest.net>
To:        nate@mt.sri.com (Nate Williams)
Cc:        toor@dyson.iquest.net, nate@mt.sri.com, perlsta@cs.sunyit.edu, gpalmer@freebsd.org, freebsd-smp@freebsd.org
Subject:   Re: Best processor?
Message-ID:  <199711092359.SAA27521@dyson.iquest.net>
In-Reply-To: <199711092339.QAA06530@rocky.mt.sri.com> from Nate Williams at "Nov 9, 97 04:39:53 pm"

Nate Williams said:
> > > > dual 300mhz PIIs will beat dual PPro 200mhz
> > > 
> > > From the noise I've been hearing lately on the mailing lists, this
> > > surprises me.  Do you have #'s to back it up?
> > > 
> 
> > I am not responding with numbers, but if you look at it, it is likely true:
> >
> > 1) The PIIs have 512K cache, while the PPro has (normally) 256K cache.
> 
> The big cache runs at half-speed though, which is a *huge* performance
> hit.  (They have a bigger L1 cache, which is a win, more on that below.)
> 
> > Therefore bus utilization is likely less with the PII.  Even in the
> > case of a 512K cache, the bus utilization is going to be nearly the
> > same.
> 
> Not quite.  The PII has to 'spin' a lot more waiting for data since it
> can't get to it at bus-speeds, while the PPro doesn't have to.  Going
> from 256 -> 512K doesn't equal a double in cache performance (I'd
> suspect somewhere around 15-20% at best), so I would think the two #'s
> would be close to break-even.  If you get a 512K PPro it would be a big
> win.
>
Bus utilization doesn't have as much to do with the processor as what
the processor appears to be to the memory subsystem.  A 512K PPro should
have a bus utilization similar to a 512K PII.  Sure, the traffic between
the processor and 2nd level cache will be slower (due to the 1/2 speed)
and different (due to the double sized 1st level cache.)  That isn't what
I said though.
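
To make the distinction concrete, here is a minimal sketch, assuming a simple
two-level cache model.  All figures (hit times, miss rates, memory penalty) are
made-up illustrative values, not measurements of any PPro or PII, and the
function names are hypothetical.  It only shows that L2 speed changes the
average access time seen by the processor, while the traffic reaching the
memory bus depends on the miss rates alone.  It holds the 1st level miss rate
equal for both parts, which ignores the PII's larger 1st level cache.

```python
def amat_cycles(l1_hit, l1_miss_rate, l2_hit, l2_local_miss_rate, mem_penalty):
    """Average memory access time, in CPU cycles, for a two-level cache."""
    return l1_hit + l1_miss_rate * (l2_hit + l2_local_miss_rate * mem_penalty)


def bus_fills_per_1000_refs(l1_miss_rate, l2_local_miss_rate):
    """Cache line fills that reach the memory bus per 1000 memory references.

    Note that L2 latency does not appear here at all.
    """
    return 1000 * l1_miss_rate * l2_local_miss_rate


# Assumed, illustrative numbers only: same cache sizes and miss rates,
# but the second L2 is twice as slow to hit (a "half-speed" L2).
full_speed_l2 = amat_cycles(1, 0.04, 4, 0.10, 60)
half_speed_l2 = amat_cycles(1, 0.04, 8, 0.10, 60)
bus_traffic = bus_fills_per_1000_refs(0.04, 0.10)

print(full_speed_l2, half_speed_l2, bus_traffic)
```

Under these assumptions the half-speed L2 costs the processor more cycles per
access, but both configurations present the same number of line fills to the
memory subsystem.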

> 
> > In a DP system, bus utilization is likely less important than
> > in 4-way systems anyway.
> 
> DP?  Distributed Processing?  SMP?  Help me out here.
> 
Dual processor -- take a look at the comparison with a 4-way system.
>
> > 2) Expect about 3-5% miss rate with an 8K or 16K 1st level cache.  (I
> > have really measured it on real applications.)
> 
> Heck, let's use the #'s from Hennessy and Patterson, considered to be
> 'THE' hardware/cache reference in many folks' minds.  (The processor in
> this case is one of the later VAX sets, but its architecture is similar
> enough for cache performance to be pretty close).
> 
Sorry, but I measured it running real programs, like gcc, etc.  Note that
I seldom saw an 11% L1 miss rate (of course, you can make it miss using
synthetic benchmarks, but that is not what I am talking about.)  Try running
some tests with the P6 performance counters.  They don't report miss rates
directly, but you can derive them with a few documented calculations.
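
The calculation itself is just a ratio of two event counts.  A minimal sketch,
assuming you have already read one counter that counts misses (line fills) and
one that counts references over the same interval; the function name and the
sample readings are hypothetical, and which counter events to program is left
to the processor documentation:

```python
def miss_rate(misses, references):
    """Return the miss rate as a fraction, given two raw event counts
    sampled over the same interval."""
    if references <= 0:
        raise ValueError("references must be a positive count")
    return misses / references


# Hypothetical counter readings, for illustration only.
l1_rate = miss_rate(misses=40_000, references=1_000_000)
print(f"1st level miss rate: {l1_rate:.1%}")
```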

I am going to be out of town for the rest of the week, but I have given others
a copy of the code.

> Cache/size vs. miss rate:
> 
> 8K:  8 - 16%
> 16K: 7 - 11%
> 32K: 2 - 6%
> 64K: 1 - 3%
> 
Interesting numbers.  They are too high for the workloads that I have
measured though.  What was the line size on those caches?  Maybe I'll
finally have to buy a copy of H&P to see what they are talking about.
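
As a rough check on how the quoted ranges scale with cache size, here is a
small sketch that takes the midpoint of each range in the table above and
computes the ratio from one size to the next.  Using midpoints is an assumption
made purely for illustration; the ranges themselves are the only data given.

```python
# Miss-rate ranges (percent) as quoted above, keyed by cache size in KB.
quoted = {8: (8, 16), 16: (7, 11), 32: (2, 6), 64: (1, 3)}


def doubling_ratios(table):
    """For each doubling of cache size, return (new_size_kb, ratio), where
    ratio is the midpoint miss rate at the new size divided by the midpoint
    at the previous size."""
    sizes = sorted(table)
    mids = [(table[s][0] + table[s][1]) / 2 for s in sizes]
    return [(sizes[i + 1], mids[i + 1] / mids[i]) for i in range(len(sizes) - 1)]


for size_kb, ratio in doubling_ratios(quoted):
    print(f"{size_kb}K: {ratio:.2f}x the miss rate of the previous size")
```

The ratios are not constant from one doubling to the next, so at least for
these figures the miss rate does not fall by a fixed factor per doubling.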

>
> > Miss rate can be much lower than that though.  The miss rate does not
> > scale linearly downward with 1st level cache size, but it does go down
> > (especially with n-way associative cache schemes.)
> 
> This is for the L1 cache numbers, and the numbers given assume a data +
> instruction combined cache.
> 
Separate caches for data and instructions have both advantages and
disadvantages.

> 
> I'd like to see real #'s to back that up. 
> 
Well, someone just posted a benchmark that showed that at least one PII was
faster than my PPro (I think that it was the semspeed benchmark.)  The
experiment was not controlled, but the results were within the range that I
would expect.

>
> > If you are talking about 233MHz PII processors vs. 200MHz PPro processors, it
> > is harder to decide on which processor is faster, but I do think that the PII
> > will win out on average.
> 
> We're talking about SMP support, not UP support.  For UP stuff, there's
> no doubt that the high-clock PII chips will outperform a (relatively
> speaking) low-clock PPro chip, but for SMP everything I've read and seen
> tells me that the PPro *kills* the PII for SMP work, mostly due to the
> L2 cache (and motherboard design??)
> 
Even with SDRAM and an LX chipset?  A 512K PII shouldn't look that different
to a bus subsystem from a 512K P6, unless someone made a very bad mistake.
Maybe they broke the MESI protocol?

Now, the only thing that I can believe to be fact at this point:

	a 4-WAY P6 beats all PII configs.
	a P6 (per MHz) is mostly faster than a PII.
	a P6 system is likely faster than a PII system, if you need
		more than 512MB.
	a P6-233 is seldom slower than a PII-233.

I don't think that any claims can be made that in general:

	a dual P6-200 Natoma sys is faster than a dual PII-233 LX system.

It would be nice to see some numbers that say that a 2XP6-200 Natoma is faster
in the same hardware configuration (incl disk drives/memory) than a 2XPII-233 LX.
I don't think that it will be so though.  A good, fast test would be a recompile
of GCC from the FSF sources.  (That is faster than worldstone.)
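
For anyone taking this up, a minimal timing harness might look like the sketch
below.  It assumes only that the build can be started as a single command; the
command shown is a placeholder, and `time_command` is a hypothetical helper,
not an existing tool.  Run it several times on each machine, with the same
sources, disks, and memory, and compare the wall-clock times.

```python
import subprocess
import sys
import time


def time_command(argv, runs=1):
    """Run argv `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Placeholder command; substitute the real build command for the test.
    print(time_command([sys.executable, "-c", "pass"], runs=3))
```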

If we find out that a dual PII/233 or PII/266 is slower than a dual P6-512K,
that would be VERY INTERESTING!!!  Anyone willing to take up the challenge?


-- 
John
dyson@freebsd.org
jdyson@nc.com


