Date:      Thu, 19 Mar 1998 10:00:03 -0800 (PST)
From:      Hugh LaMaster <lamaster@george.arc.nasa.gov>
To:        freebsd-current@FreeBSD.ORG
Cc:        lamaster@george.arc.nasa.gov
Subject:   Re: Stream_d benchmark... Wow, there really are differences in
Message-ID:  <199803191800.KAA01635@george.arc.nasa.gov>
In-Reply-To: <199803191010.CAA21492@rah.star-gate.com> from "Amancio Hasty" at Mar 19, 98 02:10:43 am


For discussion of the chipsets, I refer to my previous post.
This is just to put the numbers in one place so that they can be
compared to previous numbers.  This discussion should probably be
elsewhere, in -hardware perhaps?


> This is an Asus motherboard dying to double or more my memory system.
> 
> My PPro200 is about 1.5 years old and I hope that the new 100MHz bus
> based systems fare better than my system.

> -------------------------------------------------------------
> Function      Rate (MB/s)   RMS time     Min time     Max time
> Copy:         113.7778       0.1557       0.1406       0.1719
> Scale:        107.7895       0.1565       0.1484       0.1719
> Add:          118.1538       0.2158       0.2031       0.2344
> Triad:        118.1538       0.2213       0.2031       0.2344

These numbers are quite good.  It is hard to tell how much of the
difference is due to board design, aggressive BIOS settings,
memory technology, or chipset.  And, perhaps, compilers, although
the compiler can't do anything about the poor PPro200/Natoma write
bandwidth.  These numbers seem high to me based on what I have read
previously for generic EDO.  Anybody using BEDO out there?

> > Soeren Schmidt (sos@FreeBSD.org) wrote:
> > > In reply to Jaye Mathisen who wrote:
> > > 
> > > Hmm, Then I should be proud of my noname system (p6/200/128MB 72pEDO):
> > > 
> > > Function      Rate (MB/s)   RMS time     Min time     Max time
> > > Copy:         117.0286       0.2758       0.2734       0.2812
> > :
> > > Triad:        125.3878       0.3917       0.3828       0.4219

Higher yet for generic EDO.

> > > > All boxes are P6-200's, 256MB RAM (all RAM is 60ns FP as far as I know).
> > > > 
> > > > Box 1 is a SuperMicro P6DNE:
> > > > Function      Rate (MB/s)   RMS time     Min time     Max time
> > > > Copy:          60.7395       0.2704       0.2634       0.2832
> > > > Triad:         71.1647       0.3494       0.3372       0.3565

> > > > Box 2 is a Digital Prioris HX6000
> > > > Copy:          73.3551       0.2197       0.2181       0.2249
> > > > Triad:         77.4268       0.3108       0.3100       0.3122

> > > > Box 3 is a Digital Prioris ZX6000
> > > > Function      Rate (MB/s)   RMS time     Min time     Max time
> > > > Copy:          84.8807       0.2018       0.1885       0.2834
> > > > Scale:         97.5461       0.1661       0.1640       0.1720
> > > > Add:          111.6549       0.2179       0.2149       0.2247
> > > > Triad:        100.9468       0.2659       0.2377       0.4237

> > > > Box 3 uses 256bit interleaved memory, rather than whatever the
> > > > "standard" is.  


The web site for stream is http://www.cs.virginia.edu/stream
and down in ../standard/Bandwidth.html we see the following
for x86 boards tested.  Note that some people have complained
of the difficulty of approaching Intel's "Alder" numbers for the
Orion chipset.  That board presumably had a very aggressive
memory design, and used Orion with full memory interleaving.
Various magazines have reported on what bandwidth the consumer 
actually gets in a typical system with typical software,
and the picture has usually been unpleasant.  So --

Interesting that some of the numbers above seem to almost
reach the Alder numbers using Natoma w/ EDO.  I admit I am
surprised.  Here are a few numbers, with the big systems
for reference and entertainment, and the PCs at the bottom.
Note that the highest Intel board tested is a Dell PII_300;
unfortunately, the chipset is not specified.  Note also that,
because this benchmark counts bandwidth in both directions
(bytes read plus bytes written), a copy shows twice the
bandwidth that, e.g., the *rate* of bcopy() would show.
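
To make the counting concrete, here is a minimal sketch in Python of
how I understand stream to arrive at its rates: Copy and Scale move
two 8-byte words per loop iteration (one read, one write), Add and
Triad move three, and the rate is bytes moved divided by the best
time.  The function names are my own, for illustration only.  The
array length n = 1,000,000 is an assumption (it is not in the quoted
output); it reproduces the Asus numbers quoted above if the minimum
times are taken as 0.140625 s and 0.203125 s, which print as 0.1406
and 0.2031.

```python
# Words (8-byte doubles) moved per loop iteration, as stream counts them:
# every read and every write is counted.
WORDS_PER_ITERATION = {"copy": 2, "scale": 2, "add": 3, "triad": 3}
BYTES_PER_WORD = 8  # double precision


def stream_rate_mb_s(kernel, n, min_time_s):
    """Rate in MB/s (1 MB = 10**6 bytes) for one kernel over arrays of n doubles."""
    bytes_moved = WORDS_PER_ITERATION[kernel] * BYTES_PER_WORD * n
    return bytes_moved / min_time_s / 1e6


def bcopy_style_rate_mb_s(copy_rate_mb_s):
    """Rate as bcopy() would usually be quoted: bytes copied, counted once."""
    return copy_rate_mb_s / 2


n = 1_000_000  # assumed array length, not stated in the quoted output
copy = stream_rate_mb_s("copy", n, 0.140625)
triad = stream_rate_mb_s("triad", n, 0.203125)
print(f"Copy  {copy:.4f} MB/s")
print(f"Triad {triad:.4f} MB/s")
print(f"Copy, counted once: {bcopy_style_rate_mb_s(copy):.4f} MB/s")
```

So a stream Copy figure of about 114 MB/s corresponds to roughly
57 MB/s when quoted the way a bcopy() rate usually is.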



All results are in MB/s --- 1 MB=10^6 B, *not* 2^20 B
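
For anyone comparing against figures quoted in 2^20-byte units, a
small Python sketch of the conversion follows.  The helper name is
hypothetical; the only facts used are the two unit definitions in
the line above.

```python
MB = 10**6   # megabyte as used in the table
MIB = 2**20  # "binary" megabyte, 1,048,576 bytes


def mb_s_to_mib_s(rate_mb_s):
    """Convert a rate in 10**6 bytes/s to a rate in 2**20 bytes/s."""
    return rate_mb_s * MB / MIB


# Example: the Alder Copy figure from the table, 140.0 MB/s.
print(f"{mb_s_to_mib_s(140.0):.1f}")
```

The same bandwidth reads about 5% lower when stated in 2^20-byte
units, which matters when the differences between boards are of
that order.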

------------------------------------------------------------------
Machine ID                ncpus    COPY    SCALE      ADD    TRIAD
------------------------------------------------------------------


[Big Iron - now that's memory bandwidth.  About 100X the
bandwidth per CPU of the PCs.  Too bad the CPUs are so
expensive.]

NEC_SX_4                    32 434784.0 432886.0 437358.0 436954.0
NEC_SX_4                     1  15983.0  15984.0  15989.0  15898.0
Cray_T932_321024-3E         32 310721.0 302182.0 359841.0 359270.0
Cray_T932_321024-3E          1  10653.0  10221.0  13014.0  13682.0
Cray_C90                     1   6965.4   6965.4   9378.7   9500.7



[Interesting workstation-server numbers, but not all are up to
date or for the latest models.]

SGI_Origin_2000_2          128  21857.6  23351.7  24459.5  22913.6
SGI_Origin_2000_1           32   8556.0   8670.0   9733.0   9435.0
SGI_Origin_2000_1            1    296.0    300.0    315.0    317.0
IBM_RS6000-591               1    711.1    695.7    750.0    800.0
DEC_600au_600                1    227.7    223.0    243.5    248.2
Sun_Ultra2-2200              1    228.5    227.5    258.9    189.9
HP_C180                      1    262.3    262.3    244.9    242.4



[PC numbers, unfortunately without the chipset and memory 
technology info which would help sort this out.]

Compaq_Proliant_5000         1    123.1    114.3    141.2    126.3
Dell_P166s                   1    119.5    102.4    107.5    104.1
Dell_Pentium_133             1     88.0    125.7    132.0    120.0
Dell_486_DX-2-66             1     33.3     16.5     22.0     18.8
Dell_P6_200                  1    102.4    102.4    112.9    112.9
Dell_PII_300                 1    188.2    173.0    213.3    188.2
Gateway_2000_P6-200          1    107.9     89.5    100.5    101.6
Gateway_2000_P5-133-66       1     91.4    114.3    126.0    114.0
Intel_Alder_Pentium_Pr       1    140.0    140.0    163.9    167.6
Intel_Pentium-133            1     84.4     77.1     85.7     85.9
Intel_Pentium-100            1     85.1     74.4     77.0     75.2
Intel_Pentium-90             1     46.4     69.9     69.9     69.9
Intel_Pentium-60             1     37.2     62.1     61.3     58.5
PC-clone-AMD-486DX-50        1     38.1     26.2     28.6     23.3
PC-clone-AMD-486DX-80        1     83.9     41.9     39.3     39.3
Viglen_Pentium_60            1     47.1     61.5     63.1     60.0
Micron_P6-200                1     98.4     97.4    106.5    105.0
Micron_P5-120                1     79.3    100.4    109.9    107.7
Asus_Pentium_180             1     76.2    110.3    109.1    100.0
Asus_Pentium_200             1     84.2    123.1    123.1    111.6
Triton_II_Pentium_133        1     93.5    113.3    116.6    110.3
Triton_II_Pentium_133        1     75.9     85.3     87.8     85.3
Gigabyte_586HX               1     88.9    118.5    126.3    117.1
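
As a rough check on the "about 100X per CPU" remark, here is a short
Python sketch that divides Triad rates by CPU count and compares a
few rows.  The figures are copied from the table above; the choice
of rows and the helper names are mine, for illustration only.

```python
# Triad rates in MB/s copied from the table above: name -> (ncpus, triad)
triad = {
    "NEC_SX_4 (32 cpus)": (32, 436954.0),
    "NEC_SX_4": (1, 15898.0),
    "Cray_T932_321024-3E": (1, 13682.0),
    "Dell_PII_300": (1, 188.2),
    "Dell_P6_200": (1, 112.9),
}


def per_cpu(name):
    """Triad bandwidth per CPU in MB/s."""
    ncpus, rate = triad[name]
    return rate / ncpus


def ratio(big, pc):
    """How many times more per-CPU bandwidth `big` has than `pc`."""
    return per_cpu(big) / per_cpu(pc)


for big in ("NEC_SX_4", "Cray_T932_321024-3E"):
    for pc in ("Dell_PII_300", "Dell_P6_200"):
        print(f"{big} vs {pc}: {ratio(big, pc):.0f}x")
```

For these rows the ratio comes out between roughly 70x and 140x,
which is consistent with "about 100X".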




Note: These numbers don't tell the entire bandwidth story -
the cache hierarchy, latency, read and write bandwidth at 
each level, not to mention MP performance, cache-coherency, 
prefetch, multiple outstanding transactions, etc. etc. etc.  
are enough to write a (large) book about.

However, my experience is that many applications are sensitive
to bandwidth, and it is worth a little effort to get the most
out of the CPU.


--
 Hugh LaMaster, M/S 233-21,    ASCII Email: hlamaster@mail.arc.nasa.gov
 NASA Ames Research Center     Or:          lamaster@george.arc.nasa.gov
 Moffett Field, CA 94035-1000  No Junkmail: USC 18 section 2701
 Phone: 650/604-1056           Disclaimer:  Unofficial, personal *opinion*.




