Date: Tue, 25 Feb 1997 12:48:51 -0800
From: "Jin Guojun[ITG]" <jin@george.lbl.gov>
To: asami@vader.cs.berkeley.edu, bde@zeta.org.au, mark@quickweb.com
Cc: freebsd-hardware@freebsd.org, kuku@gilberto.physik.rwth-aachen.de, robsch@robkaos.ruhr.de
Subject: Re: Memory speed of P6-200 (256k)
Message-ID: <199702252048.MAA06713@george.lbl.gov>
} >Do not waste time to play this game. The "dd" is O.S. dependent code.
}
} No, dd is very machine-independent. It just loops calling read() and
} write() with the specified block size. However, the implementation
} of /dev/zero is very machine-dependent. FreeBSD happens to have an
} implementation that copies memory in a straightforward way, so the speed
} reported by dd is closely related to the memory write bandwidth. The
} read bandwidth doesn't matter much because most reads are from the
} cache.
}
} >It does not give you what is real memory speed on your system. The result
} >from dd is really depended on the O.S. you are running. If you run 2.2 or
} >higher, you will get much better performance than 2.1.x.
}
} There isn't much difference unless you have a P5 and the P5-optimized
} copyout routine is not disabled.
}
} Bruce
% dmesg
FreeBSD 2.1.7-RELEASE #0: Thu Feb 20 20:44:03 PST 1997
root@adv-pc-1.lbl.gov:/usr/src/sys/compile/MinMax
CPU: 200-MHz Pentium 735\90 or 815\100 (Pentium-class CPU)
Origin = "GenuineIntel" Id = 0x52c Stepping=12
Features=0x1bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8>
real memory = 67108864 (65536K bytes)
avail memory = 63025152 (61548K bytes)
...
% dd if=/dev/zero of=/dev/null bs=1m count=1000
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 12 secs (87381333 bytes/sec)
0.0u 12.3s 0:12.43 99.4% 51+2809k 0+0io 3pf+0w
### this result matches the standard memory bandwidth
*** The same machine with a different FreeBSD release ***
% dmesg
Copyright (c) 1992-1996 FreeBSD Inc.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
FreeBSD 2.2-970215-GAMMA #0: Wed Feb 19 15:22:41 PST 1997
root@adv-pc-1.lbl.gov:/usr/src/sys/compile/MinMax
Calibrating clock(s) relative to mc146818A clock ... i586 clock: 200455533 Hz, i8254 clock: 1193190 Hz
CPU: Pentium (200.45-MHz 586-class CPU)
Origin = "GenuineIntel" Id = 0x52c Stepping=12
Features=0x1bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8>
real memory = 67108864 (65536K bytes)
avail memory = 62623744 (61156K bytes)
% dd if=/dev/zero of=/dev/null bs=1m count=1000
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 7.591523 secs (138124591 bytes/sec)
0.0u 7.5s 0:07.62 99.4% 71+2841k 0+0io 3pf+0w
# this result is better than the standard memory bandwidth, but worse than
the maximum memory bandwidth.
That is why I said "It is O.S. dependent."
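For reference, the read()/write() loop that dd is described as running in the quoted text above can be sketched roughly as follows. This is only an illustration, not dd's actual source; the function name copy_blocks is made up for this example. With /dev/zero as input, the time goes into the kernel filling the buffer, which is why the number depends on the O.S. implementation.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper (not part of dd): copy up to `count` blocks of `bs`
 * bytes from `in` to `out` with plain read()/write() calls.
 * Returns the number of bytes written, or -1 on setup or write failure. */
long long copy_blocks(const char *in, const char *out, size_t bs, long count)
{
    long long total = 0;
    char *buf = malloc(bs);
    int ifd = open(in, O_RDONLY);
    int ofd = open(out, O_WRONLY);

    if (buf == NULL || ifd < 0 || ofd < 0)
        total = -1;

    for (long i = 0; total >= 0 && i < count; i++) {
        /* For /dev/zero the kernel writes zeros into buf here;
         * this is the memory traffic the dd figure reflects. */
        ssize_t n = read(ifd, buf, bs);
        if (n <= 0)
            break;                      /* EOF or read error: stop early */
        /* /dev/null simply discards the data. */
        ssize_t w = write(ofd, buf, (size_t)n);
        if (w < 0) {
            total = -1;
            break;
        }
        total += w;
    }

    if (ifd >= 0)
        close(ifd);
    if (ofd >= 0)
        close(ofd);
    free(buf);
    return total;
}
```

Calling copy_blocks("/dev/zero", "/dev/null", 1048576, 1000) and timing it would correspond to the dd command line used above.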
} ----------------------------------------------------------------------
} * 440FX does have worse memory speed than Triton-{I,II}; even though P6 has
} * much better CPU speed, but the PCI controller (440FX) is worse.
}
} I know that. But there are some people seeing 80MB/s or more and some
} (including myself) who only get about 60MB/s on an apparently
} identical chipset.
}
} Satoshi
So, as we discussed before, note that the result from "dd" does not show
the memory speed you can really get from your system, unless dd is the
only thing you run.
The PCI is a 64-bit wide bus. The maximum memory speed you can get from
this bus is 8 bytes per 60ns, i.e. 1000000000 * 8 / 60 = 133333333 bytes/sec
(no interleave).
However, no memory subsystem in any PC UNIX O.S. currently uses the full
64-bit memory bandwidth, because of the CPU bus.
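The arithmetic above can be written as a small helper. This is only a sketch of the calculation; the function name and its parameters are made up for illustration.

```c
/* Peak transfer rate in bytes/sec for a bus that moves `width_bytes`
 * per memory cycle of `cycle_ns` nanoseconds, with no interleave.
 * 1e9 is the number of nanoseconds in one second. */
double peak_bytes_per_sec(double width_bytes, double cycle_ns)
{
    return 1e9 * width_bytes / cycle_ns;
}
```

peak_bytes_per_sec(8, 60) gives about 133.3 million bytes/sec, the 64-bit figure above. For comparison, a 4-byte (32-bit) transfer per 60ns cycle would give about 66.7 million bytes/sec.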
As I saw in the other message, you use the FPU to achieve this goal,
which is what you can see in ftp://george.lbl.gov/pub/ccs/performance.ps,
the result of the 8-byte register-to-memory copy. This is what you can really get.
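A portable approximation of an 8-byte-at-a-time copy test might look like the sketch below. It is not the code behind performance.ps and it does not use the FPU; it moves 64-bit integers in plain C, and the names copy64 and copy64_rate are invented for this example. The volatile qualifiers are there only to keep the compiler from replacing the loop with a library memcpy.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Copy n 8-byte words, one load and one store per word. */
void copy64(volatile uint64_t *dst, const volatile uint64_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Copy `bytes` bytes `reps` times and return the observed rate in
 * bytes/sec, or 0.0 if allocation fails or the run is too short to time. */
double copy64_rate(size_t bytes, int reps)
{
    size_t n = bytes / sizeof(uint64_t);
    uint64_t *src = calloc(n, sizeof *src);
    uint64_t *dst = calloc(n, sizeof *dst);
    double rate = 0.0;

    if (src != NULL && dst != NULL && n > 0) {
        clock_t t0 = clock();           /* CPU time used so far */
        for (int r = 0; r < reps; r++)
            copy64(dst, src, n);
        clock_t t1 = clock();
        double secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
        if (secs > 0.0)
            rate = (double)n * sizeof(uint64_t) * reps / secs;
    }

    free(src);
    free(dst);
    return rate;
}
```

The buffer should be much larger than the cache (for example copy64_rate(8 * 1024 * 1024, 16)), otherwise the number reflects cache speed rather than memory speed.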
Another tip: even when motherboards use the same PCI chipset, the memory
performance may vary.
For example, comparing the ASUS Triton-{I, II} boards with three different Intel
motherboards (EV2, PT-2000, ZAPPA) with the Triton-II PCI chipset,
the memory I/O speed does not differ much;
however, the ASUS motherboards can use 70ns memory chips, whereas the Intel
motherboards have to use 60ns memory chips. Hard to believe, right? This is a
hardware issue.
The software issue is that all memory subsystems (including the string routines)
are written in assembly language (NOT in C). The quality of this piece of code
is critical to memory performance. So, doing cross-O.S. memory performance
comparisons is meaningless.
Understanding the memory system is helpful in determining its speed.
So, getting 67 MBps from the current PCI bus is normal. Getting more than
this is feasible, depending on what tricks you play in the memory system.
-Jin
