From owner-freebsd-hardware Tue Feb 25 12:53:58 1997
Return-Path:
Received: (from root@localhost) by freefall.freebsd.org (8.8.5/8.8.5) id MAA15375 for hardware-outgoing; Tue, 25 Feb 1997 12:53:58 -0800 (PST)
Received: from george.lbl.gov (george.lbl.gov [128.3.196.93]) by freefall.freebsd.org (8.8.5/8.8.5) with SMTP id MAA15364 for ; Tue, 25 Feb 1997 12:53:53 -0800 (PST)
Received: (jin@localhost) by george.lbl.gov (8.6.10/8.6.5) id MAA06713; Tue, 25 Feb 1997 12:48:51 -0800
Date: Tue, 25 Feb 1997 12:48:51 -0800
From: "Jin Guojun[ITG]"
Message-Id: <199702252048.MAA06713@george.lbl.gov>
To: asami@vader.cs.berkeley.edu, bde@zeta.org.au, mark@quickweb.com
Subject: Re: Memory speed of P6-200 (256k)
Cc: freebsd-hardware@freebsd.org, kuku@gilberto.physik.rwth-aachen.de, robsch@robkaos.ruhr.de
Sender: owner-hardware@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

} >Do not waste time to play this game. The "dd" is O.S. dependent code.
}
} No, dd is very machine-independent. It just loops calling read() and
} write() with the specified block size. However, the implementation
} of /dev/zero is very machine-dependent. FreeBSD happens to have an
} implementation that copies memory in a straightforward way, so the speed
} reported by dd is closely related to the memory write bandwidth. The
} read bandwidth doesn't matter much because most reads are from the
} cache.
}
} >It does not give you what is real memory speed on your system. The result
} >from dd is really depended on the O.S. you are running. If you run 2.2 or
} >higher, you will get much better performance than 2.1.x.
}
} There isn't much difference unless you have a P5 and the P5-optimized
} copyout routine is not disabled.
}
} Bruce

% dmesg
FreeBSD 2.1.7-RELEASE #0: Thu Feb 20 20:44:03 PST 1997
    root@adv-pc-1.lbl.gov:/usr/src/sys/compile/MinMax
CPU: 200-MHz Pentium 735\\90 or 815\\100 (Pentium-class CPU)
  Origin = "GenuineIntel"  Id = 0x52c  Stepping=12
  Features=0x1bf
real memory  = 67108864 (65536K bytes)
avail memory = 63025152 (61548K bytes)
...
% dd if=/dev/zero of=/dev/null bs=1m count=1000
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 12 secs (87381333 bytes/sec)
0.0u 12.3s 0:12.43 99.4% 51+2809k 0+0io 3pf+0w

### this result matches the standard memory bandwidth

*** The same machine with different FreeBSD ***

% dmesg
Copyright (c) 1992-1996 FreeBSD Inc.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California. All rights reserved.
FreeBSD 2.2-970215-GAMMA #0: Wed Feb 19 15:22:41 PST 1997
    root@adv-pc-1.lbl.gov:/usr/src/sys/compile/MinMax
Calibrating clock(s) relative to mc146818A clock ...
i586 clock: 200455533 Hz, i8254 clock: 1193190 Hz
CPU: Pentium (200.45-MHz 586-class CPU)
  Origin = "GenuineIntel"  Id = 0x52c  Stepping=12
  Features=0x1bf
real memory  = 67108864 (65536K bytes)
avail memory = 62623744 (61156K bytes)
% dd if=/dev/zero of=/dev/null bs=1m count=1000
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 7.591523 secs (138124591 bytes/sec)
0.0u 7.5s 0:07.62 99.4% 71+2841k 0+0io 3pf+0w

# this result is better than the standard memory bandwidth, but worse than
# the maximum memory bandwidth. That is why I said "It is O.S. dependent."

} ----------------------------------------------------------------------
}  * 440FX does have worse memory speed than Triton-{I,II}; even though P6 has
}  * much better CPU speed, but the PCI controller (440FX) is worse.
}
} I know that. But there are some people seeing 80MB/s or more and some
} (including myself) who only get about 60MB/s on an apparently
} identical chipset.
}
} Satoshi

So, as we discussed before, you should notice that the result from "dd"
does not show what memory speed you can really get from your system,
unless dd is the only thing you run.

The PCI is a 64-bit wide bus. The maximum memory speed you can get from
this bus is

    1000000000 ns/sec * 8 bytes / 60 ns = 133333333 bytes/sec (no interleave).

However, no memory sub-system in any PC UNIX O.S. currently uses the full
64-bit memory bandwidth, because of the CPU bus. As I saw in the other
message, you use the FPU to achieve this goal, which is what you can see in
ftp://george.lbl.gov/pub/ccs/performance.ps, the result of an 8-byte
register-to-memory copy. This is what you can really get.

Another tip: even when motherboards use the same PCI chipset, the memory
performance may vary. For example, compare the ASUS Triton-{I,II} boards
with three different Intel motherboards (EV2, PT-2000, ZAPPA) that use the
Triton-II PCI chipset: the memory I/O speed does not differ much, yet the
ASUS motherboards can use 70ns memory chips whereas the Intel motherboards
have to use 60ns memory chips. Hard to believe, right?

That is the hardware issue. The software issue is that all memory
sub-system routines (including the string routines) are written in
assembly language (NOT in C). The quality of this piece of code is
critical to memory performance. So, doing a cross-O.S. memory performance
comparison is meaningless. What helps is understanding the memory system
well enough to determine its speed.

So, getting 67 MB/s from the current PCI bus is normal. Getting more than
this is feasible, depending on what tricks you play in the memory system.

-Jin
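
The arithmetic in the message above can be checked with a short sketch. It is not part of the original message: the helper names are made up for illustration, and reading the 67 MB/s figure as 4-byte accesses at 60 ns is an assumption that the text does not state outright.

```python
"""Sanity-check the bandwidth figures quoted in the message (illustrative only)."""

NS_PER_SEC = 1_000_000_000


def peak_bytes_per_sec(bytes_per_access: int, cycle_ns: int) -> int:
    """Peak rate if one access of the given width completes every cycle_ns."""
    return NS_PER_SEC * bytes_per_access // cycle_ns


def observed_bytes_per_sec(total_bytes: int, seconds: float) -> int:
    """Throughput of the kind dd reports: bytes moved divided by elapsed time."""
    return int(total_bytes / seconds)


if __name__ == "__main__":
    # 8-byte (64-bit) accesses at 60 ns, as in the message's formula.
    print("8 bytes / 60 ns:", peak_bytes_per_sec(8, 60), "bytes/sec")
    # Assumption: the 67 MB/s figure corresponds to 4-byte accesses at 60 ns.
    print("4 bytes / 60 ns:", peak_bytes_per_sec(4, 60), "bytes/sec")

    total = 1000 * 1024 * 1024  # dd bs=1m count=1000
    print("2.1.7-RELEASE dd:", observed_bytes_per_sec(total, 12.0), "bytes/sec")
    print("2.2-GAMMA dd:    ", observed_bytes_per_sec(total, 7.591523), "bytes/sec")
```

The dd figures are recomputed from the rounded times printed in the transcripts, so the last digits may differ slightly from what dd itself reported.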