Date: Wed, 23 Jun 2004 02:41:17 -0400
From: Mikhail Teterin <mi+kde@aldan.algebra.com>
To: Peter Wemm <peter@wemm.org>
Cc: Julian Elischer <julian@elischer.org>
Subject: Re: read vs. mmap (or io vs. page faults)
Message-ID: <200406230241.18132@aldan>
In-Reply-To: <200406222027.30702.peter@wemm.org>
References: <Pine.BSF.4.21.0406201716191.23541-100000@InterJet.elischer.org> <200406220108.31366@aldan> <200406222027.30702.peter@wemm.org>
On Tuesday 22 June 2004 11:27 pm, Peter Wemm wrote:
= mmap is more valuable as a programmer convenience these days. Don't
= make the mistake of assuming it's faster, especially since the cost of
= a copy has gone way down.

Actually, let me back off from agreeing with you here :-)

On I/O-bound machines (such as my laptop), there is no discernible
difference in either the CPU or the elapsed time; md5-ing a file with
mmap or read is (curiously) slightly faster than just cat-ing it into
/dev/null.

On a dual P2 450MHz, a single mmap process always wins on CPU time and
sometimes on elapsed time. Sometimes it wins handsomely:

    mmap: 35.271u  4.004s 1:06.08 59.4%  10+190k     0+0io 4185pf+0w
    read: 32.134u 15.797s 1:58.72 40.3% 408+302k 11228+0io   12pf+0w

or

    mmap: 35.039u  4.558s 1:10.27 56.3%  10+190k     5+0io 5028pf+0w
    read: 29.931u 27.848s 2:07.17 45.4%  10+187k 11219+0io    5pf+0w

Mind you, both processors are Xeons with _2MB of cache each_, so memory
copying should be even cheaper on them than usual. And yet mmap manages
to win...

On a single P2 400MHz (standard 512KB cache), mmap always wins on CPU
time and, thanks to that, can win on elapsed time on a busy system. For
example, running two of these processes in parallel (on two separate
copies of the same huge file, residing on distinct disks) yields the
following (the same 1462726660-byte file as in the dual-Xeon stats
above):

    mmap: 66.989u  7.584s 3:01.76 41.0% 5+238k    90+0io 22456pf+0w
          65.474u  7.729s 2:38.59 46.1% 5+241k    90+0io 22401pf+0w
    read: 60.724u 42.394s 3:37.01 47.5% 5+241k 22541+0io     0pf+0w
          61.778u 41.987s 3:35.36 48.1% 5+239k 11256+0io     0pf+0w

That's 182 vs. 215 seconds, or a 15% elapsed-time win for mmap.
Evidently, mmap runs through that "nasty nasty code" faster than read
runs through its own.

On an idle system, mmap loses on elapsed time, I presume, because
page-faulting is not smart enough to fault pages in ahead as
efficiently as read reads ahead.

Why am I complaining then?
Because I want the "nasty nasty code" improved so that using mmap is
beneficial for a single process, too.

Thank you very much!

Yours,

    -mi
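For readers who want to try the comparison in this message themselves, here is a minimal Python sketch of the two access patterns being timed. The first hashes a file by calling read() into a user buffer. The second hashes it through a read-only mmap. This is not the program used for the numbers above. Several details are assumptions made for illustration: the chunk size, the function names md5_read and md5_mmap, and the optional madvise(MADV_SEQUENTIAL) call. That call is one way to ask the VM system to fault pages in ahead, and it works only where the platform and the Python version expose it.

```python
import hashlib
import mmap
import os

# Hypothetical read size (1 MiB); not taken from the original benchmark.
CHUNK = 1 << 20


def md5_read(path: str) -> str:
    """Hash a file with read(): the kernel copies each chunk into our buffer."""
    h = hashlib.md5()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:  # empty bytes means EOF
                break
            h.update(chunk)
    return h.hexdigest()


def md5_mmap(path: str, sequential_hint: bool = True) -> str:
    """Hash a file through mmap(): pages arrive via page faults, with no
    copy into a separate user buffer."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return h.hexdigest()  # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Assumption: hint sequential access so the VM system may fault
            # ahead; skipped where madvise/MADV_SEQUENTIAL are unavailable.
            if (sequential_hint and hasattr(m, "madvise")
                    and hasattr(mmap, "MADV_SEQUENTIAL")):
                m.madvise(mmap.MADV_SEQUENTIAL)
            h.update(m)  # hashlib reads directly from the mapped pages
    return h.hexdigest()
```

You can time each function on a large file, for example under csh's `time`. Interpreter overhead means the absolute figures will not be comparable with the ones in this message. The sketch also does not show whether the MADV_SEQUENTIAL hint closes the idle-system gap.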
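The timing lines in this message appear to be csh's built-in `time` output. Its default format gives these fields in order:

- user time
- system time
- elapsed time
- CPU percentage
- shared+unshared memory (KB)
- block input+output operations
- page faults+swaps

As an aid to reading them, the sketch below parses one such line and redoes the elapsed-time arithmetic. The regular expression, the CshTime record and the helper names are hypothetical. The field layout is an assumption based on csh's default format, not something stated in the message.

```python
import re
from dataclasses import dataclass

# Assumed layout of csh's default `time` line (hypothetical regex):
#   <user>u <system>s <elapsed> <cpu>% <shared>+<unshared>k <in>+<out>io <faults>pf+<swaps>w
TIME_RE = re.compile(
    r"(?P<user>[\d.]+)u\s+(?P<system>[\d.]+)s\s+(?P<elapsed>[\d:.]+)\s+(?P<cpu>[\d.]+)%\s+"
    r"(?P<shared>\d+)\+(?P<unshared>\d+)k\s+"
    r"(?P<blk_in>\d+)\+(?P<blk_out>\d+)io\s+"
    r"(?P<faults>\d+)pf\+(?P<swaps>\d+)w"
)


@dataclass
class CshTime:
    user: float      # user CPU seconds
    system: float    # system (kernel) CPU seconds
    elapsed: float   # wall-clock seconds
    cpu_pct: float   # (user + system) / elapsed, as printed
    blk_in: int      # block input operations
    faults: int      # page faults (the "pf" field)


def parse_elapsed(text: str) -> float:
    """Convert an [h:]m:ss.ss elapsed field to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_csh_time(line: str) -> CshTime:
    """Extract the interesting fields from one csh `time` line."""
    m = TIME_RE.search(line)
    if m is None:
        raise ValueError(f"not a csh time line: {line!r}")
    return CshTime(
        user=float(m["user"]),
        system=float(m["system"]),
        elapsed=parse_elapsed(m["elapsed"]),
        cpu_pct=float(m["cpu"]),
        blk_in=int(m["blk_in"]),
        faults=int(m["faults"]),
    )


def elapsed_win(faster: float, slower: float) -> float:
    """Fraction of elapsed time saved by the faster run: 1 - faster/slower."""
    return 1.0 - faster / slower
```

With the rounded figures from the two-process runs, `elapsed_win(182, 215)` comes to about 0.153, the 15% quoted above. Parsing the first pair of dual-Xeon lines also makes the trade-off visible:

- The mmap run reports 4185 page faults and no block input.
- The read run reports 11228 block inputs and only 12 page faults.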