Date:      Wed, 23 Jun 2004 02:41:17 -0400
From:      Mikhail Teterin <mi+kde@aldan.algebra.com>
To:        Peter Wemm <peter@wemm.org>
Cc:        Julian Elischer <julian@elischer.org>
Subject:   Re: read vs. mmap (or io vs. page faults)
Message-ID:  <200406230241.18132@aldan>
In-Reply-To: <200406222027.30702.peter@wemm.org>
References:  <Pine.BSF.4.21.0406201716191.23541-100000@InterJet.elischer.org> <200406220108.31366@aldan> <200406222027.30702.peter@wemm.org>

On Tuesday 22 June 2004 11:27 pm, Peter Wemm wrote:

= mmap is more valuable as a programmer convenience these days. Don't
= make the mistake of assuming it's faster, especially since the cost of
= a copy has gone way down.

Actually, let me back off from agreeing with you here :-) On io-bound
machines (such as my laptop), there is no discernible difference in
either the CPU or the elapsed time -- md5-ing a file with mmap or read
is (curiously) slightly faster than just cat-ing it into /dev/null.
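For anyone who wants to repeat the comparison, the two access patterns can be sketched in C as below. This is a minimal sketch, not the md5 program behind the numbers in this message. A 32-bit FNV-1a hash stands in for MD5 (an assumption, made to avoid depending on libmd), and `checksum_read`, `checksum_mmap` and `fnv1a` are hypothetical names. The read version copies every 64KB chunk into a user buffer. The mmap version touches the mapped pages directly and lets page faults bring the data in.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define FNV_INIT 2166136261u

/* Stand-in for MD5 (assumption): 32-bit FNV-1a over the bytes. */
static uint32_t fnv1a(uint32_t h, const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* read(2) version: each chunk is copied from the kernel into buf. */
int checksum_read(const char *path, uint32_t *out)
{
    unsigned char buf[64 * 1024];
    uint32_t h = FNV_INIT;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        h = fnv1a(h, buf, (size_t)n);
    close(fd);
    if (n < 0)
        return -1;
    *out = h;
    return 0;
}

/* mmap(2) version: no copy; pages come in via faults as the loop touches them. */
int checksum_mmap(const char *path, uint32_t *out)
{
    struct stat st;
    uint32_t h = FNV_INIT;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size > 0) {  /* a zero-length mapping would fail with EINVAL */
        size_t len = (size_t)st.st_size;
        void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        h = fnv1a(h, p, len);
        munmap(p, len);
    }
    close(fd);
    *out = h;
    return 0;
}
```

Both functions return 0 and store the same hash for the same file, so either one can be wrapped in a small driver and timed with csh's `time` builtin to produce lines like the ones below.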

On a dual P2 450MHz, with a single process, mmap always wins on CPU time
and sometimes on elapsed time. Sometimes it wins handsomely:

	mmap: 35.271u 4.004s 1:06.08 59.4%   10+190k 0+0io 4185pf+0w
	read: 32.134u 15.797s 1:58.72 40.3%  408+302k 11228+0io 12pf+0w

or

	mmap: 35.039u 4.558s 1:10.27 56.3%    10+190k 5+0io 5028pf+0w
	read: 29.931u 27.848s 2:07.17 45.4%   10+187k 11219+0io 5pf+0w
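The lines above are csh `time` output. Assuming the default csh/tcsh format (`%Uu %Ss %E %P %X+%Dk %I+%Oio %Fpf+%Ww`), the fields are user CPU seconds, system CPU seconds, elapsed wall-clock time, CPU utilization, average shared+unshared memory in KB, block input+output operations, and page faults+swaps. The read runs therefore show roughly 11,000 block inputs and almost no page faults, while the mmap runs show the reverse: the same data arrives through page faults instead of read calls. The small helper below, with the hypothetical names `elapsed_seconds` and `percent_faster`, is a sketch that turns the elapsed field into seconds so runs can be compared numerically.

```c
#include <stdlib.h>

/* Convert a csh elapsed field ("1:06.08", or "1:02:03.4" with hours) to
   seconds. Each ':' multiplies the value accumulated so far by 60. */
double elapsed_seconds(const char *s)
{
    double total = 0.0;
    char *end;

    for (;;) {
        double part = strtod(s, &end);
        if (*end != ':')
            return total + part;
        total = (total + part) * 60.0;
        s = end + 1;
    }
}

/* How much less time run a took than run b, as a percentage of b. */
double percent_faster(double a_sec, double b_sec)
{
    return 100.0 * (b_sec - a_sec) / b_sec;
}
```

For the first dual-Xeon pair, `elapsed_seconds("1:06.08")` is about 66.08 and `elapsed_seconds("1:58.72")` is about 118.72, so `percent_faster` of the two comes to roughly 44.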

Mind you, both processors are Xeons with _2MB of cache each_, so
memory copying should be even cheaper on them than usual. And yet mmap
manages to win...

On a single P2 400MHz (standard 512KB cache), mmap always wins on CPU
time and, thanks to that, can win on elapsed time on a busy system. For
example, running two of these processes in parallel (on two separate
copies of the same huge file residing on distinct disks) yields the
following (the same 1462726660-byte file as in the dual Xeon stats
above):

	mmap: 66.989u 7.584s 3:01.76 41.0%    5+238k 90+0io 22456pf+0w
	      65.474u 7.729s 2:38.59 46.1%    5+241k 90+0io 22401pf+0w
	read: 60.724u 42.394s 3:37.01 47.5%   5+241k 22541+0io 0pf+0w
	      61.778u 41.987s 3:35.36 48.1%   5+239k 11256+0io 0pf+0w

That's 182 vs. 215 seconds (the slower mmap run against the faster read
run), or a 15% elapsed-time win for mmap. Evidently, mmap runs through
that "nasty nasty code" faster than read runs through its own. mmap
loses on an idle system, I presume, because page-faulting is not smart
enough to fault pages in ahead as efficiently as read does its
read-ahead.
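One thing a program can already do about the read-ahead gap is tell the VM how it intends to walk a mapping. The sketch below assumes a POSIX system that provides `posix_madvise()` (FreeBSD's native `madvise()` accepts `MADV_SEQUENTIAL` in the same role). It maps a file and passes `POSIX_MADV_SEQUENTIAL`. The advice is only a hint: whether a given kernel turns it into more aggressive fault-ahead, and whether that would close the gap measured above, is untested here. `map_sequential` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only and hint that it will be scanned front to back.
   Returns the mapping (length in *lenp), or NULL on error or for an empty file. */
const unsigned char *map_sequential(const char *path, size_t *lenp)
{
    struct stat st;
    void *p;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    *lenp = (size_t)st.st_size;
    p = mmap(NULL, *lenp, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;
    /* Only advice: the kernel may read ahead harder, or ignore it entirely,
       so a failure here is deliberately not treated as fatal. */
    (void)posix_madvise(p, *lenp, POSIX_MADV_SEQUENTIAL);
    return p;
}
```

The caller scans the returned bytes in order and releases them with `munmap((void *)p, len)`. Closing the descriptor right after `mmap()` is safe because the mapping holds its own reference to the file.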

Why am I complaining then? Because I want the "nasty nasty code"
improved so that using mmap is beneficial for the single process too.

Thank you very much! Yours,

	-mi


