Date:       Thu, 23 Mar 2006 15:16:11 -0800 (PST)
From:       Matthew Dillon <dillon@apollo.backplane.com>
To:         Mikhail Teterin <mi+mx@aldan.algebra.com>
Cc:         alc@freebsd.org, stable@freebsd.org
Subject:    Re: Reading via mmap stinks (Re: weird bugs with mmap-ing via NFS)
Message-ID: <200603232316.k2NNGBka068754@apollo.backplane.com>
References: <200603211607.30372.mi%2Bmx@aldan.algebra.com> <200603231403.36136.mi%2Bmx@aldan.algebra.com> <200603232048.k2NKm4QL067644@apollo.backplane.com> <200603231626.19102.mi%2Bmx@aldan.algebra.com>
:Yes, they both do work fine, but time gives very different stats for each.
:In my experiments, the total CPU time is noticeably less with mmap, but the
:elapsed time is (much) greater. Here are results from FreeBSD-6.1/amd64 --
:notice the large number of page faults, because the system does not try to
:preload the file in the mmap case as it does in the read case:
:
:    time fgrep meowmeowmeow /home/oh.0.dump
:    2.167u 7.739s 1:25.21 11.6% 70+3701k 23663+0io 6pf+0w
:    time fgrep --mmap meowmeowmeow /home/oh.0.dump
:    1.552u 7.109s 2:46.03 5.2% 18+1031k 156+0io 106327pf+0w
:
:Use a big enough file to bust the memory caching (oh.0.dump above is 2.9GB)
:and I'm sure you will have no problem reproducing this result.

    106,000 page faults.  How many pages is a 2.9GB file?  If this is
    running in 64-bit mode those would be 8K pages, right?  So that would
    come to around 380,000 pages, about 1:4.  So, clearly, the operating
    system *IS* pre-faulting multiple pages.

    Since I don't believe that a memory fault would be so inefficient as
    to account for 80 seconds of run time, it seems more likely to me that
    the problem is that the VM system is not issuing read-aheads.  Not
    issuing read-aheads would easily account for the 80 seconds.

    It is possible that the kernel believes the VM system to be too loaded
    to issue read-aheads, as a consequence of your blowing out of the
    system caches.  It is also possible that the read-ahead code is broken
    in FreeBSD.  To determine which of the two is more likely, you have to
    run a smaller data set (like 600MB of data on a system with 1GB of
    RAM), and use the unmount/mount trick to clear the cache before each
    grep test.

    If the time differential is still huge using the unmount/mount data
    set test as described above, then the VM system's read-ahead code is
    broken.  If the time differential is tiny, however, then it's probably
    nothing more than the kernel interpreting your massive 2.9GB mmap as
    being too stressful on the VM system and disabling read-aheads for
    that reason.

    In any case, this sort of test is not really a good poster child for
    how to use mmap().  Nobody in their right mind uses mmap() on data
    sets that they expect to be uncacheable and which are accessed
    sequentially.  It's just plain silly to use mmap() in that sort of
    circumstance.  This is a truism on ANY operating system, not just
    FreeBSD.

    The uncached data set test (using unmount/mount and a data set which
    fits into memory) is a far more realistic test because it simulates
    the most common case encountered by a system under load... the
    accessing of a reasonably sized data set which happens to not be in
    the cache.

						-Matt
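[Editor's illustration, not part of the original thread.]  For readers who
want to see the pattern being argued about, below is a minimal C sketch of
the mmap-and-scan approach that fgrep --mmap takes.  The file name and the
naive matching loop are made up for illustration; the one detail that bears
directly on the discussion is madvise(MADV_SEQUENTIAL), which is the
documented way to tell the VM system that the mapping will be read front to
back so it can issue read-aheads.  Whether the kernel actually honors that
hint under memory pressure is exactly the question Matt raises above; the
read(2) path, by contrast, gets the filesystem's cluster read-ahead
heuristics without any hinting.

    /*
     * Sketch: scan a file for a pattern through a read-only mapping.
     * Each first touch of a page shows up as a page fault (the pf column
     * in csh time output); MADV_SEQUENTIAL asks the VM system to read
     * ahead and to discard pages behind the scan point.
     */
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(int argc, char **argv)
    {
            if (argc != 3) {
                    fprintf(stderr, "usage: %s pattern file\n", argv[0]);
                    return (1);
            }

            int fd = open(argv[2], O_RDONLY);
            if (fd < 0) {
                    perror("open");
                    return (1);
            }

            struct stat st;
            if (fstat(fd, &st) < 0 || st.st_size == 0) {
                    perror("fstat");
                    return (1);
            }

            char *base = mmap(NULL, (size_t)st.st_size, PROT_READ,
                MAP_SHARED, fd, 0);
            if (base == MAP_FAILED) {
                    perror("mmap");
                    return (1);
            }

            /* Hint that the mapping will be accessed sequentially. */
            if (madvise(base, (size_t)st.st_size, MADV_SEQUENTIAL) < 0)
                    perror("madvise");      /* non-fatal */

            /* Walk the mapping; naive byte-by-byte match for brevity. */
            size_t plen = strlen(argv[1]);
            size_t hits = 0;
            for (off_t i = 0; i + (off_t)plen <= st.st_size; i++) {
                    if (memcmp(base + i, argv[1], plen) == 0)
                            hits++;
            }
            printf("%zu match(es)\n", hits);

            munmap(base, (size_t)st.st_size);
            close(fd);
            return (0);
    }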