From: Peter Jeremy
To: Matthew Dillon
Date: Sun, 26 Mar 2006 07:13:52 +1100
Message-ID: <20060325201351.GH703@turion.vk2pj.dyndns.org>
Cc: stable@freebsd.org
Subject: Re: Reading via mmap stinks (Re: weird bugs with mmap-ing via NFS)

On Sat, 2006-Mar-25 10:29:17 -0800, Matthew Dillon wrote:
> Really odd. Note that if your disk can only do 25 MBytes/sec, the
> calculation is: 2052167894 / 25MB = ~80 seconds, not ~60 seconds
> as you would expect from your numbers.

systat was reporting 25-26 MB/sec. dd'ing the underlying partition
gives 27MB/sec (with 24 and 28MB/sec for adjacent partitions).

> This type of situation *IS* possible as a side effect of other
> heuristics. It is particularly possible when you combine read() with
> mmap because read() uses a different heuristic than mmap() to
> implement the read-ahead. There is also code in there which depresses
> the page priority of 'old' already-read pages in the sequential case.
> So, for example, if you do a linear grep of 2GB you might end up with
> a cache state that looks like this:

If I've understood you correctly, this also implies that the timing
depends on the previous two scans, not just the previous scan. I
didn't test all combinations of this, but would have expected to see
two distinct sets of mmap/read timings: one for read/mmap/read and one
for mmap/mmap/read.

> I need to change it to randomly retain swaths of pages, the
> idea being that it should take repeated runs to rebalance the VM cache
> rather than allowing a single run to blow it out or allowing a
> static set of pages to be retained indefinitely, which is what your
> tests seem to show is occurring.

I don't think this sort of test is a clear indication that something
is wrong.
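A toy model makes the behaviours in this exchange concrete: one process repeatedly scanning a file larger than the cache, under plain LRU (a single run "blows out" the cache), under a policy that simply keeps whatever is already cached (a static set of pages retained indefinitely), and under random eviction. This is a hypothetical sketch, not FreeBSD VM code; the page counts, the function and policy names, and the use of per-page random eviction as a rough stand-in for "randomly retain swaths of pages" are all assumptions.

```python
"""Toy page-cache model: repeated sequential scans of one file.

Hypothetical sketch only. Page counts, policy names and per-page random
eviction (standing in for "random swaths") are assumptions, not FreeBSD code.
"""
import random
from collections import OrderedDict

POLICIES = ("lru", "keep", "random")


def scan_hit_fractions(policy, file_pages, cache_pages, scans, seed=1):
    """Return the fraction of the file found in cache on each scan."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    rng = random.Random(seed)  # seeded so runs are repeatable
    cache = OrderedDict()  # resident page numbers, least recently used first
    fractions = []
    for _ in range(scans):
        hits = 0
        for page in range(file_pages):  # one sequential pass over the file
            if page in cache:
                hits += 1
                cache.move_to_end(page)  # now the most recently used page
                continue
            if len(cache) < cache_pages:
                cache[page] = None  # free memory left: cache the page
            elif policy == "lru":
                cache.popitem(last=False)  # evict the least recently used page
                cache[page] = None
            elif policy == "random":
                del cache[rng.choice(list(cache))]  # evict a random resident page
                cache[page] = None
            # policy == "keep": cache is full, so the new page is not retained
        fractions.append(hits / file_pages)
    return fractions


if __name__ == "__main__":
    # Hypothetical sizes: the file is 2.5 times larger than the cache.
    for name in POLICIES:
        result = scan_hit_fractions(name, file_pages=1000, cache_pages=400, scans=4)
        print(name, [round(f, 2) for f in result])
```

With these hypothetical sizes, LRU finds nothing in cache on any rescan, "keep" hits 40% (cache_pages / file_pages) from the second scan onwards, and random eviction can never exceed that 40% bound, because a page cached during a scan cannot be hit again until the next scan.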
There's only one active process at any time and it's performing a
sequential read of a large dataset. In this case, evicting already
cached data to read new data is not necessarily productive (a
simple-minded algorithm will be evicting data that is going to be
accessed in the near future).

Based on the timings, the mmap/read case manages to retain ~15% of the
file in cache. Given the amount of RAM available, the theoretical
limit is about 40%, so this isn't too bad. It would be nicer if both
read and mmap achieved the same retention, irrespective of how the
data had been previously accessed.

--
Peter Jeremy
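The retention figures discussed in this thread rest on simple arithmetic: if cached pages cost essentially nothing to read and the rest of the file streams off the disk at a steady rate, scan time scales with the uncached fraction. A minimal sketch under those assumptions, using the 2052167894-byte file size and the 25 MB/sec rate quoted above; the function names are hypothetical, "MB" is assumed to mean 10**6 bytes, and the model ignores read-ahead and CPU cost.

```python
"""Back-of-envelope scan-time arithmetic for a partially cached file.

Assumptions: cached pages are free, uncached pages stream at a fixed
disk rate, and "MB" means 10**6 bytes. Function names are hypothetical.
"""

FILE_BYTES = 2052167894  # file size quoted in the thread
DISK_MB_PER_S = 25       # lower end of the 25-26 MB/sec systat reported


def uncached_scan_seconds(file_bytes, mb_per_s):
    """Time to read the whole file from disk at a steady rate."""
    return file_bytes / (mb_per_s * 10**6)


def expected_scan_seconds(file_bytes, mb_per_s, cached_fraction):
    """Only the uncached part of the file has to come off the disk."""
    return uncached_scan_seconds(file_bytes, mb_per_s) * (1 - cached_fraction)


def cached_fraction_from_timing(file_bytes, mb_per_s, observed_seconds):
    """Invert the model: estimate how much of the file was already cached."""
    return 1 - observed_seconds / uncached_scan_seconds(file_bytes, mb_per_s)


if __name__ == "__main__":
    full = uncached_scan_seconds(FILE_BYTES, DISK_MB_PER_S)
    print(f"uncached scan: {full:.1f} s")
    for fraction in (0.15, 0.40):
        t = expected_scan_seconds(FILE_BYTES, DISK_MB_PER_S, fraction)
        print(f"{fraction:.0%} cached: {t:.1f} s")
```

Run as a script, this prints about 82.1 s for a fully uncached scan, 69.8 s with 15% cached and 49.3 s with 40% cached. Taking "MB" as 2**20 bytes instead gives about 78 s for the uncached scan; either way it is close to the ~80 seconds quoted above.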