From owner-freebsd-stable@FreeBSD.ORG Sat Mar 25 18:29:36 2006
X-Original-To: stable@freebsd.org
Delivered-To: freebsd-stable@FreeBSD.ORG
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 98C8216A41F;
	Sat, 25 Mar 2006 18:29:36 +0000 (UTC)
	(envelope-from dillon@apollo.backplane.com)
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 31C8043D45;
	Sat, 25 Mar 2006 18:29:36 +0000 (GMT)
	(envelope-from dillon@apollo.backplane.com)
Received: from apollo.backplane.com (localhost [127.0.0.1])
	by apollo.backplane.com (8.13.4.20060308/8.13.4) with ESMTP id k2PITM5w014735;
	Sat, 25 Mar 2006 10:29:22 -0800 (PST)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.13.4.20060308/8.13.4/Submit) id k2PITH5D014732;
	Sat, 25 Mar 2006 10:29:17 -0800 (PST)
Date: Sat, 25 Mar 2006 10:29:17 -0800 (PST)
From: Matthew Dillon
Message-Id: <200603251829.k2PITH5D014732@apollo.backplane.com>
To: Peter Jeremy
References: <200603211607.30372.mi+mx@aldan.algebra.com>
	<200603231403.36136.mi+mx@aldan.algebra.com>
	<200603232048.k2NKm4QL067644@apollo.backplane.com>
	<200603231626.19102.mi+mx@aldan.algebra.com>
	<200603232316.k2NNGBka068754@apollo.backplane.com>
	<20060324084940.GA703@turion.vk2pj.dyndns.org>
	<200603241800.k2OI0KF8005579@apollo.backplane.com>
	<20060325094207.GD703@turion.vk2pj.dyndns.org>
Cc: alc@freebsd.org, Mikhail Teterin, stable@freebsd.org
Subject: Re: Reading via mmap stinks (Re: weird bugs with mmap-ing via NFS)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Sat, 25 Mar 2006 18:29:36 -0000

:The results here are weird.
:With 1GB RAM and a 2GB dataset, the
:timings seem to depend on the sequence of operations: reading is
:significantly faster, but only when the data was mmap'd previously.
:There's one outlier that I can't easily explain.
:...
:Peter Jeremy

    Really odd.  Note that if your disk can only do 25 MBytes/sec, the
    calculation is:

	2052167894 / 25MB = ~80 seconds

    not ~60 seconds as you would expect from your numbers.  So that would
    imply that the 80-second numbers represent read-ahead, and the
    60-second numbers indicate that some of the data was retained from a
    prior run (and not blown out by the sequential reading in the later
    run).

    This type of situation *IS* possible as a side effect of other
    heuristics.  It is particularly possible when you combine read() with
    mmap(), because read() uses a different heuristic than mmap() to
    implement the read-ahead.  There is also code in there which depresses
    the page priority of 'old' already-read pages in the sequential case.
    So, for example, if you do a linear grep of 2GB you might end up with
    a cache state that looks like this:

	l = low priority page
	m = medium priority page
	h = high priority page

	FILE: [---------------------------mmmmmmmmmmmmm]

    Then when you rescan using mmap,

	FILE: [lllllllll------------------mmmmmmmmmmmmm]
	      [------lllllllll------------mmmmmmmmmmmmm]
	      [---------lllllllll---------mmmmmmmmmmmmm]
	      [------------lllllllll------mmmmmmmmmmmmm]
	      [---------------lllllllll---mmmmmmmmmmmmm]
	      [------------------lllllllllmmmmmmmmmmmmm]
	      [---------------------llllllHHHmmmmmmmmmm]
	      [------------------------lllHHHHHHmmmmmmm]
	      [---------------------------HHHHHHHHHmmmm]
	      [---------------------------mmmHHHHHHHHHm]

    The low priority pages don't bump out the medium priority pages from
    the previous scan, so the grep winds up doing read-ahead until it hits
    the large swath of pages already cached from the previous scan,
    without bumping out those pages.

    There is also a heuristic in the system (FreeBSD and DragonFly) which
    tries to randomly retain pages.
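    As a rough sketch of the priority interaction diagrammed above, here is
    a toy model (hypothetical illustration only -- the names PriorityCache,
    touch(), and sequential_scan() are invented for this example and are
    not the actual FreeBSD/DragonFly VM code):

```python
# Toy model of priority-based page caching.  A sequential scan inserts
# pages at LOW priority, and a LOW-priority insertion may only evict
# another LOW-priority page -- so a single linear pass cannot blow out
# the MEDIUM-priority swath left behind by an earlier run.
LOW, MEDIUM = 0, 1

class PriorityCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = {}          # page number -> priority

    def touch(self, page, prio):
        if page in self.pages:
            # Re-touching a cached page can only raise its priority.
            self.pages[page] = max(self.pages[page], prio)
            return True          # cache hit
        if len(self.pages) >= self.capacity:
            # Evict only a page whose priority is <= the incoming one.
            victims = [p for p, pr in self.pages.items() if pr <= prio]
            if not victims:
                return False     # nothing evictable; page stays uncached
            del self.pages[victims[0]]
        self.pages[page] = prio
        return False             # cache miss

def sequential_scan(cache, npages):
    """Linear pass over the file at LOW priority; returns cache hits."""
    return sum(cache.touch(page, LOW) for page in range(npages))

cache = PriorityCache(capacity=13)
# A previous run left the tail of the file cached at MEDIUM priority,
# like the 'm' swath in the diagram above.
for page in range(27, 40):
    cache.touch(page, MEDIUM)

hits = sequential_scan(cache, 40)
print("hits on second scan:", hits)  # the medium swath is still cached
```

    The low-priority pages from the new scan fail to evict the medium
    swath, so the rescan still hits on every page the first run retained.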
    That random-retention heuristic clearly isn't working :-)  I need to
    change it to randomly retain swaths of pages, the idea being that it
    should take repeated runs to rebalance the VM cache, rather than
    allowing a single run to blow it out or allowing a static set of pages
    to be retained indefinitely, which is what your tests seem to show is
    occurring.

					-Matt
					Matthew Dillon