Date:      Wed, 30 Oct 2002 23:17:23 -0800 (PST)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Seigo Tanimura <tanimura@axe-inc.co.jp>
Cc:        Jeff Roberson <jroberson@chesapeake.net>, Seigo Tanimura <tanimura@axe-inc.co.jp>, Bruce Evans <bde@zeta.org.au>, current@FreeBSD.ORG, tanimura@FreeBSD.ORG
Subject:   Re: Dynamic growth of the buffer and buffer page reclaim
Message-ID:  <200210310717.g9V7HNff020614@apollo.backplane.com>
References:  <20021023163758.R22147-100000@mail.chesapeake.net> <200210280854.g9S8svSr094312@apollo.backplane.com> <200210310608.g9V68SoK022560@shojaku.t.axe-inc.co.jp>

    Yes, this makes a lot of sense to me.  You are exercising the
    system in a way that breaks the LRU algorithm.  The buffer cache,
    without your patch, is carefully tuned to deal with this case...
    that is why vm_page_dontneed() exists and why the vm_object code
    calls it.  This creates a little extra work when the buffer cache
    cycles, but prevents the system from reusing pages that it actually
    needs under certain types of load.  In particular, the situation the
    system is saving itself from by making this call is one where a user
    is reading data file(s) sequentially which are far larger than can
    be reasonably cached.  In that situation strict LRU operation would
    result in terrible performance, because the system would
    unconditionally cache data it is going to have to throw away anyway,
    and soon, displacing older cached data that it will actually need
    soon.  LRU isn't always the best policy.
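
    Just to make the intent concrete, here is a minimal userland sketch
    of the "depress, don't evict" idea.  All of the names (toy_page,
    toy_dontneed, and so on) are hypothetical; this is not the vm_page
    code itself, only the policy it implements in miniature:

/*
 * Toy model of "depress instead of evict": pages carry an activity
 * counter; a dontneed-style hint lowers the counter so the page becomes
 * an early reclaim candidate, while normal references raise it.  The
 * names and constants here are illustrative, not the FreeBSD interfaces.
 */
#include <stdio.h>

#define NPAGES   8
#define ACT_MAX  64
#define ACT_INIT 5

struct toy_page {
	int id;
	int act_count;		/* higher == more recently/frequently useful */
};

/* Normal reference: the page proves its worth, bump its activity. */
static void
toy_reference(struct toy_page *p)
{
	if (p->act_count < ACT_MAX)
		p->act_count += ACT_INIT;
}

/*
 * Sequential-read hint: keep the page cached, but make it one of the
 * first candidates for reclaim by knocking its activity down.
 */
static void
toy_dontneed(struct toy_page *p)
{
	p->act_count /= 2;
	if (p->act_count > 0)
		p->act_count--;
}

/* Reclaim the least "active" page - a stand-in for the pageout scan. */
static struct toy_page *
toy_reclaim(struct toy_page *pages, int n)
{
	struct toy_page *victim = &pages[0];

	for (int i = 1; i < n; i++)
		if (pages[i].act_count < victim->act_count)
			victim = &pages[i];
	return (victim);
}

int
main(void)
{
	struct toy_page pages[NPAGES];

	for (int i = 0; i < NPAGES; i++) {
		pages[i].id = i;
		pages[i].act_count = ACT_INIT;
		toy_reference(&pages[i]);	/* user B's working set */
	}

	/* User A streams through a huge file: pages 5..7 get the hint. */
	for (int i = 5; i < NPAGES; i++)
		toy_dontneed(&pages[i]);

	struct toy_page *victim = toy_reclaim(pages, NPAGES);
	printf("reclaim victim: page %d (act_count %d)\n",
	    victim->id, victim->act_count);
	return (0);
}

    The point is that user A's sequentially-read pages lose the
    competition for memory before user B's working set does, even
    though they were touched more recently.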

    When you disable vm_page_dontneed(), the huge amount of data you are
    moving through the system creates a great deal of pressure on the rest
    of the VM system, hence the slower performance once your data
    operations exceed what can be reasonably cached.  This would also have
    a severely detrimental effect on production systems running real loads.

    It's a tradeoff.  The system accepts some general cpu overhead
    in order to deal with a fairly common heavy-load case and to
    reduce the pressure on the VM system in situations (such as
    reading a large file sequentially) which have no business putting
    pressure on the VM system.  e.g. the system is trying to avoid
    blowing away user B's cache when user A reads a huge file.  Your
    patch changes the tradeoff, but doesn't really make things better
    overall.  Sure, the buildworld test went faster, but that's just
    one type of load.

    I am somewhat surprised at your 32MB tests.  Are you sure you
    stabilized the dd before taking those timings?  It would take
    more than one run of the dd on the file to completely cache it
    (that's one of the effects of vm_page_dontneed()).  Since the system
    can't predict whether a large file is going to be re-read over and
    over again, or just read once, or even how much data will be read,
    it depresses the priority of pages statistically, so it might take
    several full reads of the file for the system to realize that you
    really do want to cache the whole thing.  In any case, 32MB dd's
    should be fully cached in the buffer cache, with no rewiring of
    pages occurring at all, so I'm not sure why your patch is faster
    for that case.  It shouldn't be.  Or the 64MB case.  The 96MB
    case is getting close to what your setup can cache reasonably.
    The pre-patch code can deal with it, but with your patch you are
    probably putting enough extra pressure on the VM system to force
    the pageout daemon to run earlier than it would without the patch.
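
    For what it's worth, an easy way to check that the timings have
    stabilized is to time several back-to-back passes over the same
    file and watch the per-pass numbers converge.  Something along
    these lines would do; it's only a rough sketch, and the default
    file name and pass count are arbitrary:

/*
 * Time several consecutive sequential reads of the same file.  Once the
 * per-pass times stop dropping, the file is as cached as it is going to
 * get and the numbers are worth comparing.
 */
#include <sys/time.h>

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define PASSES	5
#define BUFSZ	(64 * 1024)

int
main(int argc, char **argv)
{
	const char *path = (argc > 1) ? argv[1] : "testfile";
	static char buf[BUFSZ];

	for (int pass = 0; pass < PASSES; pass++) {
		int fd = open(path, O_RDONLY);
		if (fd < 0) {
			perror("open");
			return (1);
		}

		struct timeval t0, t1;
		gettimeofday(&t0, NULL);

		ssize_t n;
		long long total = 0;
		while ((n = read(fd, buf, sizeof(buf))) > 0)
			total += n;
		close(fd);

		gettimeofday(&t1, NULL);
		double secs = (t1.tv_sec - t0.tv_sec) +
		    (t1.tv_usec - t0.tv_usec) / 1e6;
		printf("pass %d: %lld bytes in %.3f sec (%.1f MB/sec)\n",
		    pass, total, secs, total / secs / (1024 * 1024));
	}
	return (0);
}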

    The VM system is a very finely tuned beast.  That isn't to say that
    it can't be improved, I'm sure it can, and I encourage you to play
    with it!  But you have to be wary of it as well.  The VM system is
    tuned primarily for performance under heavy loads.  There is a slight
    loss of performance under light loads because of the extra management.
    You have to be sure not to screw up the heavy-load performance when
    running light-load benchmarks.  A buildworld is a light-load benchmark,
    primarily because it execs programs (the compiler) so many times
    that there are a lot of free VM pages sitting around for it to use.
    Buildworlds do not load-test the VM system all that well!  A dd test
    is not supposed to load-test the VM system either.  This is why we have
    vm_page_dontneed()... user B's cache shouldn't be blown away just
    because user A is reading a large file.  We lose a little in a light
    load test but gain a lot under real-world loads which put constant
    pressure on the VM system.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>

:I tried that on the same PC as my last benchmark.  The PC has 160MB
:RAM, so I created a file of 256MB.
:
:One pre-read (in order to stabilize the buffer cache) and four read
:tests were run consecutively for each of six distinct read sizes just
:after boot.  The average read times (in seconds) and speeds (in
:MB/sec) are shown below:
:
:
:		without my patch	with my patch
:read size	time	speed		time	speed
:32MB		.497	65.5		.471	69.0
:64MB		1.02	63.6		.901	72.1
:96MB		2.24	50.5		5.52	18.9
:128MB		20.7	6.19		16.5	7.79
:192MB		32.9	5.83		32.9	5.83
:256MB		42.5	6.02		43.0	5.95
:
:
:dillon>     Its case (1) that you are manipulating with your patch, and as you can
:dillon>     see it is entirely dependant on the number of wired pages that the 
:dillon>     system is able to maintain in the buffer cache.
:
:That is likely the case for the 128MB-read results.
:
:96MB-read gave interesting results.  Since vfs_unwirepages() passes
:buffer pages to vm_page_dontneed(), it seems that the page scanner
:reclaims buffer cache pages too aggressively.
:
:The table below shows the results with my patch where
:vfs_unwirepages() does not call vm_page_dontneed().
:
:
:read size	time	speed
:32MB		.503	63.7
:64MB		.916	70.5
:96MB		4.57	27.1
:128MB		17.0	7.62
:192MB		35.8	5.36
:256MB		46.0	5.56
:
:
:The 96MB-read results were a little bit better, although the reads of
:larger sizes became slower.  The unwired buffer pages may be putting
:pressure on user process pages and the page scanner.
:
:-- 
:Seigo Tanimura <tanimura@axe-inc.co.jp>
