Date: Wed, 30 Oct 2002 23:17:23 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Seigo Tanimura <tanimura@axe-inc.co.jp>
Cc: Jeff Roberson <jroberson@chesapeake.net>, Seigo Tanimura <tanimura@axe-inc.co.jp>, Bruce Evans <bde@zeta.org.au>, current@FreeBSD.ORG, tanimura@FreeBSD.ORG
Subject: Re: Dynamic growth of the buffer and buffer page reclaim
Message-ID: <200210310717.g9V7HNff020614@apollo.backplane.com>
References: <20021023163758.R22147-100000@mail.chesapeake.net> <200210280854.g9S8svSr094312@apollo.backplane.com> <200210310608.g9V68SoK022560@shojaku.t.axe-inc.co.jp>

Yes, this makes a lot of sense to me.  You are exercising the
system in a way that breaks the LRU algorithm.  The buffer cache,
without your patch, is carefully tuned to deal with this case...
that is why vm_page_dontneed() exists and why the vm_object code
calls it.  This creates a little extra work when the buffer cache
cycles, but prevents the system from reusing pages that it actually
needs under certain types of load.  In particular, the situation the
system is saving itself from by making this call is the situation where
a user is reading data file(s) sequentially which are far larger than
can reasonably be cached.  In that situation strict LRU operation
would result in terrible performance due to the system attempting to
unconditionally cache data it is going to have to throw away anyway,
and soon, which displaces older cached data that it will actually need
soon.  LRU isn't always the best policy.
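
To illustrate the point, here is a minimal userland sketch.  It is not
kernel code: the cache_access() helper, the 64-page cache, and the page
counts are made-up assumptions for the example.  It compares strict LRU
against a policy that sends sequentially-read pages to the cold end of
the queue, which is the kind of effect vm_page_dontneed() is after, and
shows why only the second policy keeps a big scan from destroying an
unrelated working set.

    #include <stdio.h>
    #include <string.h>

    #define CACHE_PAGES 64          /* pretend the cache holds 64 pages */

    /* cache[0] is the coldest slot, cache[CACHE_PAGES - 1] the hottest. */
    static long cache[CACHE_PAGES];

    static void
    cache_reset(void)
    {
        memset(cache, 0xff, sizeof(cache));     /* fill with -1 (empty) */
    }

    /*
     * Look up a page.  On a miss, insert it hot (strict LRU) or at the
     * cold end (the "demote sequential pages" policy).  Returns 1 on a hit.
     */
    static int
    cache_access(long page, int demote)
    {
        int i;

        for (i = 0; i < CACHE_PAGES; i++) {
            if (cache[i] == page) {
                /* hit: promote to the hottest slot */
                memmove(&cache[i], &cache[i + 1],
                    (CACHE_PAGES - 1 - i) * sizeof(cache[0]));
                cache[CACHE_PAGES - 1] = page;
                return (1);
            }
        }
        if (demote) {
            /* miss: overwrite only the coldest slot, recycled first */
            cache[0] = page;
        } else {
            /* miss under strict LRU: evict the coldest page, insert hot */
            memmove(&cache[0], &cache[1],
                (CACHE_PAGES - 1) * sizeof(cache[0]));
            cache[CACHE_PAGES - 1] = page;
        }
        return (0);
    }

    static void
    run(const char *label, int demote)
    {
        long p;
        int hits = 0;

        cache_reset();
        /* User B touches a small 16-page working set until it is cached. */
        for (p = 0; p < 16; p++)
            cache_access(p, 0);
        /* User A streams a 1000-page "file" through the cache once. */
        for (p = 1000; p < 2000; p++)
            cache_access(p, demote);
        /* How much of user B's working set survived the scan? */
        for (p = 0; p < 16; p++)
            hits += cache_access(p, 0);
        printf("%-22s: %d of 16 working-set pages survived\n", label, hits);
    }

    int
    main(void)
    {
        run("strict LRU", 0);
        run("demote sequential read", 1);
        return (0);
    }

Under strict LRU the 1000-page scan evicts all 16 working-set pages;
with the demotion policy all 16 survive the scan.
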
When you disable vm_page_dontneed() the huge amount of data you are
moving through the system creates a huge amount of pressure on the rest
of the VM system, thus the slower performance when your data operations
exceed what can be reasonably cached.  This would also have a severely
detrimental effect on production systems running real loads.

It's a tradeoff.  The system is generally trading off some cpu
overhead in order to deal with a fairly common heavy-loading
case and to reduce the pressure on the VM system in
situations (such as reading a large file sequentially) which
have no business putting pressure on the VM system.  E.g. the
system is trying to avoid blowing away user B's cache when user A
reads a huge file.  Your patch is changing the tradeoff, but not
really making things better overall.  Sure, the buildworld test went
faster, but that's just one type of load.

I am somewhat surprised at your 32MB tests.  Are you sure you
stabilized the dd before getting those timings?  It would take
more than one run of the dd on the file to completely cache it (that's
one of the effects of vm_page_dontneed()).  Since the system can't
predict whether a large file is going to be re-read over and over
again, or just read once, or even how much data will be read, it
depresses the priority of pages statistically so it might take
several full reads of the file for the system to realize that you
really do want to cache the whole thing.  In any case, 32MB dd's
should be fully cached in the buffer cache, with no rewiring of
pages occurring at all, so I'm not sure why your patch is faster
for that case.  It shouldn't be.  Or the 64MB case.  The 96MB
case is getting close to what your setup can cache reasonably.
The pre-patch code can deal with it, but with your patch you are
probably putting enough extra pressure on the VM system to force
the pageout daemon to run earlier than it would without the patch.
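
As a concrete illustration of the statistical depression, here is a toy
userland model.  It is not the kernel's actual algorithm: the 1-in-16
spare ratio, the ACT_KEEP threshold, and the 8192-page file are
assumptions chosen only to show the shape of the effect, namely that a
single pass over the file leaves only a small fraction of it cached and
that the repeated references across passes are what eventually cause
the whole file to stick.

    #include <stdio.h>

    #define FILE_PAGES  8192    /* a 32MB file in 4K pages */
    #define SPARE_EVERY 16      /* roughly 1 released page in 16 keeps
                                 * its priority on any given pass */
    #define ACT_KEEP    3       /* pages referenced on this many passes
                                 * stay cached */

    static unsigned char act[FILE_PAGES];       /* per-page activity count */
    static unsigned char cached[FILE_PAGES];    /* 1 = retained across passes */

    int
    main(void)
    {
        int pass, i, spared = 0, ncached;

        for (pass = 1; pass <= 4; pass++) {
            for (i = 0; i < FILE_PAGES; i++) {
                if (cached[i])
                    continue;       /* already promoted, stays cached */
                act[i]++;           /* this full read references the page */
                if (act[i] < ACT_KEEP && ++spared % SPARE_EVERY != 0)
                    continue;       /* priority depressed: the page is
                                     * recycled before the next full read */
                cached[i] = 1;      /* spared, or referenced often enough */
            }
            ncached = 0;
            for (i = 0; i < FILE_PAGES; i++)
                ncached += cached[i];
            printf("after full read %d: %4d of %d pages still cached (%.1f%%)\n",
                pass, ncached, FILE_PAGES, 100.0 * ncached / FILE_PAGES);
        }
        return (0);
    }

The exact ratios do not matter.  The point is that with these made-up
numbers one read of the file only caches a sliver of it, which is why a
dd needs to be run several times before the timings stabilize.
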
The VM system is a very finely tuned beast.  That isn't to say that
it can't be improved, I'm sure it can, and I encourage you to play
with it!  But you have to be wary of it as well.  The VM system is
tuned primarily for performance under heavy loads.  There is a slight
loss of performance under light loads because of the extra management.
You have to be sure not to screw up the heavy-load performance when
running light-load benchmarks.  A buildworld is a light-load benchmark,
primarily because it execs programs so many times (the compiler)
that there are a lot of free VM pages sitting around for it to use.
Buildworlds do not load-test the VM system all that well!  A dd test
is not supposed to load-test the VM system either.  This is why we
have vm_page_dontneed()... user B's cache shouldn't be blown away just
because user A is reading a large file.  We lose a little in a light
load test but gain a lot under real world loads which put constant
pressure on the VM system.

-Matt
Matthew Dillon
<dillon@backplane.com>
:I tried that on the same PC as my last benchmark. The PC has 160MB
:RAM, so I created a file of 256MB.
:
:One pre-read (in order to stabilize the buffer cache) and four read
:tests were run consecutively for each of six distinct read sizes just
:after boot. The average read times (in seconds) and speeds (in
:MB/sec) are shown below:
:
:
:                    without my patch       with my patch
:read size           time      speed        time      speed
:32MB                .497      65.5         .471      69.0
:64MB                1.02      63.6         .901      72.1
:96MB                2.24      50.5         5.52      18.9
:128MB               20.7      6.19         16.5      7.79
:192MB               32.9      5.83         32.9      5.83
:256MB               42.5      6.02         43.0      5.95
:
:
:dillon> It's case (1) that you are manipulating with your patch, and as you can
:dillon> see it is entirely dependent on the number of wired pages that the
:dillon> system is able to maintain in the buffer cache.
:
:The results of 128MB-read are likely to be so.
:
:96MB-read gave interesting results. Since vfs_unwirepages() passes
:buffer pages to vm_page_dontneed(), it seems that the page scanner
:reclaims buffer cache pages too aggressively.
:
:The table below shows the results with my patch where
:vfs_unwirepages() does not call vm_page_dontneed().
:
:
:read size           time      speed
:32MB                .503      63.7
:64MB                .916      70.5
:96MB                4.57      27.1
:128MB               17.0      7.62
:192MB               35.8      5.36
:256MB               46.0      5.56
:
:
:The 96MB-read results were a little bit better, although the reads of
:larger sizes became slower.  The unwired buffer pages may be putting
:pressure on user process pages and the page scanner.
:
:--
:Seigo Tanimura <tanimura@axe-inc.co.jp>
