From owner-freebsd-current Wed Oct 30 23:17:32 2002
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP
    id F2ADE37B401; Wed, 30 Oct 2002 23:17:28 -0800 (PST)
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
    by mx1.FreeBSD.org (Postfix) with ESMTP
    id 7DC2D43E4A; Wed, 30 Oct 2002 23:17:28 -0800 (PST)
    (envelope-from dillon@apollo.backplane.com)
Received: from apollo.backplane.com (localhost [127.0.0.1])
    by apollo.backplane.com (8.12.5/8.12.5) with ESMTP
    id g9V7HNFC020615; Wed, 30 Oct 2002 23:17:23 -0800 (PST)
    (envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
    by apollo.backplane.com (8.12.5/8.12.5/Submit)
    id g9V7HNff020614; Wed, 30 Oct 2002 23:17:23 -0800 (PST)
    (envelope-from dillon)
Date: Wed, 30 Oct 2002 23:17:23 -0800 (PST)
From: Matthew Dillon
Message-Id: <200210310717.g9V7HNff020614@apollo.backplane.com>
To: Seigo Tanimura
Cc: Jeff Roberson, Seigo Tanimura, Bruce Evans,
    current@FreeBSD.ORG, tanimura@FreeBSD.ORG
Subject: Re: Dynamic growth of the buffer and buffer page reclaim
References: <20021023163758.R22147-100000@mail.chesapeake.net>
    <200210280854.g9S8svSr094312@apollo.backplane.com>
    <200210310608.g9V68SoK022560@shojaku.t.axe-inc.co.jp>
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Yes, this makes a lot of sense to me. You are exercising the system
in a way that breaks the LRU algorithm. The buffer cache, without
your patch, is carefully tuned to deal with this case... that is why
vm_page_dontneed() exists and why the vm_object code calls it. This
creates a little extra work when the buffer cache cycles, but prevents
the system from reusing pages that it actually needs under certain
types of load.

In particular, the situation the system is saving itself from by
making this call is one where a user is sequentially reading data
file(s) which are far larger than can be reasonably cached. In that
situation strict LRU operation would result in terrible performance,
because the system would unconditionally cache data that it is going
to have to throw away anyway, and soon, and in doing so would displace
older cached data that it will actually need soon. LRU isn't always
the best policy.

When you disable vm_page_dontneed(), the huge amount of data you are
moving through the system creates a huge amount of pressure on the
rest of the VM system, hence the slower performance when your data
operations exceed what can be reasonably cached. This would also have
a severely detrimental effect on production systems running real
loads.

It's a tradeoff. The system generally trades some CPU overhead for
the ability to deal with a fairly common heavy-loading case, and to
reduce the pressure on the VM system in situations (such as reading a
large file sequentially) which have no business putting pressure on
the VM system. e.g. the system is trying to avoid blowing away user
B's cache when user A reads a huge file.

Your patch is changing the tradeoff, but not really making things
better overall. Sure, the buildworld test went faster, but that's
just one type of load.

I am somewhat surprised at your 32MB tests. Are you sure you
stabilized the dd before getting those timings? It would take more
than one run of the dd on the file to completely cache it (that's one
of the effects of vm_page_dontneed()). Since the system can't predict
whether a large file is going to be re-read over and over again, or
just read once, or even how much data will be read, it depresses the
priority of pages statistically, so it might take several full reads
of the file for the system to realize that you really do want to cache
the whole thing.
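
The cache-pollution argument above can be illustrated with a small toy
model. The Python sketch below is not FreeBSD code and does not
reproduce how vm_page_dontneed() actually works. It only assumes a page
cache ordered from coldest to hottest, and compares strict LRU with a
policy that places sequentially scanned pages at the cold end. The
names PageCache, access, depress and user_b_hits, and all of the sizes,
are hypothetical and chosen purely for illustration.

```python
from collections import OrderedDict


class PageCache:
    """Toy page cache; keys are ordered from coldest (first) to hottest (last)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def access(self, page, depress=False):
        """Touch a page and return True on a cache hit.

        depress=False is strict LRU: the page becomes the hottest entry.
        depress=True is a simplified stand-in for the idea behind
        vm_page_dontneed(): the page is moved to the cold end so that it
        is reclaimed before anything else.
        """
        hit = page in self.pages
        if not hit:
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)  # evict the coldest page
            self.pages[page] = True  # new keys land at the hot end
        # Hot end for normal accesses, cold end for depressed ones.
        self.pages.move_to_end(page, last=not depress)
        return hit


def user_b_hits(depress_scan, capacity=100, working_set=50,
                scan_len=200, rounds=5):
    """Count user B's cache hits while user A streams through a big file.

    Each round, B touches its small working set, then A reads scan_len
    pages that are never read again (a file far larger than the cache).
    """
    cache = PageCache(capacity)
    hits = 0
    next_scan_page = 0
    for _ in range(rounds):
        for i in range(working_set):
            hits += cache.access(("B", i))
        for _ in range(scan_len):
            cache.access(("A", next_scan_page), depress=depress_scan)
            next_scan_page += 1
    return hits


if __name__ == "__main__":
    print("strict LRU, B hits:     ", user_b_hits(False))
    print("depressed scan, B hits: ", user_b_hits(True))
```

With these made-up parameters, strict LRU gives user B 0 hits over
five rounds, because every scan by user A pushes B's pages out. When
A's scan pages are depressed, B hits 200 times, i.e. on every access
after the first round. The real VM system is statistical rather than
absolute, so the contrast there is less sharp than in this model.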

In any case, 32MB dd's should be fully cached in the buffer cache,
with no rewiring of pages occurring at all, so I'm not sure why your
patch is faster for that case. It shouldn't be. Or the 64MB case.
The 96MB case is getting close to what your setup can cache
reasonably. The pre-patch code can deal with it, but with your patch
you are probably putting enough extra pressure on the VM system to
force the pageout daemon to run earlier than it would without the
patch.

The VM system is a very finely tuned beast. That isn't to say that
it can't be improved, I'm sure it can, and I encourage you to play
with it! But you have to be wary of it as well. The VM system is
tuned primarily for performance under heavy loads. There is a slight
loss of performance under light loads because of the extra management.
You have to be sure not to screw up the heavy-load performance when
running light-load benchmarks.

A buildworld is a light-load benchmark, primarily because it execs
programs (the compiler) so many times that there are a lot of free VM
pages sitting around for it to use. Buildworlds do not load-test the
VM system all that well! A dd test is not supposed to load-test the
VM system either. This is why we have the vm_page_dontneed() calls:
user B's cache shouldn't be blown away just because user A is reading
a large file. We lose a little in a light-load test but gain a lot
under real-world loads which put constant pressure on the VM system.

                                        -Matt
                                        Matthew Dillon

:I tried that on the same PC as my last benchmark. The PC has 160MB
:RAM, so I created a file of 256MB.
:
:One pre-read (in order to stabilize the buffer cache) and four read
:tests were run consecutively for each of six distinct read sizes just
:after boot.
:The average read times (in seconds) and speeds (in MB/sec) are shown
:below:
:
:
:             without my patch          with my patch
:read size    time (s)  speed (MB/s)    time (s)  speed (MB/s)
:32MB         .497      65.5            .471      69.0
:64MB         1.02      63.6            .901      72.1
:96MB         2.24      50.5            5.52      18.9
:128MB        20.7      6.19            16.5      7.79
:192MB        32.9      5.83            32.9      5.83
:256MB        42.5      6.02            43.0      5.95
:
:
:dillon> It's case (1) that you are manipulating with your patch, and as you can
:dillon> see it is entirely dependent on the number of wired pages that the
:dillon> system is able to maintain in the buffer cache.
:
:The results of 128MB-read are likely to be so.
:
:96MB-read gave interesting results. Since vfs_unwirepages() passes
:buffer pages to vm_page_dontneed(), it seems that the page scanner
:reclaims buffer cache pages too aggressively.
:
:The table below shows the results with my patch where
:vfs_unwirepages() does not call vm_page_dontneed().
:
:
:read size    time (s)  speed (MB/s)
:32MB         .503      63.7
:64MB         .916      70.5
:96MB         4.57      27.1
:128MB        17.0      7.62
:192MB        35.8      5.36
:256MB        46.0      5.56
:
:
:The 96MB-read results were a little bit better, although the reads of
:larger sizes became slower. The unwired buffer pages may be putting
:pressure on user process pages and the page scanner.
:
:--
:Seigo Tanimura

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message