From: Konstantin Belousov
To: Alan Cox
Cc: alc@freebsd.org, freebsd-hackers@freebsd.org, Andrey Zonov
Subject: Re: problems with mmap() and disk caching
Date: Fri, 6 Apr 2012 11:38:58 +0300
Message-ID: <20120406083858.GG2358@deviant.kiev.zoral.com.ua>
In-Reply-To: <4F7DE3AD.5080401@rice.edu>

On Thu, Apr 05, 2012 at 01:25:49PM -0500, Alan Cox wrote:
> On 04/05/2012 12:31, Konstantin Belousov wrote:
> >On Thu, Apr 05, 2012 at 10:54:31AM -0500, Alan Cox wrote:
> >>On 04/04/2012 02:17, Konstantin Belousov wrote:
> >>>On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:
> >>>>Hi,
> >>>>
> >>>>I open the file, then call mmap() on the whole file and get pointer,
> >>>>then I work with this pointer.  I expect that page should be only once
> >>>>touched to get it into the memory (disk cache?), but this doesn't work!
> >>>>
> >>>>I wrote the test (attached) and ran it for the 1G file generated from
> >>>>/dev/random, the result is the following:
> >>>>
> >>>>Prepare file:
> >>>># swapoff -a
> >>>># newfs /dev/ada0b
> >>>># mount /dev/ada0b /mnt
> >>>># dd if=/dev/random of=/mnt/random-1024 bs=1m count=1024
> >>>>
> >>>>Purge cache:
> >>>># umount /mnt
> >>>># mount /dev/ada0b /mnt
> >>>>
> >>>>Run test:
> >>>>$ ./mmap /mnt/random-1024 30
> >>>>mmap:  1 pass took:   7.431046 (none: 262112; res:     32; super:      0; other:      0)
> >>>>mmap:  2 pass took:   7.356670 (none: 261648; res:    496; super:      0; other:      0)
> >>>>mmap:  3 pass took:   7.307094 (none: 260521; res:   1623; super:      0; other:      0)
> >>>>mmap:  4 pass took:   7.350239 (none: 258904; res:   3240; super:      0; other:      0)
> >>>>mmap:  5 pass took:   7.392480 (none: 257286; res:   4858; super:      0; other:      0)
> >>>>mmap:  6 pass took:   7.292069 (none: 255584; res:   6560; super:      0; other:      0)
> >>>>mmap:  7 pass took:   7.048980 (none: 251142; res:  11002; super:      0; other:      0)
> >>>>mmap:  8 pass took:   6.899387 (none: 247584; res:  14560; super:      0; other:      0)
> >>>>mmap:  9 pass took:   7.190579 (none: 242992; res:  19152; super:      0; other:      0)
> >>>>mmap: 10 pass took:   6.915482 (none: 239308; res:  22836; super:      0; other:      0)
> >>>>mmap: 11 pass took:   6.565909 (none: 232835; res:  29309; super:      0; other:      0)
> >>>>mmap: 12 pass took:   6.423945 (none: 226160; res:  35984; super:      0; other:      0)
> >>>>mmap: 13 pass took:   6.315385 (none: 208555; res:  53589; super:      0; other:      0)
> >>>>mmap: 14 pass took:   6.760780 (none: 192805; res:  69339; super:      0; other:      0)
> >>>>mmap: 15 pass took:   5.721513 (none: 174497; res:  87647; super:      0; other:      0)
> >>>>mmap: 16 pass took:   5.004424 (none: 155938; res: 106206; super:      0; other:      0)
> >>>>mmap: 17 pass took:   4.224926 (none: 135639; res: 126505; super:      0; other:      0)
> >>>>mmap: 18 pass took:   3.749608 (none: 117952; res: 144192; super:      0; other:      0)
> >>>>mmap: 19 pass took:   3.398084 (none:  99066; res: 163078; super:      0; other:      0)
> >>>>mmap: 20 pass took:   3.029557 (none:  74994; res: 187150; super:      0; other:      0)
> >>>>mmap: 21 pass took:   2.379430 (none:  55231; res: 206913; super:      0; other:      0)
> >>>>mmap: 22 pass took:   2.046521 (none:  40786; res: 221358; super:      0; other:      0)
> >>>>mmap: 23 pass took:   1.152797 (none:  30311; res: 231833; super:      0; other:      0)
> >>>>mmap: 24 pass took:   0.972617 (none:  16196; res: 245948; super:      0; other:      0)
> >>>>mmap: 25 pass took:   0.577515 (none:   8286; res: 253858; super:      0; other:      0)
> >>>>mmap: 26 pass took:   0.380738 (none:   3712; res: 258432; super:      0; other:      0)
> >>>>mmap: 27 pass took:   0.253583 (none:   1193; res: 260951; super:      0; other:      0)
> >>>>mmap: 28 pass took:   0.157508 (none:      0; res: 262144; super:      0; other:      0)
> >>>>mmap: 29 pass took:   0.156169 (none:      0; res: 262144; super:      0; other:      0)
> >>>>mmap: 30 pass took:   0.156550 (none:      0; res: 262144; super:      0; other:      0)
> >>>>
> >>>>If I ran this:
> >>>>$ cat /mnt/random-1024 > /dev/null
> >>>>before test, when result is the following:
> >>>>
> >>>>$ ./mmap /mnt/random-1024 5
> >>>>mmap:  1 pass took:   0.337657 (none:      0; res: 262144; super:      0; other:      0)
> >>>>mmap:  2 pass took:   0.186137 (none:      0; res: 262144; super:      0; other:      0)
> >>>>mmap:  3 pass took:   0.186132 (none:      0; res: 262144; super:      0; other:      0)
> >>>>mmap:  4 pass took:   0.186535 (none:      0; res: 262144; super:      0; other:      0)
> >>>>mmap:  5 pass took:   0.190353 (none:      0; res: 262144; super:      0; other:      0)
> >>>>
> >>>>This is what I expect.  But why this doesn't work without reading file
> >>>>manually?
> >>>Issue seems to be in some change of the behaviour of the reserv or
> >>>phys allocator.  I Cc:ed Alan.
> >>I'm pretty sure that the behavior here hasn't significantly changed in
> >>about twelve years.  Otherwise, I agree with your analysis.
> >>
> >>On more than one occasion, I've been tempted to change:
> >>
> >>    pmap_remove_all(mt);
> >>    if (mt->dirty != 0)
> >>        vm_page_deactivate(mt);
> >>    else
> >>        vm_page_cache(mt);
> >>
> >>to:
> >>
> >>    vm_page_dontneed(mt);
> >>
> >>because I suspect that the current code does more harm than good.  In
> >>theory, it saves activations of the page daemon.  However, more often
> >>than not, I suspect that we are spending more on page reactivations than
> >>we are saving on page daemon activations.  The sequential access
> >>detection heuristic is just too easily triggered.  For example, I've
> >>seen it triggered by demand paging of the gcc text segment.  Also, I
> >>think that pmap_remove_all() and especially vm_page_cache() are too
> >>severe for a detection heuristic that is so easily triggered.
> >Yes, I agree that such change shall be an improvement, and I expect
> >that Andrey will test it.
> >
> >On the other hand, I do think that allocator should prefer unnamed
> >pages to pages which still have valid content.  On my 12G desktop,
> >I never saw more then 100MB of cached pages, and similar numbers
> >are observed on the 32-48GB servers.  I suppose that this is related.
>
> On allocation, the system does prefer free pages over cached pages.
> When cached pages are added to the physical memory allocator, they are
> added to VM_FREEPOOL_CACHE.  When pages are allocated, they are taken
> from VM_FREEPOOL_DEFAULT.  Generally, pages only move from the CACHE
> pool to the DEFAULT pool when the DEFAULT pool is depleted.  (However,
> occasionally, they do move because of coalescing.)  When I redid the
> physical memory allocator, I looked at the rate of cached page
> reactivation under the old and the new allocators.  At least for the
> tests that I did the rates weren't that different.  It was low,
> single-digit percentages.
> I think the highest likelihood of reactivation comes from the pages
> that are cached by the sequential access heuristic because it is so
> overzealous.
>
> I don't think it's related.  You see modest numbers of cached pages
> simply because the page daemon met its target for the sum of free and
> cached pages.  So, it just stopped moving pages from the inactive queue
> into the physical memory allocator's cache/free queues.

No, I mean something else.  Specifically, I mean that somehow the
preference for non-named pages does not work.  At least, I cannot give
any other explanation for the following experiment.

Let's take stock HEAD without the change in vm_fault.c.  The initial
state of the 8GB machine is as follows; the test file has not even been
stat(2)-ed yet.
Mem: 37M Active, 18M Inact, 150M Wired, 236K Cache, 27M Buf, 7612M Free

Now, run Andrey's unmodified original test with only one pass, making a
sequential read of the mmap of a 5GB file from a UFS volume.  After the
run:
Mem: 38M Active, 18M Inact, 153M Wired, 21M Cache, 30M Buf, 7586M Free

Please note that the cached count increased by only about 20M, and this
is for 5GB worth of calls to vm_page_cache().  In other words, it seems
that the allocator almost never touches free memory, always preferring
cache.  This mostly coincides with what I saw when I profiled the
original problem reported by Andrey.
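
The arithmetic behind the numbers in this experiment can be sketched in
C.  This is only an illustration under stated assumptions: a 4 KB base
page size, binary units for the sizes printed in the Mem: lines, and
helper names (pages_for, cache_growth_pages, percent_of) that are
hypothetical and come from neither the kernel nor Andrey's test.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KB base pages; not stated in the message itself. */
#define BASE_PAGE_BYTES 4096ULL

/* Number of base pages needed to hold `bytes`, rounded up. */
static uint64_t
pages_for(uint64_t bytes)
{
	return ((bytes + BASE_PAGE_BYTES - 1) / BASE_PAGE_BYTES);
}

/* Growth of the Cache counter between two samples, in pages. */
static uint64_t
cache_growth_pages(uint64_t before_bytes, uint64_t after_bytes)
{
	assert(after_bytes >= before_bytes);
	return (pages_for(after_bytes) - pages_for(before_bytes));
}

/* `part` expressed as a percentage of `whole`. */
static double
percent_of(uint64_t part, uint64_t whole)
{
	assert(whole != 0);
	return (100.0 * (double)part / (double)whole);
}
```

With the figures from the message, pages_for(5ULL << 30) gives 1310720
pages for the 5GB file, and cache_growth_pages(236ULL << 10, 21ULL << 20)
gives 5317 pages.  If roughly every page of the file goes through
vm_page_cache(), as the message assumes, that is about 0.4% of them
still sitting in the cache after the run.  The Mem: values are rounded,
so this is an order-of-magnitude estimate only.  As a cross-check,
pages_for(1ULL << 30) gives 262144, which matches the "res" count once
Andrey's 1G file is fully resident.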