Date:      Thu, 05 Apr 2012 13:25:49 -0500
From:      Alan Cox <alc@rice.edu>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        alc@freebsd.org, freebsd-hackers@freebsd.org, Andrey Zonov <andrey@zonov.org>
Subject:   Re: problems with mmap() and disk caching
Message-ID:  <4F7DE3AD.5080401@rice.edu>
In-Reply-To: <20120405173138.GX2358@deviant.kiev.zoral.com.ua>
References:  <4F7B495D.3010402@zonov.org> <20120404071746.GJ2358@deviant.kiev.zoral.com.ua> <4F7DC037.9060803@rice.edu> <20120405173138.GX2358@deviant.kiev.zoral.com.ua>

On 04/05/2012 12:31, Konstantin Belousov wrote:
> On Thu, Apr 05, 2012 at 10:54:31AM -0500, Alan Cox wrote:
>> On 04/04/2012 02:17, Konstantin Belousov wrote:
>>> On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:
>>>> Hi,
>>>>
>>>> I open the file, then call mmap() on the whole file to get a pointer,
>>>> and then I work through this pointer.  I expect that a page should
>>>> need to be touched only once to get it into memory (the disk cache?),
>>>> but this doesn't work!
>>>>
>>>> I wrote the test (attached) and ran it on a 1GB file generated from
>>>> /dev/random; the results are the following:
>>>>
>>>> Prepare file:
>>>> # swapoff -a
>>>> # newfs /dev/ada0b
>>>> # mount /dev/ada0b /mnt
>>>> # dd if=/dev/random of=/mnt/random-1024 bs=1m count=1024
>>>>
>>>> Purge cache:
>>>> # umount /mnt
>>>> # mount /dev/ada0b /mnt
>>>>
>>>> Run test:
>>>> $ ./mmap /mnt/random-1024 30
>>>> mmap:  1 pass took:   7.431046 (none: 262112; res:     32; super:      0; other:      0)
>>>> mmap:  2 pass took:   7.356670 (none: 261648; res:    496; super:      0; other:      0)
>>>> mmap:  3 pass took:   7.307094 (none: 260521; res:   1623; super:      0; other:      0)
>>>> mmap:  4 pass took:   7.350239 (none: 258904; res:   3240; super:      0; other:      0)
>>>> mmap:  5 pass took:   7.392480 (none: 257286; res:   4858; super:      0; other:      0)
>>>> mmap:  6 pass took:   7.292069 (none: 255584; res:   6560; super:      0; other:      0)
>>>> mmap:  7 pass took:   7.048980 (none: 251142; res:  11002; super:      0; other:      0)
>>>> mmap:  8 pass took:   6.899387 (none: 247584; res:  14560; super:      0; other:      0)
>>>> mmap:  9 pass took:   7.190579 (none: 242992; res:  19152; super:      0; other:      0)
>>>> mmap: 10 pass took:   6.915482 (none: 239308; res:  22836; super:      0; other:      0)
>>>> mmap: 11 pass took:   6.565909 (none: 232835; res:  29309; super:      0; other:      0)
>>>> mmap: 12 pass took:   6.423945 (none: 226160; res:  35984; super:      0; other:      0)
>>>> mmap: 13 pass took:   6.315385 (none: 208555; res:  53589; super:      0; other:      0)
>>>> mmap: 14 pass took:   6.760780 (none: 192805; res:  69339; super:      0; other:      0)
>>>> mmap: 15 pass took:   5.721513 (none: 174497; res:  87647; super:      0; other:      0)
>>>> mmap: 16 pass took:   5.004424 (none: 155938; res: 106206; super:      0; other:      0)
>>>> mmap: 17 pass took:   4.224926 (none: 135639; res: 126505; super:      0; other:      0)
>>>> mmap: 18 pass took:   3.749608 (none: 117952; res: 144192; super:      0; other:      0)
>>>> mmap: 19 pass took:   3.398084 (none:  99066; res: 163078; super:      0; other:      0)
>>>> mmap: 20 pass took:   3.029557 (none:  74994; res: 187150; super:      0; other:      0)
>>>> mmap: 21 pass took:   2.379430 (none:  55231; res: 206913; super:      0; other:      0)
>>>> mmap: 22 pass took:   2.046521 (none:  40786; res: 221358; super:      0; other:      0)
>>>> mmap: 23 pass took:   1.152797 (none:  30311; res: 231833; super:      0; other:      0)
>>>> mmap: 24 pass took:   0.972617 (none:  16196; res: 245948; super:      0; other:      0)
>>>> mmap: 25 pass took:   0.577515 (none:   8286; res: 253858; super:      0; other:      0)
>>>> mmap: 26 pass took:   0.380738 (none:   3712; res: 258432; super:      0; other:      0)
>>>> mmap: 27 pass took:   0.253583 (none:   1193; res: 260951; super:      0; other:      0)
>>>> mmap: 28 pass took:   0.157508 (none:      0; res: 262144; super:      0; other:      0)
>>>> mmap: 29 pass took:   0.156169 (none:      0; res: 262144; super:      0; other:      0)
>>>> mmap: 30 pass took:   0.156550 (none:      0; res: 262144; super:      0; other:      0)
>>>>
>>>> If I run this:
>>>> $ cat /mnt/random-1024 > /dev/null
>>>> before the test, then the results are the following:
>>>>
>>>> $ ./mmap /mnt/random-1024 5
>>>> mmap:  1 pass took:   0.337657 (none:      0; res: 262144; super:      0; other:      0)
>>>> mmap:  2 pass took:   0.186137 (none:      0; res: 262144; super:      0; other:      0)
>>>> mmap:  3 pass took:   0.186132 (none:      0; res: 262144; super:      0; other:      0)
>>>> mmap:  4 pass took:   0.186535 (none:      0; res: 262144; super:      0; other:      0)
>>>> mmap:  5 pass took:   0.190353 (none:      0; res: 262144; super:      0; other:      0)
>>>>
>>>> This is what I expect.  But why doesn't this work without reading the
>>>> file manually first?
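
The attached test isn't reproduced in the archive, so as a point of
reference, here is a minimal sketch of a test along these lines, which
classifies pages with mincore(2).  The timing and the exact
classification are guesses from the output above, not the actual
attachment:

    #include <sys/types.h>
    #include <sys/mman.h>
    #include <sys/stat.h>

    #include <err.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    int
    main(int argc, char **argv)
    {
            struct timespec t0, t1;
            struct stat st;
            volatile char c;
            char *p, *vec;
            size_t i, none, npages, other, res, super;
            long pgsz;
            int fd, pass, passes;

            if (argc != 3)
                    errx(1, "usage: %s file passes", argv[0]);
            passes = atoi(argv[2]);
            if ((fd = open(argv[1], O_RDONLY)) < 0)
                    err(1, "open");
            if (fstat(fd, &st) < 0)
                    err(1, "fstat");
            pgsz = sysconf(_SC_PAGESIZE);
            npages = (st.st_size + pgsz - 1) / pgsz;
            p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
            if (p == MAP_FAILED)
                    err(1, "mmap");
            if ((vec = malloc(npages)) == NULL)
                    err(1, "malloc");
            for (pass = 1; pass <= passes; pass++) {
                    clock_gettime(CLOCK_MONOTONIC, &t0);
                    for (i = 0; i < npages; i++)
                            c = p[i * pgsz];        /* fault the page in */
                    clock_gettime(CLOCK_MONOTONIC, &t1);
                    if (mincore(p, st.st_size, vec) < 0)
                            err(1, "mincore");
                    none = res = super = other = 0;
                    for (i = 0; i < npages; i++) {
                            if (vec[i] == 0)
                                    none++;
                            else if (vec[i] & MINCORE_SUPER)
                                    super++;
                            else if (vec[i] & MINCORE_INCORE)
                                    res++;
                            else
                                    other++;
                    }
                    printf("mmap: %2d pass took: %10.6f "
                        "(none: %6zu; res: %6zu; super: %6zu; other: %6zu)\n",
                        pass, (t1.tv_sec - t0.tv_sec) +
                        (t1.tv_nsec - t0.tv_nsec) / 1e9,
                        none, res, super, other);
            }
            return (0);
    }
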
>>> The issue seems to be some change in the behaviour of the reservation
>>> or physical memory allocator.  I have Cc:ed Alan.
>> I'm pretty sure that the behavior here hasn't significantly changed in
>> about twelve years.  Otherwise, I agree with your analysis.
>>
>> On more than one occasion, I've been tempted to change:
>>
>>                                          pmap_remove_all(mt);
>>                                          if (mt->dirty != 0)
>>                                                  vm_page_deactivate(mt);
>>                                          else
>>                                                  vm_page_cache(mt);
>>
>> to:
>>
>>                                          vm_page_dontneed(mt);
>>
>> because I suspect that the current code does more harm than good.  In
>> theory, it saves activations of the page daemon.  However, more often
>> than not, I suspect that we are spending more on page reactivations than
>> we are saving on page daemon activations.  The sequential access
>> detection heuristic is just too easily triggered.  For example, I've
>> seen it triggered by demand paging of the gcc text segment.  Also, I
>> think that pmap_remove_all() and especially vm_page_cache() are too
>> severe for a detection heuristic that is so easily triggered.
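
As an aside, a mapping can at least opt out of this heuristic from
userland: madvise(2)'s MADV_RANDOM hint marks the map entry's access
pattern as random, which should keep the sequential access detection
from firing in the first place.  A minimal sketch, where
map_random_access is a hypothetical helper, not an existing interface:

    #include <sys/mman.h>
    #include <sys/stat.h>

    #include <err.h>
    #include <fcntl.h>
    #include <stddef.h>

    /*
     * Map a file read-only and hint that access will be random, so
     * that the sequential access heuristic should leave the mapping's
     * pages alone.
     */
    void *
    map_random_access(const char *path, size_t *lenp)
    {
            struct stat st;
            void *p;
            int fd;

            if ((fd = open(path, O_RDONLY)) < 0)
                    err(1, "open");
            if (fstat(fd, &st) < 0)
                    err(1, "fstat");
            p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
            if (p == MAP_FAILED)
                    err(1, "mmap");
            if (madvise(p, st.st_size, MADV_RANDOM) != 0)
                    err(1, "madvise");
            *lenp = st.st_size;
            return (p);
    }
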
> Yes, I agree that such a change would be an improvement, and I expect
> that Andrey will test it.
>
> On the other hand, I do think that the allocator should prefer unnamed
> pages to pages which still have valid content. On my 12G desktop,
> I never saw more than 100MB of cached pages, and similar numbers
> are observed on the 32-48GB servers. I suppose that this is related.

On allocation, the system does prefer free pages over cached pages.  
When cached pages are added to the physical memory allocator, they are 
added to VM_FREEPOOL_CACHE.  When pages are allocated, they are taken 
from VM_FREEPOOL_DEFAULT.  Generally, pages only move from the CACHE 
pool to the DEFAULT pool when the DEFAULT pool is depleted.  (However, 
occasionally, they do move because of coalescing.)  When I redid the 
physical memory allocator, I looked at the rate of cached page 
reactivation under the old and the new allocators.  At least for the 
tests that I did the rates weren't that different.  It was low, 
single-digit percentages.  I think the highest likelihood of 
reactivation comes from the pages that are cached by the sequential 
access heuristic because it is so overzealous.
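
In toy form, the allocation policy amounts to this (a standalone model
with made-up types and helpers, not the actual vm_phys code):

    #include <stddef.h>

    /* Two free pools: DEFAULT holds pages with no useful contents,
     * CACHE holds clean pages whose contents are still valid. */
    enum pool { VM_FREEPOOL_DEFAULT, VM_FREEPOOL_CACHE, NPOOLS };

    struct page { struct page *next; };
    static struct page *pools[NPOOLS];

    static struct page *
    pool_take(enum pool p)
    {
            struct page *m = pools[p];

            if (m != NULL)
                    pools[p] = m->next;
            return (m);
    }

    struct page *
    page_alloc(void)
    {
            struct page *m;

            /* Prefer pages that carry no reusable contents. */
            m = pool_take(VM_FREEPOOL_DEFAULT);
            if (m == NULL)
                    /* Sacrifice a cached page's contents only when
                     * the default pool is depleted. */
                    m = pool_take(VM_FREEPOOL_CACHE);
            return (m);
    }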

I don't think it's related.  You see modest numbers of cached pages 
simply because the page daemon met its target for the sum of free and 
cached pages.  So, it just stopped moving pages from the inactive queue 
into the physical memory allocator's cache/free queues.
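
This is easy to watch on a live system.  Here is a small sketch that
reads the relevant counters with sysctlbyname(3); I'm assuming the
sysctl names of this era (vm.stats.vm.v_free_count,
vm.stats.vm.v_cache_count, vm.v_free_target, and vm.v_cache_min) and
that the page daemon's combined target is v_free_target + v_cache_min:

    #include <sys/types.h>
    #include <sys/sysctl.h>

    #include <err.h>
    #include <stdio.h>

    static u_int
    get_uint(const char *name)
    {
            u_int v;
            size_t len = sizeof(v);

            if (sysctlbyname(name, &v, &len, NULL, 0) != 0)
                    err(1, "sysctl %s", name);
            return (v);
    }

    int
    main(void)
    {
            u_int free_cnt = get_uint("vm.stats.vm.v_free_count");
            u_int cache_cnt = get_uint("vm.stats.vm.v_cache_count");
            u_int free_tgt = get_uint("vm.v_free_target");
            u_int cache_min = get_uint("vm.v_cache_min");

            /* The page daemon is idle while free + cached stays at
             * or above its combined target. */
            printf("free %u + cached %u = %u (target %u)\n",
                free_cnt, cache_cnt, free_cnt + cache_cnt,
                free_tgt + cache_min);
            return (0);
    }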

Alan



