Date:      Wed, 11 Apr 2012 10:07:03 +0400
From:      Andrey Zonov <andrey@zonov.org>
To:        Alan Cox <alc@rice.edu>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, freebsd-hackers@freebsd.org, alc@freebsd.org
Subject:   Re: problems with mmap() and disk caching
Message-ID:  <4F851F87.3050206@zonov.org>
In-Reply-To: <4F845D9B.10004@rice.edu>
References:  <4F7B495D.3010402@zonov.org> <20120404071746.GJ2358@deviant.kiev.zoral.com.ua> <4F7DC037.9060803@rice.edu> <201204091126.25260.jhb@freebsd.org> <4F845D9B.10004@rice.edu>

This is a multi-part message in MIME format.
--------------060005080001060707000904
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

On 10.04.2012 20:19, Alan Cox wrote:
> On 04/09/2012 10:26, John Baldwin wrote:
>> On Thursday, April 05, 2012 11:54:31 am Alan Cox wrote:
>>> On 04/04/2012 02:17, Konstantin Belousov wrote:
>>>> On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:
>>>>> Hi,
>>>>>
>>>>> I open the file, then call mmap() on the whole file and get a pointer,
>>>>> then I work with this pointer. I expect that each page needs to be
>>>>> touched only once to get it into memory (disk cache?), but this doesn't
>>>>> work!
>>>>>
>>>>> I wrote a test (attached) and ran it on a 1G file generated from
>>>>> /dev/random; the result is the following:
>>>>>
>>>>> Prepare file:
>>>>> # swapoff -a
>>>>> # newfs /dev/ada0b
>>>>> # mount /dev/ada0b /mnt
>>>>> # dd if=/dev/random of=/mnt/random-1024 bs=1m count=1024
>>>>>
>>>>> Purge cache:
>>>>> # umount /mnt
>>>>> # mount /dev/ada0b /mnt
>>>>>
>>>>> Run test:
>>>>> $ ./mmap /mnt/random-1024 30
>>>>> mmap: 1 pass took: 7.431046 (none: 262112; res: 32; super: 0; other: 0)
>>>>> mmap: 2 pass took: 7.356670 (none: 261648; res: 496; super: 0; other: 0)
>>>>> mmap: 3 pass took: 7.307094 (none: 260521; res: 1623; super: 0; other: 0)
>>>>> mmap: 4 pass took: 7.350239 (none: 258904; res: 3240; super: 0; other: 0)
>>>>> mmap: 5 pass took: 7.392480 (none: 257286; res: 4858; super: 0; other: 0)
>>>>> mmap: 6 pass took: 7.292069 (none: 255584; res: 6560; super: 0; other: 0)
>>>>> mmap: 7 pass took: 7.048980 (none: 251142; res: 11002; super: 0; other: 0)
>>>>> mmap: 8 pass took: 6.899387 (none: 247584; res: 14560; super: 0; other: 0)
>>>>> mmap: 9 pass took: 7.190579 (none: 242992; res: 19152; super: 0; other: 0)
>>>>> mmap: 10 pass took: 6.915482 (none: 239308; res: 22836; super: 0; other: 0)
>>>>> mmap: 11 pass took: 6.565909 (none: 232835; res: 29309; super: 0; other: 0)
>>>>> mmap: 12 pass took: 6.423945 (none: 226160; res: 35984; super: 0; other: 0)
>>>>> mmap: 13 pass took: 6.315385 (none: 208555; res: 53589; super: 0; other: 0)
>>>>> mmap: 14 pass took: 6.760780 (none: 192805; res: 69339; super: 0; other: 0)
>>>>> mmap: 15 pass took: 5.721513 (none: 174497; res: 87647; super: 0; other: 0)
>>>>> mmap: 16 pass took: 5.004424 (none: 155938; res: 106206; super: 0; other: 0)
>>>>> mmap: 17 pass took: 4.224926 (none: 135639; res: 126505; super: 0; other: 0)
>>>>> mmap: 18 pass took: 3.749608 (none: 117952; res: 144192; super: 0; other: 0)
>>>>> mmap: 19 pass took: 3.398084 (none: 99066; res: 163078; super: 0; other: 0)
>>>>> mmap: 20 pass took: 3.029557 (none: 74994; res: 187150; super: 0; other: 0)
>>>>> mmap: 21 pass took: 2.379430 (none: 55231; res: 206913; super: 0; other: 0)
>>>>> mmap: 22 pass took: 2.046521 (none: 40786; res: 221358; super: 0; other: 0)
>>>>> mmap: 23 pass took: 1.152797 (none: 30311; res: 231833; super: 0; other: 0)
>>>>> mmap: 24 pass took: 0.972617 (none: 16196; res: 245948; super: 0; other: 0)
>>>>> mmap: 25 pass took: 0.577515 (none: 8286; res: 253858; super: 0; other: 0)
>>>>> mmap: 26 pass took: 0.380738 (none: 3712; res: 258432; super: 0; other: 0)
>>>>> mmap: 27 pass took: 0.253583 (none: 1193; res: 260951; super: 0; other: 0)
>>>>> mmap: 28 pass took: 0.157508 (none: 0; res: 262144; super: 0; other: 0)
>>>>> mmap: 29 pass took: 0.156169 (none: 0; res: 262144; super: 0; other: 0)
>>>>> mmap: 30 pass took: 0.156550 (none: 0; res: 262144; super: 0; other: 0)
>>>>>
>>>>> If I run this:
>>>>> $ cat /mnt/random-1024 > /dev/null
>>>>> before the test, then the result is the following:
>>>>>
>>>>> $ ./mmap /mnt/random-1024 5
>>>>> mmap: 1 pass took: 0.337657 (none: 0; res: 262144; super: 0; other: 0)
>>>>> mmap: 2 pass took: 0.186137 (none: 0; res: 262144; super: 0; other: 0)
>>>>> mmap: 3 pass took: 0.186132 (none: 0; res: 262144; super: 0; other: 0)
>>>>> mmap: 4 pass took: 0.186535 (none: 0; res: 262144; super: 0; other: 0)
>>>>> mmap: 5 pass took: 0.190353 (none: 0; res: 262144; super: 0; other: 0)
>>>>>
>>>>> This is what I expect. But why doesn't this work without reading the
>>>>> file manually first?
>>>> The issue seems to be some change in the behaviour of the reserv or
>>>> phys allocator. I Cc:ed Alan.
>>> I'm pretty sure that the behavior here hasn't significantly changed in
>>> about twelve years. Otherwise, I agree with your analysis.
>>>
>>> On more than one occasion, I've been tempted to change:
>>>
>>>     pmap_remove_all(mt);
>>>     if (mt->dirty != 0)
>>>         vm_page_deactivate(mt);
>>>     else
>>>         vm_page_cache(mt);
>>>
>>> to:
>>>
>>> vm_page_dontneed(mt);
>>>
>>> because I suspect that the current code does more harm than good. In
>>> theory, it saves activations of the page daemon. However, more often
>>> than not, I suspect that we are spending more on page reactivations than
>>> we are saving on page daemon activations. The sequential access
>>> detection heuristic is just too easily triggered. For example, I've
>>> seen it triggered by demand paging of the gcc text segment. Also, I
>>> think that pmap_remove_all() and especially vm_page_cache() are too
>>> severe for a detection heuristic that is so easily triggered.
>> Are you planning to commit this?
>>
>
> Not yet. I did some tests with a file that was several times larger than
> DRAM, and I didn't like what I saw. Initially, everything behaved as
> expected, but about halfway through the test the bulk of the pages were
> active. Despite the call to pmap_clear_reference() in
> vm_page_dontneed(), the page daemon is finding the pages to be
> referenced and reactivating them. The net result is that the time it
> takes to read the file (from a relatively fast SSD) goes up by about
> 12%. So, this still needs work.
>

Hi Alan,

What do you think about the attached patch?

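To make the effect of the patch concrete, here is a small sketch of the window arithmetic. It assumes MAXPHYS is 128 KiB and PAGE_SIZE is 4 KiB; those values are not stated in the patch and may differ per platform or kernel configuration. The struct and function names are made up for illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: sizes of the fault read window, in pages. */
struct fault_window {
	size_t ahead;	/* pages read after the faulting page */
	size_t behind;	/* pages read before the faulting page */
	size_t total;	/* ahead + behind + the faulting page itself */
};

/* Fixed constants as they appear in vm_fault.c before the patch. */
struct fault_window
old_window(void)
{
	struct fault_window w = { 8, 7, 8 + 7 + 1 };
	return (w);
}

/*
 * Same expressions as the patched macros, with MAXPHYS and PAGE_SIZE
 * passed in as parameters so that different values can be tried.
 */
struct fault_window
patched_window(size_t maxphys, size_t page_size)
{
	struct fault_window w;

	/* Need at least two pages per transfer so that ahead >= 1. */
	assert(page_size > 0 && maxphys >= 2 * page_size);
	w.ahead = maxphys / page_size / 2;
	w.behind = w.ahead - 1;
	w.total = w.ahead + w.behind + 1;
	return (w);
}
```

Under those assumed values, patched_window(128 * 1024, 4096) gives 16 pages ahead and 15 behind, 32 pages in total, so one fault can cover a full MAXPHYS-sized (128 KiB) transfer. The old constants give 16 pages, or 64 KiB with 4 KiB pages.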

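For anyone who wants to reproduce the "none"/"res" counts from the quoted report without the original attachment, the measurement can be sketched roughly as below. This is not the attached test program, only a minimal approximation of one pass: map the file, touch one byte per page, then ask mincore() which pages are resident. The function name touch_and_count is hypothetical, and the sketch does not count superpages or time the pass.

```c
#define _DEFAULT_SOURCE		/* exposes mincore() on glibc; harmless elsewhere */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map the whole file read-only, touch one byte in every page, then count
 * the pages that mincore() reports as resident.
 * Returns 0 on success and -1 on any error.
 */
int
touch_and_count(const char *path, size_t *resident, size_t *total)
{
	struct stat st;
	long pagesz = sysconf(_SC_PAGESIZE);
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return (-1);
	if (pagesz <= 0 || fstat(fd, &st) != 0 || st.st_size <= 0) {
		close(fd);
		return (-1);
	}
	size_t len = (size_t)st.st_size;
	size_t npages = (len + (size_t)pagesz - 1) / (size_t)pagesz;

	unsigned char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);		/* the mapping stays valid after close() */
	if (p == MAP_FAILED)
		return (-1);

	/* One read per page is enough to fault that page in. */
	volatile unsigned char sink = 0;
	for (size_t off = 0; off < len; off += (size_t)pagesz)
		sink ^= p[off];

	/* void * avoids the char/unsigned char prototype difference. */
	void *vec = malloc(npages);
	if (vec == NULL || mincore(p, len, vec) != 0) {
		free(vec);
		munmap(p, len);
		return (-1);
	}

	/* Bit 0 is the "in core" flag (MINCORE_INCORE on FreeBSD). */
	const unsigned char *v = vec;
	size_t res = 0;
	for (size_t i = 0; i < npages; i++)
		if (v[i] & 1)
			res++;

	free(vec);
	munmap(p, len);
	*resident = res;
	*total = npages;
	assert(*resident <= *total);
	return (0);
}
```

Calling this in a loop on /mnt/random-1024 after a remount and printing total - resident as "none" and resident as "res" should give numbers comparable to the ones quoted above.
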
-- 
Andrey Zonov

--------------060005080001060707000904
Content-Type: text/plain; charset=windows-1251;
 name="vm_fault.c.patch.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="vm_fault.c.patch.txt"

Index: sys/vm/vm_fault.c
===================================================================
--- sys/vm/vm_fault.c	(revision 233744)
+++ sys/vm/vm_fault.c	(working copy)
@@ -114,9 +114,9 @@
 static int vm_fault_additional_pages(vm_page_t, int, int, vm_page_t *, int *);
 static void vm_fault_prefault(pmap_t, vm_offset_t, vm_map_entry_t);
 
-#define VM_FAULT_READ_AHEAD 8
-#define VM_FAULT_READ_BEHIND 7
-#define VM_FAULT_READ (VM_FAULT_READ_AHEAD+VM_FAULT_READ_BEHIND+1)
+#define VM_FAULT_READ_AHEAD	(MAXPHYS/PAGE_SIZE/2)
+#define VM_FAULT_READ_BEHIND	(VM_FAULT_READ_AHEAD-1)
+#define VM_FAULT_READ		(VM_FAULT_READ_AHEAD+VM_FAULT_READ_BEHIND+1)
 
 struct faultstate {
 	vm_page_t m;

--------------060005080001060707000904--


