From owner-freebsd-hackers@FreeBSD.ORG Wed Apr 11 06:07:08 2012
Message-ID: <4F851F87.3050206@zonov.org>
Date: Wed, 11 Apr 2012 10:07:03 +0400
From: Andrey Zonov <andrey@zonov.org>
To: Alan Cox
Cc: Konstantin Belousov, freebsd-hackers@freebsd.org, alc@freebsd.org
References: <4F7B495D.3010402@zonov.org> <20120404071746.GJ2358@deviant.kiev.zoral.com.ua> <4F7DC037.9060803@rice.edu> <201204091126.25260.jhb@freebsd.org> <4F845D9B.10004@rice.edu>
In-Reply-To: <4F845D9B.10004@rice.edu>
Subject: Re: problems with mmap() and disk caching

On 10.04.2012 20:19, Alan Cox wrote:
> On 04/09/2012 10:26, John Baldwin wrote:
>> On Thursday, April 05, 2012 11:54:31 am Alan Cox wrote:
>>> On 04/04/2012 02:17, Konstantin Belousov wrote:
>>>> On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:
>>>>> Hi,
>>>>>
>>>>> I open the file, call mmap() on the whole file, and get a pointer,
>>>>> then I work through this pointer. I expect that each page should
>>>>> be touched only once to bring it into memory (the disk cache?),
>>>>> but this doesn't work!
>>>>>
>>>>> I wrote a test (attached) and ran it on a 1G file generated from
>>>>> /dev/random; the result is the following:
>>>>>
>>>>> Prepare the file:
>>>>> # swapoff -a
>>>>> # newfs /dev/ada0b
>>>>> # mount /dev/ada0b /mnt
>>>>> # dd if=/dev/random of=/mnt/random-1024 bs=1m count=1024
>>>>>
>>>>> Purge the cache:
>>>>> # umount /mnt
>>>>> # mount /dev/ada0b /mnt
>>>>>
>>>>> Run the test:
>>>>> $ ./mmap /mnt/random-1024 30
>>>>> mmap:  1 pass took: 7.431046 (none: 262112; res:     32; super: 0; other: 0)
>>>>> mmap:  2 pass took: 7.356670 (none: 261648; res:    496; super: 0; other: 0)
>>>>> mmap:  3 pass took: 7.307094 (none: 260521; res:   1623; super: 0; other: 0)
>>>>> mmap:  4 pass took: 7.350239 (none: 258904; res:   3240; super: 0; other: 0)
>>>>> mmap:  5 pass took: 7.392480 (none: 257286; res:   4858; super: 0; other: 0)
>>>>> mmap:  6 pass took: 7.292069 (none: 255584; res:   6560; super: 0; other: 0)
>>>>> mmap:  7 pass took: 7.048980 (none: 251142; res:  11002; super: 0; other: 0)
>>>>> mmap:  8 pass took: 6.899387 (none: 247584; res:  14560; super: 0; other: 0)
>>>>> mmap:  9 pass took: 7.190579 (none: 242992; res:  19152; super: 0; other: 0)
>>>>> mmap: 10 pass took: 6.915482 (none: 239308; res:  22836; super: 0; other: 0)
>>>>> mmap: 11 pass took: 6.565909 (none: 232835; res:  29309; super: 0; other: 0)
>>>>> mmap: 12 pass took: 6.423945 (none: 226160; res:  35984; super: 0; other: 0)
>>>>> mmap: 13 pass took: 6.315385 (none: 208555; res:  53589; super: 0; other: 0)
>>>>> mmap: 14 pass took: 6.760780 (none: 192805; res:  69339; super: 0; other: 0)
>>>>> mmap: 15 pass took: 5.721513 (none: 174497; res:  87647; super: 0; other: 0)
>>>>> mmap: 16 pass took: 5.004424 (none: 155938; res: 106206; super: 0; other: 0)
>>>>> mmap: 17 pass took: 4.224926 (none: 135639; res: 126505; super: 0; other: 0)
>>>>> mmap: 18 pass took: 3.749608 (none: 117952; res: 144192; super: 0; other: 0)
>>>>> mmap: 19 pass took: 3.398084 (none:  99066; res: 163078; super: 0; other: 0)
>>>>> mmap: 20 pass took: 3.029557 (none:  74994; res: 187150; super: 0; other: 0)
>>>>> mmap: 21 pass took: 2.379430 (none:  55231; res: 206913; super: 0; other: 0)
>>>>> mmap: 22 pass took: 2.046521 (none:  40786; res: 221358; super: 0; other: 0)
>>>>> mmap: 23 pass took: 1.152797 (none:  30311; res: 231833; super: 0; other: 0)
>>>>> mmap: 24 pass took: 0.972617 (none:  16196; res: 245948; super: 0; other: 0)
>>>>> mmap: 25 pass took: 0.577515 (none:   8286; res: 253858; super: 0; other: 0)
>>>>> mmap: 26 pass took: 0.380738 (none:   3712; res: 258432; super: 0; other: 0)
>>>>> mmap: 27 pass took: 0.253583 (none:   1193; res: 260951; super: 0; other: 0)
>>>>> mmap: 28 pass took: 0.157508 (none:      0; res: 262144; super: 0; other: 0)
>>>>> mmap: 29 pass took: 0.156169 (none:      0; res: 262144; super: 0; other: 0)
>>>>> mmap: 30 pass took: 0.156550 (none:      0; res: 262144; super: 0; other: 0)
>>>>>
>>>>> If I run:
>>>>> $ cat /mnt/random-1024 > /dev/null
>>>>> before the test, then the result is the following:
>>>>>
>>>>> $ ./mmap /mnt/random-1024 5
>>>>> mmap:  1 pass took: 0.337657 (none: 0; res: 262144; super: 0; other: 0)
>>>>> mmap:  2 pass took: 0.186137 (none: 0; res: 262144; super: 0; other: 0)
>>>>> mmap:  3 pass took: 0.186132 (none: 0; res: 262144; super: 0; other: 0)
>>>>> mmap:  4 pass took: 0.186535 (none: 0; res: 262144; super: 0; other: 0)
>>>>> mmap:  5 pass took: 0.190353 (none: 0; res: 262144; super: 0; other: 0)
>>>>>
>>>>> This is what I expect. But why doesn't this work without reading
>>>>> the file manually?
>>>> The issue seems to be some change in the behaviour of the reserv or
>>>> phys allocator. I've Cc:ed Alan.
>>> I'm pretty sure that the behavior here hasn't significantly changed
>>> in about twelve years. Otherwise, I agree with your analysis.
>>>
>>> On more than one occasion, I've been tempted to change:
>>>
>>>     pmap_remove_all(mt);
>>>     if (mt->dirty != 0)
>>>         vm_page_deactivate(mt);
>>>     else
>>>         vm_page_cache(mt);
>>>
>>> to:
>>>
>>>     vm_page_dontneed(mt);
>>>
>>> because I suspect that the current code does more harm than good. In
>>> theory, it saves activations of the page daemon. However, more often
>>> than not, I suspect that we are spending more on page reactivations
>>> than we are saving on page daemon activations. The sequential access
>>> detection heuristic is just too easily triggered. For example, I've
>>> seen it triggered by demand paging of the gcc text segment. Also, I
>>> think that pmap_remove_all() and especially vm_page_cache() are too
>>> severe for a detection heuristic that is so easily triggered.
>> Are you planning to commit this?
>
> Not yet. I did some tests with a file that was several times larger
> than DRAM, and I didn't like what I saw. Initially, everything behaved
> as expected, but about halfway through the test the bulk of the pages
> were active. Despite the call to pmap_clear_reference() in
> vm_page_dontneed(), the page daemon is finding the pages to be
> referenced and reactivating them. The net result is that the time it
> takes to read the file (from a relatively fast SSD) goes up by about
> 12%. So, this still needs work.

Hi Alan,

What do you think about the attached patch?

-- 
Andrey Zonov

[Attachment: vm_fault.c.patch.txt]

Index: sys/vm/vm_fault.c
===================================================================
--- sys/vm/vm_fault.c	(revision 233744)
+++ sys/vm/vm_fault.c	(working copy)
@@ -114,9 +114,9 @@
 static int vm_fault_additional_pages(vm_page_t, int, int, vm_page_t *, int *);
 static void vm_fault_prefault(pmap_t, vm_offset_t, vm_map_entry_t);
 
-#define VM_FAULT_READ_AHEAD 8
-#define VM_FAULT_READ_BEHIND 7
-#define VM_FAULT_READ (VM_FAULT_READ_AHEAD+VM_FAULT_READ_BEHIND+1)
+#define VM_FAULT_READ_AHEAD (MAXPHYS/PAGE_SIZE/2)
+#define VM_FAULT_READ_BEHIND (VM_FAULT_READ_AHEAD-1)
+#define VM_FAULT_READ (VM_FAULT_READ_AHEAD+VM_FAULT_READ_BEHIND+1)
 
 struct faultstate {
 	vm_page_t m;
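For scale, here is what the patched constants evaluate to, assuming
MAXPHYS = 128 KiB and PAGE_SIZE = 4 KiB (the common FreeBSD defaults of
that era; neither value is stated in the thread):

/*
 * With the assumed MAXPHYS = 131072 and PAGE_SIZE = 4096:
 *
 *   VM_FAULT_READ_AHEAD  = (131072 / 4096) / 2 = 16 pages
 *   VM_FAULT_READ_BEHIND = 16 - 1              = 15 pages
 *   VM_FAULT_READ        = 16 + 15 + 1         = 32 pages (128 KiB)
 *
 * The old constants gave 8 + 7 + 1 = 16 pages (64 KiB) per fault
 * cluster, so the patch doubles the cluster to one full MAXPHYS-sized
 * I/O.
 */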
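The test attached to Andrey's original message at the top of the thread
is not reproduced in this archive. A minimal sketch of a test along the
lines described there, using mmap(2) to map the whole file, touching one
byte per page on each pass, and classifying pages with mincore(2), might
look like the following. This is an illustrative reconstruction, not the
original attachment; the argument handling, the volatile sink, and the
classification order are all assumptions:

#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>

#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
	struct stat st;
	struct timespec t0, t1;
	char *p, *vec;
	size_t i, len, npages, off;
	size_t none, res, super, other;
	long pgsz;
	volatile char sink;
	int fd, pass, passes;

	if (argc != 3)
		errx(1, "usage: %s file npasses", argv[0]);
	passes = atoi(argv[2]);

	if ((fd = open(argv[1], O_RDONLY)) == -1)
		err(1, "open");
	if (fstat(fd, &st) == -1)
		err(1, "fstat");
	len = (size_t)st.st_size;
	pgsz = sysconf(_SC_PAGESIZE);
	npages = (len + pgsz - 1) / pgsz;

	p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		err(1, "mmap");
	if ((vec = malloc(npages)) == NULL)	/* one status byte per page */
		err(1, "malloc");

	sink = 0;
	for (pass = 1; pass <= passes; pass++) {
		clock_gettime(CLOCK_MONOTONIC, &t0);
		for (off = 0; off < len; off += pgsz)
			sink += p[off];		/* touch one byte per page */
		clock_gettime(CLOCK_MONOTONIC, &t1);

		/* Classify every page of the mapping. */
		if (mincore(p, len, vec) == -1)
			err(1, "mincore");
		none = res = super = other = 0;
		for (i = 0; i < npages; i++) {
			if (vec[i] == 0)
				none++;		/* not resident */
			else if (vec[i] & MINCORE_SUPER)
				super++;	/* part of a superpage */
			else if (vec[i] & MINCORE_INCORE)
				res++;		/* resident ordinary page */
			else
				other++;
		}
		printf("mmap: %2d pass took: %f (none: %zu; res: %zu; "
		    "super: %zu; other: %zu)\n", pass,
		    (t1.tv_sec - t0.tv_sec) +
		    (t1.tv_nsec - t0.tv_nsec) / 1e9,
		    none, res, super, other);
	}
	return (0);
}

Built with something like "cc -o mmap mmap.c", an invocation such as
"./mmap /mnt/random-1024 30" would produce output in the format quoted
above.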