From: John Baldwin
To: Dmitry Sivachenko
Cc: freebsd-hackers@freebsd.org, Trond Endrestøl
Subject: Re: madvise() vs posix_fadvise()
Date: Fri, 4 Apr 2014 16:12:35 -0400
Message-Id: <201404041612.35889.jhb@freebsd.org>

On Friday, April 04, 2014 1:52:09 pm Dmitry Sivachenko wrote:
>
> On 03 Apr. 2014, at 19:02, John Baldwin wrote:
>
> >>
> >> Right now I am facing the following problem (stable/10):
> >> There is a (home-grown) webserver which mmap's a large amount of data
> >> files (total size is a bit below of RAM, say ~90GB of files with 128GB
> >> of RAM).
> >> Server writes access.log (several gigabytes per day).
> >>
> >> Some of mmaped data files are used frequently, some are used rarely.
> >> On startup, server walks through all of these data files so it's
> >> content is read from disk.
> >>
> >> After some time of running, I see that rarely used data files are
> >> purged from RAM (access to them leads to long-running disk reads) in
> >> favour of disk cache
> >> (at 0:00, when I rotate and gzip log file I see Inactive memory goes
> >> down to the value of log file size).
> >>
> >> Is there any way to tell VM system not to push mmap'ed regions out of
> >> RAM in favour of disk caches?
> >
> > Use POSIX_FADV_NOREUSE with fadvise() for the log files.  They are a
> > perfect use case for this flag.  This will tell the VM system to throw
> > the log data (move it to cache) after it writes the file.
>
>
>
> Another question is why madvise(MADV_WILLNEED) is not enough to prefer
> keeping mmap'ed data in memory instead of dedicating all memory to cache
> log files?
> Even if that mmap'ed memory is rarely used.

MADV_WILLNEED is an instant action, not a policy statement.  It means
"please pre-fetch this data right now as I'm going to use it in the next
few seconds".  It does not mean "this data is more frequently used, so try
to keep it around for the next few hours".

> While POSIX_FADV_NOREUSE might be a solution for some cases (I am already
> testing it), it needs to be implemented in many programs (all that
> read/write files on disk), while madvise(MADV_WILLNEED) sounds like a
> proper solution to increase priority for mmaped region regardless of what
> other programs use disk but it does not seem to work as expected.
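For reference, a minimal sketch of the POSIX_FADV_NOREUSE suggestion quoted
above, as it might look in the program that writes the log.  The helper name
open_log_noreuse() and the file mode are assumptions made for illustration;
they are not taken from the server under discussion.  The advice is only a
hint to the kernel, so the sketch ignores a failure of posix_fadvise().

```c
#define _POSIX_C_SOURCE 200809L

#include <fcntl.h>
#include <unistd.h>

/*
 * Open a log file for appending and advise the kernel that its data
 * will not be reused.  Returns the descriptor, or -1 if open() fails.
 * open_log_noreuse() is a hypothetical helper name.
 */
int
open_log_noreuse(const char *path)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);

	if (fd == -1)
		return (-1);

	/*
	 * offset 0 with len 0 covers the whole file.  posix_fadvise()
	 * returns an error number directly instead of setting errno;
	 * since the call is advisory, the result is discarded here.
	 */
	(void)posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE);

	return (fd);
}
```

The descriptor is then used with write() as usual.  The hint applies only to
the descriptor it was issued on, so every program that reads or writes the
file (the gzip step at rotation, for example) would need its own call, which
is the drawback raised in the quoted text.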

MADV_WILLNEED is not going to give you what you want.  OTOH, if you haven't
tried FreeBSD 10 yet, I would suggest trying that.  There have been changes
to pagedaemon that might make it do a better job of kicking out the pages
of the log files automatically.

-- 
John Baldwin