From: John Baldwin
To: Ian Lepore
Cc: freebsd-hackers@freebsd.org, Dmitry Sivachenko, Trond Endrestøl
Subject: Re: madvise() vs posix_fadvise()
Date: Thu, 3 Apr 2014 12:30:40 -0400
Message-Id: <201404031230.40380.jhb@freebsd.org>

On Thursday, April 03, 2014 11:43:57 am Ian Lepore wrote:
> On Thu, 2014-04-03 at 11:02 -0400, John Baldwin wrote:
> > On Thursday, April 03, 2014 7:29:03 am Dmitry Sivachenko wrote:
> > >
> > > On 27
> > > March 2014, at 19:41, John Baldwin wrote:
> > > >>
> > > >> I know about mlock(2), it is a bit overkill.
> > > >> Can someone please explain the difference between
> > > >> madvise(MADV_WILLNEED) and posix_fadvise(POSIX_FADV_WILLNEED)?
> > > >
> > > > Right now FADV_WILLNEED is a nop.  (I have some patches to implement it for
> > > > UFS.)  I can't recall off the top of my head if MADV_WILLNEED is also a nop.
> > > > However, if both are fully implemented they should be similar in terms of
> > > > requesting async read-ahead.  MADV_WILLNEED might also conceivably
> > > > pre-create PTEs while FADV_WILLNEED can be used on a file that isn't
> > > > mapped but is accessed via read(2).
> > > >
> > >
> > >
> > > Hello and thanks for your reply.
> > >
> > > Right now I am facing the following problem (stable/10):
> > > There is a (home-grown) webserver which mmaps a large number of data
> > > files (total size is a bit below the size of RAM, say ~90GB of files
> > > with 128GB of RAM).
> > > The server writes access.log (several gigabytes per day).
> > >
> > > Some of the mmap'ed data files are used frequently, some are used
> > > rarely.  On startup, the server walks through all of these data files
> > > so their content is read from disk.
> > >
> > > After some time of running, I see that rarely used data files are
> > > purged from RAM (access to them leads to long-running disk reads) in
> > > favour of disk cache (at 0:00, when I rotate and gzip the log file, I
> > > see Inactive memory go down by the size of the log file).
> > >
> > > Is there any way to tell the VM system not to push mmap'ed regions out
> > > of RAM in favour of disk caches?
> >
> > Use POSIX_FADV_NOREUSE with posix_fadvise() for the log files.  They are a
> > perfect use case for this flag.  This will tell the VM system to throw away
> > the log data (move it to cache) after it writes the file.
> >
> > --
> > John Baldwin
>
> Does that work well in the case of something like /var/log/messages that
> is repeatedly appended-to at random intervals?  It would be bad if every
> new line written to the log triggered a physical read-modify-write.  On
> the other hand if it somehow results in the last / partial block being
> the only one likely to stay in memory, that would be perfect.

The latter.  It's sort of like a lazy O_DIRECT.  Each time you call write(2),
it tries to move any clean pages from your current sequentially written
stream from inactive to cache.  Pages that are still dirty stay where they
are; they only move on a subsequent write(2), after bufdaemon or the syncer
has actually forced them to be written.

Unfortunately, it is currently implemented by doing an internal
FADV_DONTNEED after each read() or write().  It would be better if it was
implemented as a callback when buffers are completed.

-- 
John Baldwin
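
For readers who want to try the POSIX_FADV_NOREUSE suggestion from this thread, here is a minimal sketch in C.  It assumes the log is opened once in append mode and written sequentially.  The helper names open_log_noreuse() and append_log_line() are hypothetical and exist only for illustration; the only system interfaces used are open(2), write(2) and posix_fadvise(2).  Whether the hint has any effect depends on the kernel and filesystem, so the sketch treats it as optional and ignores a failed advisory call.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: open (or create) a log file for appending and
 * mark the whole file POSIX_FADV_NOREUSE.  Returns the descriptor, or
 * -1 if open(2) fails.  A failed posix_fadvise() is ignored because
 * the advice is only a hint; the log stays usable without it.
 */
int
open_log_noreuse(const char *path)
{
	int fd;

	assert(path != NULL);
	fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
	if (fd == -1)
		return (-1);
	/* offset 0 with len 0 applies the advice to the entire file. */
	(void)posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE);
	return (fd);
}

/*
 * Hypothetical helper: append one line to the log.  Returns 0 on
 * success, -1 on error or short write.
 */
int
append_log_line(int fd, const char *line)
{
	size_t len;

	assert(line != NULL);
	len = strlen(line);
	return (write(fd, line, len) == (ssize_t)len ? 0 : -1);
}
```

A server would call open_log_noreuse() once at startup (and again after log rotation), then call append_log_line() for each request.  Note that posix_fadvise() returns an error number directly rather than setting errno, which is why the sketch does not inspect errno after it.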