From owner-freebsd-hackers@FreeBSD.ORG  Sun Apr  6 08:38:04 2014
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 7.2 \(1874\))
Subject: Re: madvise() vs posix_fadvise()
From: Dmitry Sivachenko <trtrmitya@gmail.com>
In-Reply-To: <8DAE3175-FE32-4D17-A386-063DDB6C45F7@gmail.com>
Date: Sun, 6 Apr 2014 12:37:57 +0400
Content-Transfer-Encoding: quoted-printable
Message-Id: <00B9699B-80D2-40E6-AA51-7B15191A4BDE@gmail.com>
References: <D6BD48AF-9522-495D-8D54-37854E53C272@gmail.com>
 <201404031102.38598.jhb@freebsd.org>
 <EF134BCA-1E92-4C98-8763-9A31EA96839A@gmail.com>
 <201404041612.35889.jhb@freebsd.org>
 <5426E303-E35B-4D4A-AB62-3571228A5A2C@gmail.com>
 <8DAE3175-FE32-4D17-A386-063DDB6C45F7@gmail.com>
To: John Baldwin <jhb@FreeBSD.org>
X-Mailer: Apple Mail (2.1874)
Cc: freebsd-hackers@freebsd.org


On 06 Apr 2014, at 0:11, Dmitry Sivachenko <trtrmitya@gmail.com> wrote:

>
> On 05 Apr 2014, at 1:02, Dmitry Sivachenko <trtrmitya@gmail.com> wrote:
>
>> On 05 Apr 2014, at 0:12, John Baldwin <jhb@FreeBSD.org> wrote:
>>
>>>
>>> MADV_WILLNEED is not going to give you what you want.  OTOH, if you haven't
>>> tried FreeBSD 10 yet, I would suggest trying that.  There have been changes
>>> to pagedaemon that might make it do a better job of kicking out the pages
>>> of the log files automatically.
>>>
>>
>>
>> I did. My situation became worse after I moved from stable/9 to stable/10.
>> My feeling is that stable/10 pushes rarely used mmapped pages out of RAM more aggressively than stable/9 did.
>>
>> For now, the only solution I have found is calling msync(MS_INVALIDATE) on log files after gzipping them and after backing them up via rsync.
>> This moves the corresponding memory pages from Inactive to Free and prevents the system from filling all free memory with cached log files and purging mmapped data from RAM to accommodate more disk cache.
>>
>> What I would love to see is a way to tell the OS not to release mmapped data unless "really needed" (disk cache is not an excuse).
>
>
> One more observation, as it seems to be related.
> If my program allocates RAM via malloc() rather than mmap(), I see that the VM swaps out rarely used parts of the malloced data as the disk is being used
> (more and more memory goes to Inactive with cached file contents).
>
> This also differs from stable/9 and does not seem good.  Why keep cached file contents forever? (There seems to be no timeout for keeping cached file contents in the Inactive state.)  So after a few days of uptime, all available RAM is either in the Active state, holding frequently used pages of running processes, or in the Inactive state, holding cached file data.  Rarely used parts of process memory go to swap.
>
>
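
For reference, the msync(MS_INVALIDATE) workaround mentioned in the quoted text could look roughly like the sketch below. The helper name invalidate_cached_file() is hypothetical, and mapping the whole file read-only is an assumption about how one might apply it; the actual effect of MS_INVALIDATE on cached pages is OS- and version-specific (the Inactive-to-Free behaviour is what was observed here on stable/10, not something the sketch guarantees).

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a file and call msync(MS_INVALIDATE) on it,
 * asking the kernel to invalidate its cached pages.
 * Returns 0 on success, -1 on failure.
 */
int invalidate_cached_file(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* empty file: nothing to map */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping remains valid after close */
    if (p == MAP_FAILED)
        return -1;

    /* The whole point of the sketch: invalidate the cached copies. */
    int rc = msync(p, len, MS_INVALIDATE);
    munmap(p, len);
    return rc;
}
```

Such a helper would be called on each log file once gzip and rsync have finished with it.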


Look at this (top output is sorted by size):

last pid:  2945;  load averages:  8.94,  8.88,  9.23   up 25+20:18:46  12:33:26
94 processes:  6 running, 86 sleeping, 2 zombie
CPU: 22.2% user,  0.0% nice,  0.6% system,  0.0% interrupt, 77.2% idle
Mem: 76G Active, 161G Inact, 7485M Wired, 3504M Cache, 1937M Buf, 1906M Free
Swap: 24G Total, 1435M Used, 23G Free, 5% Inuse, 12K In, 196K Out

  PID USERNAME      THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
 2330 mitya           1  27    0 24611M 24626M piperd 12  10:10  10.25% gsort
99508 mitya           1 103    0 15502M 12382M CPU15  15 652:49 100.00% mkcls
79062 mitya           1  52    0 11396M 10721M swread 22  69.2H  87.26% aliw
80062 mitya           1  52    0 11282M 10666M swread 27  67.0H  80.18% aliw
 1832 mitya           1 103    0  8940M  8707M CPU28  28 232:09 100.00% aliw
 1871 mitya           1 103    0  8326M  8258M CPU11  11 219:13 100.00% aliw
 2329 mitya           1  52    0  5335M  5043M getblk 12 109:49  86.57% phraset
 2002 mitya           1  52    0  3810M  3232M wswbuf  3 186:33  98.39% phraset
 2035 mitya           1 102    0  3810M  3232M CPU16  16 179:33  98.68% phraset
 2555 mitya           1 103    0  2416M  2196M CPU20  20  81:34 100.00% aliw
 2038 mitya           1  23    0   150M  4808K piperd 29   0:00   0.00% nbest
 2005 mitya           1  22    0   150M  4808K piperd  3   0:00   0.00% nbest
 1381 root            2  20    0   106M 23684K select 18   0:57   0.00% ruby19
64642 mitya           1  20    0 96608K  1792K select 22   0:37   0.00% sshd
 2864 root            1  20    0 92512K  5392K select  6   0:00   0.00% sshd
 2866 mitya           1  20    0 92512K  5384K select 18   0:00   0.00% sshd
98119 mitya           1  20    0 92512K  2096K select 23   0:07   0.00% sshd


This machine has 256GB of RAM, and all running processes together use less than 100GB.
But since all Free memory has now moved to the Inactive state, which greedily holds cached files, we see processes swapping.

This strategy could be beneficial for file servers, but not for other use cases.