From: Dmitry Sivachenko
Date: Sun, 6 Apr 2014 12:37:57 +0400
Subject: Re: madvise() vs posix_fadvise()
To: John Baldwin
Cc: freebsd-hackers@freebsd.org

On 06 Apr 2014, at 0:11, Dmitry Sivachenko wrote:

> On 05 Apr 2014, at 1:02, Dmitry Sivachenko wrote:
>
>> On 05 Apr 2014, at 0:12, John Baldwin wrote:
>>
>>> MADV_WILLNEED is not going to give you what you want. OTOH, if you haven't
>>> tried FreeBSD 10 yet, I would suggest trying that. There have been changes
>>> to pagedaemon that might make it do a better job of kicking out the pages
>>> of the log files automatically.
>>
>> I did. My situation became worse after I moved from stable/9 to stable/10.
>> My feeling is that stable/10 pushes rarely used mmaped pages out of RAM more aggressively than stable/9 did.
>>
>> For now, the only solution I have found is doing msync(MS_INVALIDATE) on log files after gzipping and after backup via rsync.
>> This moves the corresponding memory pages from Inactive to Free and prevents the system from filling all free memory with cached log files and purging mmaped data out of RAM to accommodate more disk cache.
>>
>> What I would love to see is an ability to tell the OS not to release mmaped data unless "really needed" (disk cache is not an excuse).
>
> One more observation, as it seems to be related.
> If my program allocates RAM via malloc() rather than mmap(), I see that the VM swaps out rarely used parts of the malloced data as the disk is being used
> (more and more memory goes to Inactive with cached file content).
>
> This is also different from stable/9 and does not seem good. Why keep the cached content of files forever? (It seems there is no timeout for keeping cached file content in the Inactive state.) So after a few days of uptime, all available RAM is either in the Active state, holding frequently used pages of running processes, or in the Inactive state, holding cached file data. Rarely used parts of process memory go to swap.

Look at this (top output is sorted by size):

last pid:  2945;  load averages:  8.94,  8.88,  9.23    up 25+20:18:46  12:33:26
94 processes:  6 running, 86 sleeping, 2 zombie
CPU: 22.2% user,  0.0% nice,  0.6% system,  0.0% interrupt, 77.2% idle
Mem: 76G Active, 161G Inact, 7485M Wired, 3504M Cache, 1937M Buf, 1906M Free
Swap: 24G Total, 1435M Used, 23G Free, 5% Inuse, 12K In, 196K Out

  PID USERNAME THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
 2330 mitya      1  27    0 24611M 24626M piperd 12  10:10  10.25% gsort
99508 mitya      1 103    0 15502M 12382M CPU15  15 652:49 100.00% mkcls
79062 mitya      1  52    0 11396M 10721M swread 22  69.2H  87.26% aliw
80062 mitya      1  52    0 11282M 10666M swread 27  67.0H  80.18% aliw
 1832 mitya      1 103    0  8940M  8707M CPU28  28 232:09 100.00% aliw
 1871 mitya      1 103    0  8326M  8258M CPU11  11 219:13 100.00% aliw
 2329 mitya      1  52    0  5335M  5043M getblk 12 109:49  86.57% phraset
 2002 mitya      1  52    0  3810M  3232M wswbuf  3 186:33  98.39% phraset
 2035 mitya      1 102    0  3810M  3232M CPU16  16 179:33  98.68% phraset
 2555 mitya      1 103    0  2416M  2196M CPU20  20  81:34 100.00% aliw
 2038 mitya      1  23    0   150M  4808K piperd 29   0:00   0.00% nbest
 2005 mitya      1  22    0   150M  4808K piperd  3   0:00   0.00% nbest
 1381 root       2  20    0   106M 23684K select 18   0:57   0.00% ruby19
64642 mitya      1  20    0 96608K  1792K select 22   0:37   0.00% sshd
 2864 root       1  20    0 92512K  5392K select  6   0:00   0.00% sshd
 2866 mitya      1  20    0 92512K  5384K select 18   0:00   0.00% sshd
98119 mitya      1  20    0 92512K  2096K select 23   0:07   0.00% sshd

This machine has 256GB of RAM, and all running processes together use less than 100GB.
But since all formerly Free memory has now moved to the Inactive state, greedily holding cached file data, we see processes swapping.

This strategy could be beneficial for file servers, but not for other use cases.
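
For reference, a minimal sketch of the msync(MS_INVALIDATE) workaround mentioned in the quoted text, assuming it is run against each log file once gzip or rsync has finished with it. The helper name drop_cached_pages() is hypothetical and not taken from the thread. The effect described above (pages moving from Inactive to Free) is what was observed on FreeBSD; other systems may treat MS_INVALIDATE differently.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (the name is an assumption, not from the thread):
 * map a file read-only and call msync(MS_INVALIDATE) on the mapping,
 * asking the kernel to invalidate its cached pages for that file.
 * Returns 0 on success, -1 on error.
 */
int
drop_cached_pages(const char *path)
{
	struct stat st;
	void *addr;
	int fd, rc;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd == -1) {
		perror("open");
		return (-1);
	}
	if (fstat(fd, &st) == -1) {
		perror("fstat");
		close(fd);
		return (-1);
	}
	if (st.st_size == 0) {
		/* Nothing to invalidate, and mmap() rejects a zero length. */
		close(fd);
		return (0);
	}

	addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return (-1);
	}

	/* The actual workaround: invalidate the cached copies of the file. */
	rc = msync(addr, (size_t)st.st_size, MS_INVALIDATE);
	if (rc == -1)
		perror("msync");

	munmap(addr, (size_t)st.st_size);
	close(fd);
	return (rc);
}
```

A caller would invoke drop_cached_pages() on each rotated log file after compression and again after the backup pass. Error handling here only reports and returns; whether a failure should abort the rotation script is left open.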