Date: Sun, 6 Apr 2014 12:37:57 +0400
From: Dmitry Sivachenko <trtrmitya@gmail.com>
To: John Baldwin <jhb@FreeBSD.org>
Cc: freebsd-hackers@freebsd.org
Subject: Re: madvise() vs posix_fadvise()
Message-ID: <00B9699B-80D2-40E6-AA51-7B15191A4BDE@gmail.com>
In-Reply-To: <8DAE3175-FE32-4D17-A386-063DDB6C45F7@gmail.com>
References: <D6BD48AF-9522-495D-8D54-37854E53C272@gmail.com>
 <201404031102.38598.jhb@freebsd.org>
 <EF134BCA-1E92-4C98-8763-9A31EA96839A@gmail.com>
 <201404041612.35889.jhb@freebsd.org>
 <5426E303-E35B-4D4A-AB62-3571228A5A2C@gmail.com>
 <8DAE3175-FE32-4D17-A386-063DDB6C45F7@gmail.com>
On 6 Apr 2014, at 0:11, Dmitry Sivachenko <trtrmitya@gmail.com> wrote:

> On 5 Apr 2014, at 1:02, Dmitry Sivachenko <trtrmitya@gmail.com> wrote:
>
>> On 5 Apr 2014, at 0:12, John Baldwin <jhb@FreeBSD.org> wrote:
>>
>>> MADV_WILLNEED is not going to give you what you want. OTOH, if you haven't
>>> tried FreeBSD 10 yet, I would suggest trying that. There have been changes
>>> to pagedaemon that might make it do a better job of kicking out the pages
>>> of the log files automatically.
>>
>> I did. My situation became worse after I moved from stable/9 to stable/10.
>> My feeling is that stable/10 pushes rarely used mmapped pages out of RAM
>> more aggressively than stable/9 did.
>>
>> For now, the only solution I have found is to call msync(MS_INVALIDATE) on
>> log files after gzipping them and after backing them up via rsync.
>> This moves the corresponding memory pages from Inactive to Free and
>> prevents the system from filling all free memory with cached log files and
>> purging mmapped data out of RAM to accommodate more disk cache.
>>
>> What I would love to see is a way to tell the OS not to release mmapped
>> data unless "really needed" (disk cache is not an excuse).
>
> One more observation, as it seems to be related.
> If my program allocates RAM via malloc() rather than mmap(), I see that the
> VM swaps rarely used parts of the malloced data out as the disk is being used
> (more and more memory goes to Inactive with cached file contents).
>
> This is also different from stable/9 and does not seem good. Why keep
> cached file contents forever? (There seems to be no timeout for keeping
> cached file contents in the Inactive state.) So after a few days of uptime,
> all available RAM is either in the Active state, with frequently used pages
> of running processes, or in the Inactive state, with cached file data.
> Rarely used parts of process memory go to swap.
>

Look at this (top output is sorted by size):

last pid:  2945;  load averages:  8.94,  8.88,  9.23    up 25+20:18:46  12:33:26
94 processes:  6 running, 86 sleeping, 2 zombie
CPU: 22.2% user,  0.0% nice,  0.6% system,  0.0% interrupt, 77.2% idle
Mem: 76G Active, 161G Inact, 7485M Wired, 3504M Cache, 1937M Buf, 1906M Free
Swap: 24G Total, 1435M Used, 23G Free, 5% Inuse, 12K In, 196K Out

  PID USERNAME THR PRI NICE    SIZE     RES STATE   C    TIME    WCPU COMMAND
 2330 mitya      1  27    0  24611M  24626M piperd 12   10:10  10.25% gsort
99508 mitya      1 103    0  15502M  12382M CPU15  15  652:49 100.00% mkcls
79062 mitya      1  52    0  11396M  10721M swread 22   69.2H  87.26% aliw
80062 mitya      1  52    0  11282M  10666M swread 27   67.0H  80.18% aliw
 1832 mitya      1 103    0   8940M   8707M CPU28  28  232:09 100.00% aliw
 1871 mitya      1 103    0   8326M   8258M CPU11  11  219:13 100.00% aliw
 2329 mitya      1  52    0   5335M   5043M getblk 12  109:49  86.57% phraset
 2002 mitya      1  52    0   3810M   3232M wswbuf  3  186:33  98.39% phraset
 2035 mitya      1 102    0   3810M   3232M CPU16  16  179:33  98.68% phraset
 2555 mitya      1 103    0   2416M   2196M CPU20  20   81:34 100.00% aliw
 2038 mitya      1  23    0    150M   4808K piperd 29    0:00   0.00% nbest
 2005 mitya      1  22    0    150M   4808K piperd  3    0:00   0.00% nbest
 1381 root       2  20    0    106M  23684K select 18    0:57   0.00% ruby19
64642 mitya      1  20    0  96608K   1792K select 22    0:37   0.00% sshd
 2864 root       1  20    0  92512K   5392K select  6    0:00   0.00% sshd
 2866 mitya      1  20    0  92512K   5384K select 18    0:00   0.00% sshd
98119 mitya      1  20    0  92512K   2096K select 23    0:07   0.00% sshd

This machine has 256GB of RAM, and all running processes together use less
than 100GB.

But because all Free memory has now moved to the Inactive state, where it
greedily holds cached file data, we see processes swapping (note the two
aliw processes in the swread state).

This strategy could be beneficial for file servers, but not for other use
cases.
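
For reference, the msync(MS_INVALIDATE) workaround mentioned in the quoted text could look roughly like the sketch below. This is only an illustration under assumptions: the helper name drop_cached_pages() is hypothetical, and mapping the whole file read-only with MAP_SHARED is one guess at how to obtain a mapping to invalidate. The move of pages from Inactive to Free is the behaviour reported in this thread for FreeBSD; other systems may treat MS_INVALIDATE differently.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original message): map a file and
 * call msync(MS_INVALIDATE) on the mapping, asking the kernel to
 * invalidate its cached pages.
 * Returns 0 on success, -1 on failure.
 */
int drop_cached_pages(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    /* mmap() rejects a zero length, and an empty file has nothing cached. */
    if (st.st_size == 0) {
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    void *addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* The call the workaround relies on. */
    int rc = msync(addr, len, MS_INVALIDATE);

    munmap(addr, len);
    close(fd);
    return rc;
}
```

Such a helper would be called once per log file after the gzip or rsync pass has finished with it, for example drop_cached_pages("/var/log/app.log.gz") (the path is only a placeholder).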