Date: Sat, 20 Aug 2016 11:08:44 -0500
From: Karl Denninger <karl@denninger.net>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS ARC under memory pressure
Message-ID: <97f166f0-4d47-d5a3-ecb3-d15f1ecf9c1f@denninger.net>
In-Reply-To: <20160820152225.GP83214@kib.kiev.ua>
References: <20160816193416.GM8192@zxy.spb.ru>
 <8dbf2a3a-da64-f7f8-5463-bfa23462446e@FreeBSD.org>
 <20160818202657.GS8192@zxy.spb.ru>
 <c3bc6c5a-961c-e3a4-2302-f0f7417bc34f@denninger.net>
 <20160819201840.GA12519@zxy.spb.ru>
 <bcb14d0b-bd6d-cb93-ea71-3656cfce8b3b@denninger.net>
 <20160820152225.GP83214@kib.kiev.ua>

On 8/20/2016 10:22, Konstantin Belousov wrote:
> On Fri, Aug 19, 2016 at 03:38:55PM -0500, Karl Denninger wrote:
>> Paging *always* requires one I/O (to write the page(s) to the swap) and
>> MAY involve two (to later page it back in.) It is never a "win" to
>> spend a *guaranteed* I/O when you can instead act in a way that *might*
>> cause you to (later) need to execute one.
> Why would pagedaemon need to write out clean page ?
If you are talking about the case of an executable in which part of the
text is evicted, you are correct. However, in that instance you are still
choosing to evict a page for which there will likely be future demand,
and which will thus require an I/O (should that executable come back up
for execution), rather than a page whose likelihood of future demand you
cannot estimate at all (a data page in the ARC.)
The ARC is opaque to the VM apart from its consumption of system memory,
so the VM has no means of "coloring" ARC contents by how "useful" (e.g.
how often used, etc) a particular data item is, and therefore has no
information available on which to decide. However, a page belonging to
an executing process that is in some sort of waiting state still likely
trumps an ARC data page in terms of likelihood of future access.
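
The cost argument above can be made concrete with a minimal sketch in
Python. The function names and the probabilities fed to them are
hypothetical and purely illustrative, not measurements from this
machine. The sketch also assumes that an evictable ARC buffer is clean,
so dropping it costs nothing up front.

```python
def expected_io_swap_out(p_reuse: float, dirty: bool = True) -> float:
    """Expected I/O count when the pagedaemon evicts a process page.

    A dirty (anonymous) page costs one guaranteed write to swap, plus
    one read with probability p_reuse if the process touches it again.
    A clean, file-backed page (e.g. executable text) skips the write.
    """
    if not 0.0 <= p_reuse <= 1.0:
        raise ValueError("p_reuse must be a probability")
    write_cost = 1.0 if dirty else 0.0
    return write_cost + p_reuse


def expected_io_arc_evict(p_reuse: float) -> float:
    """Expected I/O count when an ARC buffer is dropped instead.

    Assumption: the buffer is clean, so eviction itself is free and the
    only cost is a re-read with probability p_reuse.
    """
    if not 0.0 <= p_reuse <= 1.0:
        raise ValueError("p_reuse must be a probability")
    return p_reuse


if __name__ == "__main__":
    # Illustrative probabilities only.
    print("swap out dirty page:", expected_io_swap_out(0.5))
    print("drop clean text page:", expected_io_swap_out(0.9, dirty=False))
    print("drop ARC buffer:", expected_io_arc_evict(0.3))
```

Under these assumptions, swapping out a dirty page never costs less than
one I/O, while dropping an ARC buffer never costs more than one. For a
clean text page the comparison depends entirely on the two reuse
probabilities, which is the point being argued here.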
root@NewFS:/usr/src/sys/amd64/conf # pstat -s
Device              1K-blocks     Used    Avail Capacity
/dev/mirror/sw.eli   67108860   291356 66817504     0%
While this is not a large amount of page space used, I can assure you
that at no time since boot was all 32GB of memory in the machine
consumed by other-than-ARC data. For the VM system to have evicted
pages to the swap file rather than pare back the ARC is therefore
demonstrably wrong, since the result was the execution of I/Os on the
*speculative* bet that a page in the ARC would be needed in preference
to the page that was evicted.
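
To put the pstat figures in perspective, here is a small sketch in
Python that parses "pstat -s" output of the shape shown above. The
function name swap_usage is hypothetical, and the parser assumes the
five-column layout seen here (device, 1K-blocks, used, available,
capacity).

```python
def swap_usage(pstat_output: str):
    """Return (device, used_MiB, used_percent) for each swap device.

    Assumes whitespace-separated columns as printed by "pstat -s":
    Device, 1K-blocks, Used, Avail, Capacity. Header and any other
    non-numeric rows are skipped.
    """
    rows = []
    for line in pstat_output.strip().splitlines():
        fields = line.split()
        if len(fields) < 5 or not fields[1].isdigit():
            continue  # header line or unexpected row
        device = fields[0]
        total_kb = int(fields[1])
        used_kb = int(fields[2])
        rows.append((device, used_kb / 1024, 100.0 * used_kb / total_kb))
    return rows


# The output quoted in this message.
sample = """Device 1K-blocks Used Avail Capacity
/dev/mirror/sw.eli 67108860 291356 66817504 0%"""

if __name__ == "__main__":
    for device, used_mib, used_pct in swap_usage(sample):
        print(f"{device}: {used_mib:.1f} MiB used ({used_pct:.2f}%)")
```

For the figures above this works out to roughly 285 MiB, a bit over
0.4% of the device, which pstat rounds down to 0%.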
On 10.x, unpatched, fairly trivial "added" workloads that one might run
on a routine basis (e.g. "make -j8 buildworld") would, if you had a
largish text file open in "vi" on this machine, lead to user-perceived
stalls exceeding 10 seconds in length, during which that process's
working set had been evicted so as to keep ARC cache data! While it
might at first blush appear that the Postgres database consumers on the
same machine would be happy with the larger ARC, that certainly was not
the case when *their* RSS got paged out and *they* took the resulting
10+ second stall as well!
11.x does exhibit far less pathology in this regard than did 10.x
(unpatched) and I've yet to see the "stall system to the point that it
appears it has crashed" behavior that I formerly could provoke with a
trivial test.
However, the fact remains that the same machine, with the same load,
running 10.x and my patches ran for months at a time with zero page
space consumed, a fully-utilized ARC and very little slack space
(defined as RAM in "Cache" + allocated-but-unused UMA) -- in other
words, with no displayed pathology at all.
The behavior of unpatched 11.x, while very-materially better than
unpatched 10.x, IMHO does not meet this standard. In particular there
are quite-large quantities of UMA space out-but-unused on a regular basis
and while *at present* the ARC looks pretty healthy this is a weekend
when system load is quite low. During the week not only does the UMA
situation look far worse, so do the ARC's size and efficiency, which
frequently wind up running at "half-mast" compared to where they ought
to be.
I believe FreeBSD 11.x can do better and intend to roll forward the 10.x
work in an attempt to implement that.
--
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/
