Date:      Wed, 22 May 2019 10:47:00 -0500
From:      Karl Denninger <karl@denninger.net>
To:        freebsd-fs@freebsd.org
Subject:   Re: Commit r345200 (new ARC reclamation threads) looks suspicious to me - second potential problem
Message-ID:  <89064e9c-251a-a065-3a72-ac65c884d51d@denninger.net>
In-Reply-To: <28c7430b-fb7c-6472-5c1b-fa3ff63a9e73@FreeBSD.org>
References:  <369cb1e9-f36a-a558-6941-23b9b811825a@FreeBSD.org> <20190520164202.GA2130@spy> <28c7430b-fb7c-6472-5c1b-fa3ff63a9e73@FreeBSD.org>


On 5/22/2019 10:19 AM, Alexander Motin wrote:
> On 20.05.2019 12:42, Mark Johnston wrote:
>> On Mon, May 20, 2019 at 07:05:07PM +0300, Lev Serebryakov wrote:
>>>   I'm looking at the last commit to
>>> 'sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c' (r345200) and
>>> have another question.
>>>
>>>   Here is the code:
>>>
>>> 4960	        /*
>>> 4961	         * Kick off asynchronous kmem_reap()'s of all our caches.
>>> 4962	         */
>>> 4963	        arc_kmem_reap_soon();
>>> 4964	
>>> 4965	        /*
>>> 4966	         * Wait at least arc_kmem_cache_reap_retry_ms between
>>> 4967	         * arc_kmem_reap_soon() calls. Without this check it is possible to
>>> 4968	         * end up in a situation where we spend lots of time reaping
>>> 4969	         * caches, while we're near arc_c_min.  Waiting here also gives the
>>> 4970	         * subsequent free memory check a chance of finding that the
>>> 4971	         * asynchronous reap has already freed enough memory, and we don't
>>> 4972	         * need to call arc_reduce_target_size().
>>> 4973	         */
>>> 4974	        delay((hz * arc_kmem_cache_reap_retry_ms + 999) / 1000);
>>> 4975	
>>>
>>>   But it looks like `arc_kmem_reap_soon()` is synchronous on FreeBSD!
>>> So this `delay()` looks very wrong. Am I right?
> Why is it wrong?
>
>>>    Looks like it should be `#ifdef illumos`.
>> See also r338142, which I believe was reverted by the update.
> My r345200 indeed reverted that value, but I don't see a problem there.
> When the OS needs more RAM, the pagedaemon will drain the UMA caches by
> itself.  I don't see a point in re-draining the UMA caches in a tight
> loop without a delay.  If the caches are not sufficient to sustain one
> second of workload, then the usefulness of such caches is not very
> clear, and shrinking the ARC to free some space may be the right move.
> Also, making ZFS drain its caches more actively than anything else in
> the system looks unfair to me.
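
To make the `#ifdef illumos` suggestion concrete: since arc_kmem_reap_soon()
completes synchronously on FreeBSD, the pacing delay only buys anything where
the reap is genuinely asynchronous.  The change being discussed would amount
to roughly the following (a sketch built from the arc.c excerpt quoted above,
not the actual committed patch):

        /*
         * Kick off kmem_reap()'s of all our caches.  On illumos this is
         * asynchronous; on FreeBSD the reap has completed by the time
         * this call returns.
         */
        arc_kmem_reap_soon();

#ifdef illumos
        /*
         * Only the asynchronous case needs pacing: wait at least
         * arc_kmem_cache_reap_retry_ms between arc_kmem_reap_soon()
         * calls so we don't spin reaping caches near arc_c_min, and so
         * the subsequent free-memory check can see what the async reap
         * already freed.
         */
        delay((hz * arc_kmem_cache_reap_retry_ms + 999) / 1000);
#endif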

There is a long-standing pathology with the older implementation.  The
short answer is that memory sitting in UMA caches but not allocated to
the current working set is completely wasted unless it is quickly
re-used.  A small buffer between current use and total allocation is
fine, but the UMA system will leave large amounts sitting out there
unused.  Reclaiming that after a reasonable amount of time is a very
good thing.
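
As a purely illustrative model of that idea (user-space C with made-up
names, not the actual UMA code): keep a small buffer of recently freed
items for quick re-use, and trim anything that has sat idle past a
grace period.

    #include <stdlib.h>
    #include <time.h>

    /* Toy model of a per-zone free list (not FreeBSD's UMA). */
    struct free_item {
            struct free_item *next;
            time_t            freed_at;   /* when it hit the free list */
    };

    struct zone_model {
            struct free_item *free_list;
            size_t            nfree;
            size_t            keep_min;    /* small buffer worth keeping */
            time_t            grace_secs;  /* idle time before reclaim   */
    };

    /*
     * Free items that have sat idle past the grace period, keeping a
     * small buffer for quick re-use.  Anything parked here longer than
     * that is memory doing no work for anyone.
     */
    static void
    zone_trim(struct zone_model *z, time_t now)
    {
            struct free_item **pp = &z->free_list;

            while (*pp != NULL && z->nfree > z->keep_min) {
                    struct free_item *it = *pp;

                    if (now - it->freed_at >= z->grace_secs) {
                            *pp = it->next;  /* unlink and return memory */
                            free(it);
                            z->nfree--;
                    } else {
                            pp = &it->next;
                    }
            }
    }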

The other problem is that disk cache should NEVER be preferred over
working-set space.  It is always wrong to do so, because paging out a
working-set page costs 1 *guaranteed* I/O (to write it out) and possibly
2 I/Os (if the page is needed again and must be paged back in), while an
evicted disk-cache page is only 1 *possible* I/O avoided (if that cached
block is ever requested again).

It is never the right move to intentionally take a guaranteed I/O in
order to avoid a merely *possible* I/O.  Under certain workloads making
that choice leads to severely pathological behavior: ~30-second "pauses"
where the system is doing I/O like crazy but the process you actually
care about (a database, or a shell) does nothing while it waits for its
working set to be paged back in, even though there are gigabytes (or
tens of gigabytes) of ARC outstanding.
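
In expected-I/O terms (a back-of-the-envelope model, not anything in the
ZFS code): let p_refault be the chance a paged-out working-set page is
touched again, and p_reuse the chance an evicted cache block is read
again.

    #include <stdio.h>

    /* 1 guaranteed write now, plus a read back if the page is touched again. */
    static double
    cost_evict_working_set(double p_refault)
    {
            return (1.0 + p_refault);
    }

    /* At most 1 possible read, and only if the block is requested again. */
    static double
    cost_evict_disk_cache(double p_reuse)
    {
            return (p_reuse);
    }

    int
    main(void)
    {
            double p_refault = 0.5, p_reuse = 0.5;  /* arbitrary example */

            printf("evict working-set page: %.2f expected I/Os\n",
                cost_evict_working_set(p_refault));
            printf("evict disk-cache page:  %.2f expected I/Os\n",
                cost_evict_disk_cache(p_reuse));
            /*
             * Even at p_reuse == 1.0 the cache eviction costs at most 1
             * I/O, while the working-set eviction already starts at 1:
             * preferring disk cache over working set never comes out
             * ahead in this model.
             */
            return (0);
    }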

-- 
-- Karl Denninger
/The Market-Ticker/
S/MIME Email accepted and preferred

