Date:      Tue, 5 Jul 2016 12:50:16 -0500
From:      Karl Denninger <karl@denninger.net>
To:        freebsd-hackers@freebsd.org
Subject:   Re: ZFS ARC and mmap/page cache coherency question
Message-ID:  <31f4d30f-4170-0d04-bd23-1b998474a92e@denninger.net>
In-Reply-To: <155bc1260e6.12001bf18198857.6272515207330027022@nextbsd.org>
References:  <20160630140625.3b4aece3@splash.akips.com> <CALXu0UfxRMnaamh%2Bpo5zp=iXdNUNuyj%2B7e_N1z8j46MtJmvyVA@mail.gmail.com> <20160703123004.74a7385a@splash.akips.com> <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org> <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net> <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org> <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net> <155bc1260e6.12001bf18198857.6272515207330027022@nextbsd.org>

On 7/5/2016 12:19, Matthew Macy wrote:
>
>
>  ---- On Mon, 04 Jul 2016 19:26:06 -0700 Karl Denninger <karl@denninger.net> wrote ---- 
>  >  
>  >  
>  > On 7/4/2016 18:45, Matthew Macy wrote: 
>  > > 
>  > > 
>  > >  ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger <karl@denninger.net> wrote ----  
>  > >  >   
>  > >  > On 7/3/2016 02:45, Matthew Macy wrote:  
>  > >  > > Cedric greatly overstates the intractability of resolving it. Nonetheless, since the initial import very little has been done to improve integration, and I don't know of anyone who is up to the task taking an interest in it. Consequently, mmap() performance is likely "doomed" for the foreseeable future.
>  > >  > >
>  > >  > > -M
>  > >  >   
>  > >  > Wellllll....  
>  > >  >   
>  > >  > I've done a fair bit of work here (see  
>  > >  > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594) and the  
>  > >  > political issues are at least as bad as the coding ones.  
>  > >  >   
>  > >   
>  > > 
>  > > Strictly speaking, the root of the problem is the ARC. Not ZFS per se. Have you ever tried disabling MFU caching to see how much worse LRU only is? I'm not really convinced the ARC's benefits justify its cost. 
>  > > 
>  > > -M 
>  > > 
>  >  
>  > The ARC is very useful when it gets a hit as it avoids an I/O that would 
>  > otherwise take place. 
>  >  
>  > Where it sucks is when the system evicts working set to preserve ARC.  
>  > That's always wrong in that you're trading a speculative I/O (if the 
>  > cache is hit later) for a *guaranteed* one (to page out) and maybe *two* 
>  > (to page back in.) 
>  
> The question wasn't ARC vs. no-caching. It was LRU only vs. LRU + MFU. There are a lot of issues stemming from the fact that ZFS is a transactional object store with a POSIX FS on top. One is that it caches disk blocks as opposed to file blocks. However, if one could resolve that and have the page cache manage these blocks, life would be much, much better. However, you'd lose MFU. Hence my question.
>
> -M
>
I suspect there's an argument to be made there, but the present problems
make it difficult or impossible to determine the impact, because those
effects are swamped by the other issues.

I can fairly easily create workloads on the base code where simply
typing "vi <some file>", making a change, and hitting ":w" results in
a stall of tens of seconds or more while the requested cache flush is
run down.  My work has resolved a good part of this, but not all
instances.
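To give a feel for where a stall of that size can come from, here is a
back-of-the-envelope sketch. It is not a measurement and does not model
what ZFS actually does internally; it only divides an assumed backlog of
buffered writes by an assumed sustained write rate. The 2 GiB backlog,
the two pool speeds, and the name drain_seconds are all hypothetical.

```python
# Rough model: how long a synchronous flush waits behind buffered writes.
# Every number here is an assumption chosen for illustration only.

MIB = 1024 * 1024


def drain_seconds(dirty_bytes: int, write_bytes_per_sec: int) -> float:
    """Seconds needed to write out dirty_bytes at a sustained rate."""
    if write_bytes_per_sec <= 0:
        raise ValueError("write rate must be positive")
    return dirty_bytes / write_bytes_per_sec


backlog = 2048 * MIB  # assumed: 2 GiB of writes already buffered

# Hypothetical pools with very different sustained write throughput.
for name, rate in [("slow pool", 40 * MIB), ("fast pool", 400 * MIB)]:
    print(f"{name}: {drain_seconds(backlog, rate):.1f} s")
```

Under these assumed numbers the same backlog drains in about 5 seconds
on the fast pool and about 51 seconds on the slow one, which is the
order of magnitude of the stall described above.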

My understanding is that 11- has had additional work done to the base
code, but three underlying issues are not, from what I can see in the
commit logs and discussions, addressed: (1) the VM system will page out
working set while leaving ARC alone, (2) UMA reserved-but-not-in-use
space is not policed adequately when memory pressure exists *before* the
pager starts considering evicting working set and (3) the write-back
cache is, for many machine configurations, grossly inappropriate and
cannot be tuned adequately by hand (this is particularly true on a
system with vdevs that have materially-varying performance levels.)
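The argument quoted earlier in this message against paging out working
set to preserve ARC can be put as a small expected-cost comparison. This
is only a sketch under simplifying assumptions: every I/O is counted as
costing the same, and the page being evicted is assumed to be dirty
memory that must be written out before it can be reclaimed. The function
names and the probabilities are hypothetical.

```python
# Expected number of I/Os caused by each eviction choice.
# Assumption: all I/Os cost the same; the paged-out page is dirty.


def expected_ios_evict_arc(p_rehit: float) -> float:
    """Dropping an ARC buffer costs one read only if it is wanted again."""
    return p_rehit * 1.0


def expected_ios_page_out(p_page_back_in: float) -> float:
    """Paging out costs one guaranteed write, plus one read if reused."""
    return 1.0 + p_page_back_in * 1.0


# Sweep both probabilities over [0, 1] in steps of 0.1.
steps = [i / 10 for i in range(11)]
never_cheaper = all(
    expected_ios_page_out(q) >= expected_ios_evict_arc(p)
    for p in steps
    for q in steps
)
print("paging out is never cheaper in this model:", never_cheaper)
```

In this model, evicting ARC costs at most one I/O while paging out costs
at least one, so paging out working set never comes out ahead. A model
that weights reads and writes differently, or that allows clean pages,
would need more terms.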

I have more-or-less stopped work on the tree on a forward basis since,
with 10.2, I (1) got to a place that works for my production
requirements, resolving the problems, and (2) ran into what I deemed to
be intractable political issues within core on progress toward
eradicating the root of the problem.

I will probably revisit the situation with 11- at some point, as I'll
want to roll my production systems forward.  However, I don't know when
that will be -- right now 11- is stable enough for some of my embedded
work (e.g. on the Raspberry Pi 2) but not for my server and
client-class machines.  Indeed, just yesterday I got a lock-order
reversal panic while doing a shutdown after a kernel update on one of my
lab boxes running a just-updated 11- codebase.

-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/
