Date:      Tue, 5 Jul 2016 09:30:51 -0500
From:      Karl Denninger <karl@denninger.net>
To:        freebsd-hackers@freebsd.org
Subject:   Re: ZFS ARC and mmap/page cache coherency question
Message-ID:  <d00e30d1-7c99-8f30-835e-705fbf3d00e8@denninger.net>
In-Reply-To: <272d657a-52ae-4f45-008c-3de6fb1b0c48@freebsd.org>
References:  <20160630140625.3b4aece3@splash.akips.com> <CALXu0UfxRMnaamh%2Bpo5zp=iXdNUNuyj%2B7e_N1z8j46MtJmvyVA@mail.gmail.com> <20160703123004.74a7385a@splash.akips.com> <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org> <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net> <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org> <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net> <34cf2d30-8884-95b6-f852-457d55710daf@freebsd.org> <768b6169-70d9-5500-c455-563d8340972e@denninger.net> <b03f73a1-95c9-c753-3464-74fcb45351e5@freebsd.org> <ec4685b2-bdaf-c18d-8aff-38b17edf4ebb@denninger.net> <272d657a-52ae-4f45-008c-3de6fb1b0c48@freebsd.org>

On 7/4/2016 22:01, Allan Jude wrote:
> On 2016-07-04 22:46, Karl Denninger wrote:
>>
>>> You keep saying per zvol. Do you mean per vdev? I am under the
>>> impression that no zvols are involved in the use case this thread is
>>> about.
>> Sorry, per-vdev.  The problem with dmu_tx is that it's system-wide.
>> This is wildly inappropriate for several reasons -- first, it is
>> computed on size-of-RAM with a hard cap (which is stupid on its face)
>> and it is entirely insensitive to the performance of the vdevs in
>> question.  Specifically, it is very common for a system to have very
>> fast (e.g. SSD) disks, perhaps in a mirror configuration, and then
>> spinning rust in a RaidZ2 config for bulk storage.  Those are very, very
>> different performance-wise and they should have wildly different
>> write-back cache sizes.  At present there is exactly one such write-back
>> cache and it's both system-wide and pays exactly zero attention to the
>> throughput of the underlying vdevs it is talking to.
>>
>> This is why you can provoke minute-long stalls on a system with moderate
>> (e.g. 32GB) amounts of RAM if there are spinning rust devices in the
>> configuration.
>>
>>>
>>> Improving the way ZFS frees memory, specifically UMA and the 'kmem
>>> caches' will help a lot as well.
>>>
>> Well, yeah.  But that means you have to police up the size of the UMA
>> vs. how much is actually in use in the UMA.  What the PR does is get
>> pretty aggressive with that whenever RAM is tight, and before the pager
>> can start playing hell with system performance.
>>
>>> In addition, another patch just went in to allow you to change the
>>> arc_max and arc_min on a running system.
>>>
>> Yes, the PR I did a long time ago made that "active" on a running
>> system.... so I've had that for quite some time.  Not that you really
>> ought to need to play with that (if you feel a need to then you're still
>> at step 1 or 2 of what I went through with analyzing and working on this
>> in the 10.x code.....)
>>
>
> Have you looked into the ZFS 'Write Throttle'? It seems like it
> was meant to solve the writeback problem you are describing. It starts
> sending back pressure up to the application by introducing larger and
> larger delays in the write() call until your disks can keep up with
> your applications.
>
> http://dtrace.org/blogs/ahl/2014/02/10/the-openzfs-write-throttle/
>
> http://dtrace.org/blogs/ahl/2014/08/31/openzfs-tuning/
>

I believe this has been brought into FreeBSD's implementation; I recall
going through it.
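
For anyone following along, here is a minimal sketch of the two
mechanisms discussed above: a system-wide dirty-data limit computed from
RAM size with a hard cap, and a write delay that grows as dirty data
approaches that limit. It is an illustrative model, not the kernel code.
The constants are assumptions based on the OpenZFS defaults as I
understand them from the posts linked above, and every function name is
made up for this example.

```python
# Illustrative model only. Constants are ASSUMED OpenZFS defaults,
# not values read from a running system.
GIB = 1024 ** 3

DIRTY_DATA_MAX_PERCENT = 10     # assumed: share of RAM allowed to be dirty
DIRTY_DATA_MAX_CAP = 4 * GIB    # assumed: hard cap on the limit
DELAY_MIN_DIRTY_PERCENT = 60    # assumed: throttling starts at this fill level
DELAY_SCALE_NS = 500_000        # assumed: scale factor for the delay curve
DELAY_MAX_NS = 100_000_000      # assumed: per-write delay ceiling (100 ms)


def dirty_data_max(ram_bytes):
    """System-wide dirty-data limit: a fraction of RAM, clamped by a cap."""
    return min(ram_bytes * DIRTY_DATA_MAX_PERCENT // 100, DIRTY_DATA_MAX_CAP)


def drain_seconds(dirty_bytes, vdev_bytes_per_sec):
    """Time to flush the dirty data at a sustained vdev write rate."""
    return dirty_bytes / vdev_bytes_per_sec


def write_delay_ns(dirty_bytes, limit_bytes):
    """Per-write delay: zero below the threshold, steep near the limit."""
    threshold = limit_bytes * DELAY_MIN_DIRTY_PERCENT // 100
    if dirty_bytes <= threshold:
        return 0
    if dirty_bytes >= limit_bytes:
        return DELAY_MAX_NS
    delay = DELAY_SCALE_NS * (dirty_bytes - threshold) // (limit_bytes - dirty_bytes)
    return min(delay, DELAY_MAX_NS)


if __name__ == "__main__":
    limit = dirty_data_max(32 * GIB)
    # Hypothetical sustained write rates for a slow and a fast pool.
    for name, rate in (("spinning rust", 50 * 10**6), ("SSD mirror", 500 * 10**6)):
        print(name, round(drain_seconds(limit, rate), 1), "seconds to drain")
    for pct in (50, 70, 90, 99):
        print(pct, "% dirty ->", write_delay_ns(limit * pct // 100, limit), "ns")
```

Note that the limit depends only on RAM size, so the same amount of
dirty data is allowed whether it is headed for the fast pool or the slow
one. With the hypothetical 50 MB/s figure, draining a full buffer on a
32GB machine takes on the order of a minute.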

-- 
Karl Denninger
karl@denninger.net <mailto:karl@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/

