Date: Tue, 5 Jul 2016 09:30:51 -0500
From: Karl Denninger <karl@denninger.net>
To: freebsd-hackers@freebsd.org
Subject: Re: ZFS ARC and mmap/page cache coherency question
Message-ID: <d00e30d1-7c99-8f30-835e-705fbf3d00e8@denninger.net>
In-Reply-To: <272d657a-52ae-4f45-008c-3de6fb1b0c48@freebsd.org>
References: <20160630140625.3b4aece3@splash.akips.com>
 <CALXu0UfxRMnaamh%2Bpo5zp=iXdNUNuyj%2B7e_N1z8j46MtJmvyVA@mail.gmail.com>
 <20160703123004.74a7385a@splash.akips.com>
 <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org>
 <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net>
 <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org>
 <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net>
 <34cf2d30-8884-95b6-f852-457d55710daf@freebsd.org>
 <768b6169-70d9-5500-c455-563d8340972e@denninger.net>
 <b03f73a1-95c9-c753-3464-74fcb45351e5@freebsd.org>
 <ec4685b2-bdaf-c18d-8aff-38b17edf4ebb@denninger.net>
 <272d657a-52ae-4f45-008c-3de6fb1b0c48@freebsd.org>

On 7/4/2016 22:01, Allan Jude wrote:
> On 2016-07-04 22:46, Karl Denninger wrote:
>>
>>> You keep saying per zvol. Do you mean per vdev? I am under the
>>> impression that no zvols are involved in the use case this thread is
>>> about.
>> Sorry, per-vdev.  The problem with dmu_tx is that it's system-wide.
>> This is wildly inappropriate for several reasons -- first, it is
>> computed on size-of-RAM with a hard cap (which is stupid on its face)
>> and it is entirely insensitive to the performance of the vdevs in
>> question.  Specifically, it is very common for a system to have very
>> fast (e.g. SSD) disks, perhaps in a mirror configuration, and then
>> spinning rust in a RaidZ2 config for bulk storage.  Those are very, very
>> different performance wise and they should have wildly different
>> write-back cache sizes.  At present there is exactly one such write-back
>> cache and it's both system-wide and pays exactly zero attention to the
>> throughput of the underlying vdevs it is talking to.
>>
>> This is why you can provoke minute-long stalls on a system with moderate
>> (e.g. 32GB) amounts of RAM if there are spinning rust devices in the
>> configuration.
>>
>>>
>>> Improving the way ZFS frees memory, specifically UMA and the 'kmem
>>> caches' will help a lot as well.
>>>
>> Well, yeah.  But that means you have to police up the size of the UMA
>> vs. how much is actually in use in the UMA.  What the PR does is get
>> pretty aggressive with that whenever RAM is tight, and before the pager
>> can start playing hell with system performance.
>>
>>> In addition, another patch just went in to allow you to change the
>>> arc_max and arc_min on a running system.
>>>
>> Yes, the PR I did a long time ago made that "active" on a running
>> system.... so I've had that for quite some time.  Not that you really
>> ought to need to play with that (if you feel a need to then you're still
>> at step 1 or 2 of what I went through with analyzing and working on this
>> in the 10.x code.....)
>>
>
> Have you looked into the ZFS 'Write Throttle'? It seems like it
> was meant to solve the writeback problem you are describing. It starts
> sending back pressure up to the application by introducing larger and
> larger delays in the write() call until your disks can keep up with
> your applications.
>
> http://dtrace.org/blogs/ahl/2014/02/10/the-openzfs-write-throttle/
>
> http://dtrace.org/blogs/ahl/2014/08/31/openzfs-tuning/
>
I believe this has been brought into FreeBSD's implementation; I recall
going through it.

-- 
Karl Denninger
karl@denninger.net <mailto:karl@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/
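
For readers following the thread, the two mechanisms under discussion can be put into rough numbers. The sketch below is illustrative only. The function names are hypothetical, and the defaults (10% of RAM, a 4 GiB hard cap, a 60% throttle threshold, a 500 microsecond delay scale, a 100 ms delay ceiling) are assumptions based on the OpenZFS write-throttle write-ups linked above, not values read from any particular FreeBSD release. The vdev throughput figures are invented for the example.

```python
# Rough sketch of the system-wide dirty-data limit and the write-throttle
# delay curve. Every default below is an assumption, not a measured value.

GIB = 1024 ** 3
MIB = 1024 ** 2


def dirty_data_max(ram_bytes, percent=10, hard_cap=4 * GIB):
    """System-wide dirty-data limit: a percentage of RAM with a hard cap."""
    return min(ram_bytes * percent // 100, hard_cap)


def drain_seconds(dirty_bytes, vdev_bytes_per_sec):
    """Time to flush that much dirty data at a given sustained throughput."""
    return dirty_bytes / vdev_bytes_per_sec


def throttle_delay_ns(dirty, dirty_max, min_dirty_percent=60,
                      delay_scale=500_000, delay_max_ns=100_000_000):
    """Per-write delay: zero below the threshold, then rising steeply
    as outstanding dirty data approaches the limit."""
    delay_min = dirty_max * min_dirty_percent // 100
    if dirty <= delay_min:
        return 0
    if dirty >= dirty_max:
        # Simplification: this sketch returns the ceiling at the limit.
        return delay_max_ns
    delay = delay_scale * (dirty - delay_min) // (dirty_max - dirty)
    return min(delay, delay_max_ns)


if __name__ == "__main__":
    limit = dirty_data_max(32 * GIB)
    # Hypothetical sustained write rates for two very different pools.
    for name, mib_per_sec in (("SSD mirror", 800), ("RAIDZ2 on disks", 50)):
        secs = drain_seconds(limit, mib_per_sec * MIB)
        print(f"{name}: {limit / GIB:.1f} GiB of dirty data, {secs:.0f} s to drain")
    for pct in (50, 70, 90, 99):
        print(f"{pct}% dirty -> {throttle_delay_ns(limit * pct // 100, limit)} ns")
```

With these assumed numbers the same limit takes a few seconds to drain on the fast pool and about a minute on the slow one, which is the mismatch described above. The delay curve depends only on how full the single system-wide limit is, not on which vdev the data is headed for.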
