Date: Sun, 17 Mar 2019 09:58:52 -0500
From: Karl Denninger <karl@denninger.net>
To: freebsd-stable@freebsd.org
Subject: Observations from a ZFS reorganization on 12-STABLE
Message-ID: <58eb1994-41bd-cd22-be66-0024bcbc36e6@denninger.net>

I've long argued that the VM system's interaction with ZFS' ARC and UMA has serious, even severe, issues. 12.x appeared to have addressed some of them, and as such I've yet to roll forward any part of the patch series that is found here [ https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594 ] or the Phabricator version referenced in the bug thread (which is more complex and attempts to dig at the root of the issue more effectively, particularly when UMA is involved, as it usually is).

Yesterday I decided to perform a fairly significant reorganization of the ZFS pools on one of my personal machines, including the root pool, which was on mirrored SSDs, changing it to a raidz2 (also on SSDs). This of course required booting single-user from a 12-STABLE memstick.

A simple "zfs send -R zs/root-save/R | zfs recv -Fuev zsr/R" should have done it, no sweat. The root that was copied over before I started is uncomplicated; it's compressed, but not de-duped. While it has snapshots on it too, it's by no means complex.

*The system failed to execute that command with an "out of swap space" error, killing the job; there was indeed no swap configured since I booted from a memstick.*

Huh? A simple *filesystem copy* managed to force a 16GB system into requiring page file backing store?

I was able to complete the copy by temporarily adding the swap space back on (where it would be when the move was complete), but that requirement is pure insanity. It appears, from what I was able to determine, that it came about from the same root cause that's been plaguing VM/ZFS interaction since 2014, when I started work on this issue: when RAM gets low, rather than evict ARC (or clean up UMA that is allocated but unused), the system will attempt to page out working set. In this case there was nowhere to page the working set out to, so the process involved got an OOM error and was terminated.

*I continue to argue that this decision is ALWAYS wrong.*

It's wrong because if you invalidate cache and reclaim it you *might* take a read from physical I/O to bring that data back into the cache in the future (since it's no longer in RAM). Paging instead avoids that *potential* I/O at the price of a GUARANTEED physical I/O (to page out some amount of working set) and possibly TWO physical I/Os (to page said working set out and, later, page it back in).

It has always appeared to me to be flat-out nonsensical to trade a possible physical I/O (if there is a future cache miss) for a guaranteed physical I/O and a possible second one.

It's even worse if the reason you make that decision is that UMA is allocated but unused; in that case you are paging when no physical I/O is required at all, as the "memory pressure" is a phantom! While UMA is a very material performance win in the general case, allowing allocated-but-unused UMA to force paging appears, from a performance perspective, to be flat-out insanity. I find it very difficult to come up with any reasonable scenario where releasing allocated-but-unused UMA rather than paging out working set is a net performance loser.

In this case, since the system was running in single-user mode, the process selected to be destroyed when memory ran short and no swap was available was the copy process itself. The copy itself did not require anywhere near all of the available non-kernel RAM.

I'm going to dig into this further, but IMHO the base issue still exists, even though the impact of it for my workloads with everything "running normally" has materially decreased with 12.x.

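To make the I/O accounting above concrete, here is a minimal sketch in Python. It is only an illustration of the argument, not a model of the FreeBSD VM system: the function names and the two probabilities are hypothetical parameters, every physical I/O is counted as equal in cost, and the evicted cache data is assumed to be clean so that dropping it needs no write.

```python
def _check_probability(p: float) -> None:
    """Reject values that are not valid probabilities."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")


def ios_if_cache_evicted(p_cache_miss: float) -> float:
    """Expected physical I/Os when clean cached data is dropped.

    Dropping a clean buffer costs nothing now; a read is needed only if
    the data is requested again (hypothetical probability p_cache_miss).
    Releasing allocated-but-unused memory is the p_cache_miss = 0 case.
    """
    _check_probability(p_cache_miss)
    return p_cache_miss


def ios_if_working_set_paged(p_page_in: float) -> float:
    """Expected physical I/Os when working set is paged out instead.

    The page-out write always happens; a page-in read follows with
    hypothetical probability p_page_in.
    """
    _check_probability(p_page_in)
    return 1.0 + p_page_in


if __name__ == "__main__":
    # Compare both choices over a small grid of assumed probabilities.
    for p_cache_miss in (0.0, 0.5, 1.0):
        for p_page_in in (0.0, 0.5, 1.0):
            evict = ios_if_cache_evicted(p_cache_miss)
            page = ios_if_working_set_paged(p_page_in)
            print(f"miss={p_cache_miss:.1f} page_in={p_page_in:.1f} "
                  f"evict={evict:.1f} page={page:.1f}")
```

Under these assumptions eviction never costs more than paging: the two tie at one I/O only when the evicted data is certain to be re-read and the paged-out memory is never touched again, and eviction does fewer I/Os in every other case.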
-- 
Karl Denninger
karl@denninger.net <mailto:karl@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/
