Date:      Wed, 19 Mar 2014 08:06:50 -0500
From:      Karl Denninger <karl@denninger.net>
To:        freebsd-fs@freebsd.org
Subject:   Re: kern/187594: [zfs] [patch] ZFS ARC behavior problem and fix
Message-ID:  <5329966A.60308@denninger.net>
In-Reply-To: <532992B8.4090407@netlabs.org>
References:  <201403181520.s2IFK1M3069036@freefall.freebsd.org> <53288024.2060005@denninger.net> <53288629.60309@FreeBSD.org> <532992B8.4090407@netlabs.org>

On 3/19/2014 7:51 AM, Adrian Gschwend wrote:
> On 18.03.14 18:45, Andriy Gapon wrote:
>
>>> This is consistent with what I and others have observed on both 9.2
>>> and 10.0; the ARC will expand until it hits the maximum configured
>>> even at the expense of forcing pages onto the swap. In this
>>> specific machine's case left to defaults it will grab nearly all
>>> physical memory (over 20GB of 24) and wire it down.
>> Well, this does not match my experience from before 10.x times.
> I reported the issue on which Karl gave feedback and developed the
> patch. The original thread of my report started here:
>
> http://lists.freebsd.org/pipermail/freebsd-fs/2014-March/019043.html
>
> Note that I don't have big memory eaters like VMs; it's just a bunch of
> jails and services running in them, including some JVMs.
>
> Check out the munin graphs before and after:
>
> Daily, which does not seem to grow much anymore:
> http://ktk.netlabs.org/misc/munin-mem-zfs1.png
>
> Weekly:
> http://ktk.netlabs.org/misc/munin-mem-zfs2.png
>
> You can actually see where I activated the patch (16.3); the system
> has behaved *much* better since then. I rebooted once more, which is why
> it goes down again, but I have not rebooted since.
>
> At the moments where munin did not report anything, the system was in
> the ARC-swap lock and virtually dead. Working on the system now, it feels
> like a new machine; everything is super fast and snappy.
>
> I don't understand much of the discussions you guys are having, but I'm
> pretty sure Karl fixed an issue that has given me headaches on BSD for
> years. I first saw this in 8.x when I started to use ZFS in production,
> and I've seen it in every 9.x release as well, up to this patch.
>
> regards
>
> Adrian
>
I have a newer version of this patch that responds to the criticisms
raised in GNATS; it is being tested now.

The salient difference is that it now does two things differently:

1. It takes the VM "first level" warning (vm_v_free_target), deducts 20%
from it, and uses the result as the low-RAM warning level.

2. It also allows a freemem reservation, expressed as a percentage, to be
set as an "additional" reservation on top of the low-RAM warning level.

Both are exposed via sysctl and can therefore be tuned at runtime.
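To make the arithmetic concrete, here is a minimal C sketch of the two-part threshold from items 1 and 2. It is not the patch itself: the function name arc_lowmem_threshold is hypothetical, all values are in pages, and using physical memory as the base for the percentage is an assumption, since the base is not stated above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch of the two-part ARC low-memory threshold.
 * All quantities are in pages. Using physmem_pages as the base for
 * the percentage is an assumption made for illustration only.
 */
static uint64_t
arc_lowmem_threshold(uint64_t v_free_target, unsigned reserve_pct,
    uint64_t physmem_pages)
{
	assert(reserve_pct <= 100);

	/* Item 1: the VM "first level" warning less 20% of itself. */
	uint64_t warn = v_free_target - v_free_target / 5;

	/* Item 2: optional additional reservation, given as a percentage. */
	uint64_t extra = physmem_pages / 100 * reserve_pct;

	return warn + extra;
}
```

For example, a v_free_target of 10000 pages with no reservation gives a level of 8000 pages; a 1% reservation on a 1000000-page machine adds another 10000 pages on top of that.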

The reason for the change is a legitimate criticism: the pager may allow
inactive pages to grow without bound if the system never reaches the VM
system's first warning level on free pages, that is, if the pager is never
called upon to steal pages. "Never" seems like a bad decision (shouldn't
you clean things up eventually anyway?), but it is what it is, and the VM
system has proved over time to be stable and fast. For mixed workloads I
can see where that could cause trouble, in that the ARC could be convinced
to evict unnecessarily. Unbounded inactive-page growth doesn't happen on my
systems here, but since it might, and it appears reasonably easy to defend
against without causing other bad side effects, it seems worth eliminating
as a potential problem.

So instead I try to be more intelligent about choosing the ARC eviction
level: I want it in the zone where the system will steal pages back, but I
*do not*, under any circumstances, want to allow vm.v_free_min to be
invaded, because that's where processes asking for memory get
**SUSPENDED** (that is, where stalls start to happen).
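The zones referred to above can be modeled roughly as follows. This is a simplified illustration that assumes, as described here, that page stealing starts below v_free_target and stalls start below v_free_min; FreeBSD's actual page daemon checks involve more counters than this. The names vm_zone_of and arc_should_evict are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the free-page zones described in the text. */
enum vm_zone {
	ZONE_OK,		/* at or above v_free_target: no pressure */
	ZONE_PAGER_STEALS,	/* below v_free_target: pages get reclaimed */
	ZONE_STALL		/* below v_free_min: allocating processes suspend */
};

static enum vm_zone
vm_zone_of(uint64_t free_pages, uint64_t v_free_min, uint64_t v_free_target)
{
	assert(v_free_min <= v_free_target);
	if (free_pages < v_free_min)
		return ZONE_STALL;
	if (free_pages < v_free_target)
		return ZONE_PAGER_STEALS;
	return ZONE_OK;
}

/*
 * Hypothetical eviction check: shrink the ARC once free pages drop below
 * arc_level. The level is forced strictly above v_free_min, so eviction is
 * triggered while free pages are still at or above v_free_min, i.e. before
 * the stall zone is entered.
 */
static bool
arc_should_evict(uint64_t free_pages, uint64_t arc_level, uint64_t v_free_min)
{
	uint64_t level = arc_level > v_free_min ? arc_level : v_free_min + 1;

	return free_pages < level;
}
```

The clamp is the point of the design: however low the knobs are set, the trigger never sits inside the region where allocations are suspended.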

Since the knobs are exposed, you can get the current behavior if you want
it, or you can leave them alone and let the code choose what it thinks are
intelligent values. If you adjust the knobs and don't like the result, you
can reset the percentage reservation to zero along with freepages, and the
system will pick up the defaults again for you, in real time and without
rebooting.
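The runtime-reset behavior can be sketched as below, assuming that a value of 0 in either knob means "use the default" and that the default is recomputed on each evaluation rather than cached at boot. The struct and function names are hypothetical, the default level follows the text above (v_free_target less 20%), and physical memory is again assumed as the base for the percentage.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the two sysctl knobs; 0 means "use the default". */
struct arc_tunables {
	uint64_t freepages;	/* explicit low-RAM level in pages, or 0 */
	unsigned reserve_pct;	/* additional reservation in percent, or 0 */
};

/*
 * Evaluated on every call instead of being cached at boot, so writing 0
 * back to the knobs takes effect immediately and tracks whatever
 * v_free_target currently is.
 */
static uint64_t
arc_effective_freepages(const struct arc_tunables *t, uint64_t v_free_target,
    uint64_t physmem_pages)
{
	assert(t->reserve_pct <= 100);

	uint64_t base = t->freepages != 0 ?
	    t->freepages : v_free_target - v_free_target / 5;

	return base + physmem_pages / 100 * t->reserve_pct;
}
```

Because nothing is latched, a later change to v_free_target is picked up as soon as freepages is back at 0.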

Also, and very importantly, with the knobs exposed I can now trivially
provoke an INTENTIONAL stall: set the reservation low enough (which
effectively reverts to the system paring the cache only when paging_needed
is set, as the default "as-shipped" arc.c does), then simply copy a huge
file (big enough to fill up the cache) to /dev/null, and bang: an INSTANT
15-second stall. Turn the reservation back up so the ARC is not allowed to
drive the system into hard paging, and the problem disappears.

I'm going to let it run through the day today before sending it up; it ran
overnight without problems and looks good, but I want to see it through a
heavy-load period before publishing it.

I note that there are complaints on the list about this behavior going
back to at least 2010.

-- 
-- Karl
karl@denninger.net


