From: Karl Denninger
Date: Wed, 19 Mar 2014 08:06:50 -0500
To: freebsd-fs@freebsd.org
Subject: Re: kern/187594: [zfs] [patch] ZFS ARC behavior problem and fix
Message-ID: <5329966A.60308@denninger.net>
In-Reply-To: <532992B8.4090407@netlabs.org>
On 3/19/2014 7:51 AM, Adrian Gschwend wrote:
> On 18.03.14 18:45, Andriy Gapon wrote:
>
>>> This is consistent with what I and others have observed on both 9.2
>>> and 10.0; the ARC will expand until it hits the maximum configured
>>> even at the expense of forcing pages onto the swap. In this
>>> specific machine's case left to defaults it will grab nearly all
>>> physical memory (over 20GB of 24) and wire it down.
>> Well, this does not match my experience from before 10.x times.
> I reported the issue on which Karl gave feedback and developed the
> patch. The original thread of my report started here:
>
> http://lists.freebsd.org/pipermail/freebsd-fs/2014-March/019043.html
>
> Note that I don't have big memory eaters like VMs, it's just a bunch of
> jails and services running in them. Including some JVMs.
>
> Check out the munin graphs before and after:
>
> Daily which does not seem to grow much anymore now:
> http://ktk.netlabs.org/misc/munin-mem-zfs1.png
>
> Weekly:
> http://ktk.netlabs.org/misc/munin-mem-zfs2.png
>
> You can actually see where I activated the patch (16.3), the system
> behaves *much* better since then. I did one more reboot that's why it
> goes down again but since then I did not reboot anymore.
>
> The moments where munin did not report anything the system was in the
> ARC-swap lock and virtually dead. From working on the system it feels
> like a new machine, everything is super fast and snappy.
>
> I don't understand much of the discussions you guys are having but I'm
> pretty sure Karl fixed an issue which gave me headache on BSD over
> years. I first saw this in 8.x when I started to use ZFS productively
> and I've seen it in all 9.x release as well up to this patch.
>
> regards
>
> Adrian
>

I have a newer version of this patch that responds to the criticisms raised on GNATS; it is being tested now.

The salient difference is that it now does two things that are a bit different:

1. It takes the VM system's "first level" warning threshold (vm_v_free_target), deducts 20% from it, and uses the result as the low-RAM warning level.

2. It also allows a freemem reservation to be set as a percentage, as an "additional" reservation on top of the low-RAM warning level.

Both are exposed via sysctl and thus can be tuned at runtime.

The reason for the change is a legitimate criticism: the pager may allow inactive ("inact") pages to grow without bound if the system never reaches the VM system's first warning level on free pages; that is, the pager is never called upon to perform page stealing. "Never" seems like a bad decision (shouldn't you clean things up eventually anyway?), but it is what it is, and the VM system has proved over time to be stable and fast. For mixed workloads I can see where that could cause trouble, in that the ARC could be convinced to evict unnecessarily. Unbounded inact page growth doesn't happen on my systems here, but since it might, and since it appears reasonably easy to defend against without causing other bad side effects, it seems worth eliminating as a potential problem.

So instead I try to be more intelligent about choosing the ARC eviction level. I want it in the zone where the system will steal pages back, but I *do not*, under any circumstance, want to allow vm.v_free_min to be invaded, because that's where processes asking for memory get **SUSPENDED** (that is, where stalls start to happen).

Since the knobs are exposed, you can get the behavior you have now if you want it, or you can leave them alone and let the code choose what it thinks are intelligent values.
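The threshold rules in points 1 and 2 can be sketched in C as below. This is a minimal illustration under stated assumptions, not the code in the patch or in arc.c: the struct, the function names, and the tunables `freepages_knob` and `reserve_pct_knob` are hypothetical stand-ins for the sysctls, and it assumes that a knob value of 0 means "use the computed default" and that a zero percentage adds no extra reservation. The `classify()` helper only encodes the zones as the message characterizes them (stalls below vm.v_free_min, page stealing below the first warning level), which is a simplification of the real VM logic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative sketch only: names and structure are assumptions, not the
 * patch's code. All quantities are in pages.
 */
struct vm_params {
    uint64_t page_count;    /* total physical pages */
    uint64_t v_free_min;    /* per the message: below this, allocators stall */
    uint64_t v_free_target; /* the VM system's "first level" warning */
};

/* Hypothetical runtime tunables; 0 means "use the computed default". */
static uint64_t freepages_knob = 0;   /* override for the low-RAM level */
static unsigned reserve_pct_knob = 0; /* extra reservation, percent of RAM */

/* Low-RAM warning level: v_free_target minus 20%, unless overridden. */
static uint64_t arc_low_ram_level(const struct vm_params *vm)
{
    if (freepages_knob != 0)
        return freepages_knob;
    return vm->v_free_target - vm->v_free_target / 5;
}

/* Eviction threshold: low-RAM level plus the percentage reservation. */
static uint64_t arc_evict_threshold(const struct vm_params *vm)
{
    assert(reserve_pct_knob <= 100);
    return arc_low_ram_level(vm) + vm->page_count * reserve_pct_knob / 100;
}

/* The ARC should shrink while free memory is below the threshold. */
static int arc_should_evict(const struct vm_params *vm, uint64_t free_pages)
{
    return free_pages < arc_evict_threshold(vm);
}

/* Memory zones as the message characterizes them (simplified). */
enum mem_zone { ZONE_STALL, ZONE_PAGE_STEALING, ZONE_NO_PRESSURE };

static enum mem_zone classify(const struct vm_params *vm, uint64_t free_pages)
{
    if (free_pages < vm->v_free_min)
        return ZONE_STALL;
    if (free_pages < vm->v_free_target)
        return ZONE_PAGE_STEALING;
    return ZONE_NO_PRESSURE;
}
```

With made-up values for a 24 GB machine (6291456 pages of 4 KiB, v_free_min = 12000, v_free_target = 50000), the default threshold is 40000 pages, which `classify()` places in the page-stealing zone, above v_free_min. Setting `reserve_pct_knob` to 1 raises the threshold to 102914 pages; setting both knobs back to 0 returns to the computed default, mirroring the reset-to-zero behavior Karl describes.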
If you diddle the knobs and don't like the result, you can reset the percentage reservation to zero along with freepages, and the system will pick up the defaults again for you in real time, without rebooting.

Also, and very importantly, with the knobs exposed I can now trivially provoke an INTENTIONAL stall. Set the reservation down far enough (which effectively reverts to the system only paring the cache when paging_needed is set, as is the case with the default "as-shipped" arc.c), then simply copy a huge file (big enough to fill up the cache) to /dev/null, and bang: an INSTANT 15-second stall. Turn the reservation back up so the ARC is not allowed to drive the system into hard paging, and the problem disappears.

I'm going to let it run through the day today before sending it up; it ran overnight without problems and looks good, but I want to go through a heavy load period before publishing it.

I note that there are list complaints about this behavior going back to at least 2010.
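The reproduction recipe (lower the reservation, then copy a file larger than the ARC to /dev/null) needs nothing more than a large sequential read. The helper below is a hypothetical userland sketch, not part of the patch: `stream_discard()` reads a file in fixed-size chunks and throws the data away, as copying it to /dev/null would, and records the slowest single `read()` so that a stall shows up as a number. The assumption is that the stall also blocks the reading process; a stall that only affects other processes would not appear in this measurement.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Seconds elapsed between two CLOCK_MONOTONIC samples. */
static double elapsed_sec(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) +
           (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/*
 * Read `path` sequentially in `bufsize` chunks and discard the data.
 * On success returns 0 and stores the total byte count and the slowest
 * single read() in seconds. Returns -1 with errno set on failure.
 */
static int stream_discard(const char *path, size_t bufsize,
                          uint64_t *bytes_out, double *worst_read_out)
{
    assert(bufsize > 0);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    char *buf = malloc(bufsize);
    if (buf == NULL) {
        close(fd);
        errno = ENOMEM;
        return -1;
    }
    uint64_t total = 0;
    double worst = 0.0;
    for (;;) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        ssize_t n = read(fd, buf, bufsize);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            int saved = errno;
            free(buf);
            close(fd);
            errno = saved;
            return -1;
        }
        double dt = elapsed_sec(&t0, &t1);
        if (dt > worst)
            worst = dt;            /* track the slowest read so far */
        if (n == 0)
            break;                 /* end of file */
        total += (uint64_t)n;
    }
    free(buf);
    close(fd);
    *bytes_out = total;
    *worst_read_out = worst;
    return 0;
}
```

A call such as `stream_discard(path, 1 << 20, &n, &worst)`, with `path` naming a file larger than the ARC (a placeholder here), could be run once with the reservation lowered and once with it restored, comparing the reported `worst` values.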
-- 
-- Karl
karl@denninger.net