From: Karl Denninger <karl@denninger.net>
To: Slawa Olhovchenkov, freebsd-fs@freebsd.org
Subject: Re: ZFS ARC under memory pressure
Date: Fri, 19 Aug 2016 15:38:55 -0500
In-Reply-To: <20160819201840.GA12519@zxy.spb.ru>
References: <20160816193416.GM8192@zxy.spb.ru> <8dbf2a3a-da64-f7f8-5463-bfa23462446e@FreeBSD.org> <20160818202657.GS8192@zxy.spb.ru> <20160819201840.GA12519@zxy.spb.ru>
List-Id: Filesystems (freebsd-fs@freebsd.org)

On 8/19/2016 15:18, Slawa Olhovchenkov wrote:
> On Thu, Aug 18, 2016 at 03:31:26PM -0500, Karl Denninger wrote:
>
>> On 8/18/2016 15:26, Slawa Olhovchenkov wrote:
>>> On Thu, Aug 18, 2016 at 11:00:28PM +0300, Andriy Gapon wrote:
>>>
>>>> On 16/08/2016 22:34, Slawa Olhovchenkov wrote:
>>>>> I see issues with ZFS ARC under memory pressure.
>>>>> ZFS ARC size can be dramatically reduced, down to arc_min.
>>>>>
>>>>> As I see it, a memory pressure event causes a call to arc_lowmem,
>>>>> which sets needfree:
>>>>>
>>>>> arc.c:arc_lowmem
>>>>>
>>>>> needfree = btoc(arc_c >> arc_shrink_shift);
>>>>>
>>>>> After this, arc_available_memory returns negative values (PAGESIZE *
>>>>> (-needfree)) until needfree is zero, regardless of how much memory
>>>>> has been freed. needfree is set to 0 in arc_reclaim_thread() when
>>>>> arc_size <= arc_c, i.e. not until arc_size drops below arc_c (and
>>>>> arc_c is decreased at every loop iteration).
>>>>>
>>>>> arc_c drops to the minimum value if arc_size drops fast enough.
>>>>>
>>>>> No control of current against initial memory allocation.
>>>>>
>>>>> As a result, I see needless ARC reclaim, from 10x to 100x.
>>>>>
>>>>> Can someone check this and comment?
>>>> You might have found a real problem here, but I am short of time right now to
>>>> properly analyze the issue. I think that on illumos 'needfree' is a variable
>>>> that's managed by the virtual memory system and it is akin to our
>>>> vm_pageout_deficit. But during the porting it became an artificial value and
>>>> its handling might be sub-optimal.
>>> As I see it, totally not optimal.
>>> I have created a patch for the sub-optimal handling and am now testing it.
>> You might want to look at the code contained in here:
>>
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594
> In my case the arc.c issue was caused by revision r286625 in HEAD (and
> r288562 in STABLE) -- all in 2015, not touched in 2014.
>
>> There are some ugly interactions with the VM system you can run into if
>> you're not careful; I've chased this issue before and while I haven't
>> yet done the work to integrate it into 11.x (and the underlying code
>> *has* changed since the 10.x patches I developed) if you wind up driving
>> the VM system to evict pages to swap rather than pare back ARC you're
>> probably making the wrong choice.
>>
>> In addition UMA can come into the picture too and (at least previously)
>> was a severe contributor to pathological behavior.
> I am only doing a less aggressive (and more controlled) shrink of ARC size.
> Right now ARC just collapses.
>
> The PR you pointed to is really BIG. I can't read and understand all of it.
> r286625 changed the behavior of the interaction between ARC and VM.
> Does your problem still exist? Can you explain (on the list)?
>

Essentially ZFS is a "bolt-on": unlike UFS, which uses the unified
buffer cache (which the VM system manages), ZFS does not. ARC is
allocated out of kernel memory and (by default) also uses UMA; the VM
system is not involved in its management.

When the VM system gets constrained (low memory) it thus cannot tell the
ARC to pare back. So when the VM system gets low on RAM it will start
to page. The problem with this is that if the VM system is low on RAM
because the ARC is consuming memory you do NOT want to page; you want to
evict some of the ARC.

Consider this: ARC data *at best* prevents one I/O.
That is, if there is data in the cache when you go to read from disk,
you avoid one I/O per unit of data in the ARC you didn't have to read.

Paging *always* requires one I/O (to write the page(s) to the swap) and
MAY involve two (to later page it back in). It is never a "win" to
spend a *guaranteed* I/O when you can instead act in a way that *might*
cause you to (later) need to execute one.

Unfortunately the VM system has another interaction that causes trouble
too. The VM system will "demote" a page to inactive or cache status but
not actually free it. It only starts to go through those pages and free
them when the VM system wakes up, and that only happens when free space
gets low enough to trigger it.

Finally, there's another problem that comes into play: UMA. Kernel
memory allocation is fairly expensive. UMA grabs memory from the kernel
allocation system in big chunks and manages it, and by doing so gains a
pretty significant performance boost. But this means that you can have
large amounts of RAM that are allocated, not in use, and yet the VM
system cannot reclaim them on its own. The ZFS code has to reap those
caches, but reaping them is a moderately expensive operation too, so
you don't want to do it unnecessarily.

I've not yet gone through the 11.x code to see what changed from 10.x.
What I do know is that it is materially better-behaved than it used to
be. Prior to 11.x I would (by now) have been pretty much forced into
rolling that patch forward and testing it, because the misbehavior on
one of my production systems was severe enough to render it basically
unusable without the patch in that PR applied; the most serious
misbehavior was paging-induced stalls that could reach tens of seconds
or more in duration.

11.x hasn't exhibited the severe problems, unpatched, that 10.x was
known to do on my production systems -- but it is far less than great in
that it sure as heck does have UMA coherence issues.....

ARC Size:                               38.58%  8.61    GiB
        Target Size: (Adaptive)         70.33%  15.70   GiB
        Min Size (Hard Limit):          12.50%  2.79    GiB
        Max Size (High Water):          8:1     22.32   GiB

I have 20GB out in kernel memory on this machine right now but only 8.6
of it in ARC; the rest is (mostly) sitting in UMA allocated-but-unused
-- so despite the belief expressed by some that the 11.x code is
"better" at reaping UMA I'm sure not seeing it here.

I'll get around to rolling forward and modifying that PR since that
particular bit of jackassery with UMA is a definite performance
problem. I suspect a big part of what you're seeing lies there as
well. When I do get that code done and tested I suspect it may solve
your problems as well.

-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/
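
As a rough illustration of the I/O argument in the message above (evicting
ARC data costs at most one later read, while paging costs one guaranteed
write and possibly a read), here is a toy model in C. It is only a sketch
under stated assumptions: the function names and the two probability
parameters are hypothetical, nothing here comes from arc.c or the FreeBSD
VM code, and real workloads are not this simple.

```c
#include <assert.h>

/*
 * Toy model only. Both probabilities are hypothetical inputs,
 * not values measured on or exported by FreeBSD.
 */

/*
 * Expected I/Os caused by evicting one unit of ARC data:
 * nothing now, plus one read later only if the data is wanted again.
 */
double
evict_expected_ios(double p_reread)
{
	assert(p_reread >= 0.0 && p_reread <= 1.0);
	return (p_reread);
}

/*
 * Expected I/Os caused by paging one unit out to swap:
 * one guaranteed write, plus one read if it is later paged back in.
 */
double
pageout_expected_ios(double p_pagein)
{
	assert(p_pagein >= 0.0 && p_pagein <= 1.0);
	return (1.0 + p_pagein);
}
```

Under this model the worst case for eviction (the data is always re-read,
cost 1) is no worse than the best case for paging (the page is never
needed again, cost 1), which is the point being made about preferring ARC
eviction to swapping.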