Subject: Re: ZFS ARC and mmap/page cache coherency question
From: Karl Denninger
To: freebsd-hackers@freebsd.org
Date: Tue, 5 Jul 2016 12:50:16 -0500
On 7/5/2016 12:19, Matthew Macy wrote:
>
> ---- On Mon, 04 Jul 2016 19:26:06 -0700 Karl Denninger wrote ----
> >
> > On 7/4/2016 18:45, Matthew Macy wrote:
> > >
> > > ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger wrote ----
> > > >
> > > > On 7/3/2016 02:45, Matthew Macy wrote:
> > > > >
> > > > > Cedric greatly overstates the intractability of resolving it. Nonetheless, since the initial import very little has been done to improve integration, and I don't know of anyone who is up to the task taking an interest in it. Consequently, mmap() performance is likely "doomed" for the foreseeable future.
> > > > > -M
> > > >
> > > > Wellllll....
> > > >
> > > > I've done a fair bit of work here (see
> > > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594) and the
> > > > political issues are at least as bad as the coding ones.
> > >
> > > Strictly speaking, the root of the problem is the ARC. Not ZFS per se. Have you ever tried disabling MFU caching to see how much worse LRU-only is? I'm not really convinced the ARC's benefits justify its cost.
> > >
> > > -M
> >
> > The ARC is very useful when it gets a hit, as it avoids an I/O that would
> > otherwise take place.
> >
> > Where it sucks is when the system evicts working set to preserve ARC.
> > That's always wrong in that you're trading a speculative I/O (if the
> > cache is hit later) for a *guaranteed* one (to page out) and maybe *two*
> > (to page back in).
>
> The question wasn't ARC vs. no caching. It was LRU-only vs. LRU + MFU. There are a lot of issues stemming from the fact that ZFS is a transactional object store with a POSIX FS on top. One is that it caches disk blocks as opposed to file blocks. However, if one could resolve that and have the page cache manage these blocks, life would be much, much better. However, you'd lose MFU. Hence my question.
>
> -M

I suspect there's an argument to be made there, but the present problems make determining its impact difficult or impossible, as those effects are swamped by the other issues.

I can fairly easily create workloads on the base code where simply opening a file with vi, making a change and hitting ":w" results in a stall of tens of seconds or more while the requested cache flush is run down. I've resolved a good part (but not all instances) of this through my work.
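To make the LRU-only vs. LRU + MFU question concrete, here is a toy simulation, a sketch rather than kernel code. `LRUCache` is a plain LRU; `ARCCache` is a simplified, block-count-only rendering of the published ARC algorithm (Megiddo and Modha): T1 holds blocks seen once recently, T2 blocks seen at least twice, and the ghost lists B1/B2 remember recent evictions so the target size `p` of T1 can adapt. The class names, the `run` helper and the trace are hypothetical; the ZFS ARC itself manages variable-sized buffers with far more machinery, so this illustrates only the policy difference.

```python
from collections import OrderedDict


class LRUCache:
    """LRU-only: one recency list; the least recently used block is evicted."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # least recent first, most recent last
        self.hits = 0

    def access(self, key):
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)
            return
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the LRU block
        self.entries[key] = True


class ARCCache:
    """Simplified ARC: T1 = seen once recently (recency side), T2 = seen at
    least twice (frequency side); ghost lists B1/B2 hold only the keys of
    recently evicted blocks and steer p, the adaptive target size of T1."""

    def __init__(self, capacity):
        self.c = capacity
        self.p = 0
        self.t1, self.t2 = OrderedDict(), OrderedDict()
        self.b1, self.b2 = OrderedDict(), OrderedDict()
        self.hits = 0

    def _replace(self, key):
        # Evict from T1 if it exceeds its target p, otherwise from T2,
        # remembering the victim's key in the matching ghost list.
        if self.t1 and (len(self.t1) > self.p
                        or (key in self.b2 and len(self.t1) == self.p)):
            victim, _ = self.t1.popitem(last=False)
            self.b1[victim] = True
        else:
            victim, _ = self.t2.popitem(last=False)
            self.b2[victim] = True

    def access(self, key):
        if key in self.t1 or key in self.t2:  # real hit: promote to MFU side
            self.hits += 1
            self.t1.pop(key, None)
            self.t2.pop(key, None)
            self.t2[key] = True
        elif key in self.b1:  # ghost hit: recency side was too small
            self.p = min(self.c, self.p + max(len(self.b2) / len(self.b1), 1))
            self._replace(key)
            del self.b1[key]
            self.t2[key] = True
        elif key in self.b2:  # ghost hit: frequency side was too small
            self.p = max(0, self.p - max(len(self.b1) / len(self.b2), 1))
            self._replace(key)
            del self.b2[key]
            self.t2[key] = True
        else:  # complete miss
            if len(self.t1) + len(self.b1) == self.c:
                if len(self.t1) < self.c:
                    self.b1.popitem(last=False)
                    self._replace(key)
                else:
                    self.t1.popitem(last=False)
            else:
                total = len(self.t1) + len(self.t2) + len(self.b1) + len(self.b2)
                if total >= self.c:
                    if total == 2 * self.c:
                        self.b2.popitem(last=False)
                    self._replace(key)
            self.t1[key] = True


def run(cache, trace):
    """Feed a trace of block keys to a cache and return its hit count."""
    for key in trace:
        cache.access(key)
    return cache.hits


# Two hot blocks touched twice, a one-pass scan, then the hot blocks again.
TRACE = ["a", "b", "a", "b", "x", "y", "z", "a", "b"]
print(run(LRUCache(2), TRACE), run(ARCCache(2), TRACE))  # 2 3
```

With room for two blocks, LRU-only scores 2 hits and ARC scores 3: the one-pass scan of x, y and z flushes both hot blocks out of the LRU, while ARC keeps b resident on its frequency side. Disabling the MFU side leaves roughly the LRU behaviour, which is the trade-off being asked about.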
My understanding is that 11- has had additional work done to the base code, but from what I can see in the commit logs and discussions, three underlying issues are not addressed:

- The VM system will page out working set while leaving ARC alone.
- UMA reserved-but-not-in-use space is not policed adequately under memory pressure *before* the pager starts considering evicting working set.
- The write-back cache is, for many machine configurations, grossly inappropriate and cannot be tuned adequately by hand (particularly on a system with vdevs that have materially varying performance levels).

I have more or less stopped work on the tree on a forward basis, since I (1) got to a place with 10.2 that works for my production requirements, resolving the problems, and (2) ran into what I deemed to be intractable political issues within core on progress toward eradicating the root of the problem.

I will probably revisit the situation with 11- at some point, as I'll want to roll my production systems forward. However, I don't know when that will be: right now 11- is stable enough for some of my embedded work (e.g. on the Raspberry Pi 2) but not on my server- and client-class machines. Indeed, just yesterday I got a lock-order reversal panic while doing a shutdown after a kernel update on one of my lab boxes running a just-updated 11- codebase.
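The eviction-order argument above (reclaim memory that costs no I/O first, then speculative ARC contents, and only then working set, whose page-out is a guaranteed I/O) can be sketched as a simple policy function. Everything here is hypothetical: `MemoryState`, its fields and `reclaim_plan` are illustrative names that do not correspond to FreeBSD's page daemon, UMA or ARC interfaces, and plain page counts stand in for real accounting.

```python
from dataclasses import dataclass


@dataclass
class MemoryState:
    # Hypothetical inputs, all in pages.
    free_pages: int
    uma_idle_pages: int       # UMA space reserved but not in use
    arc_evictable_pages: int  # ARC buffers that could be dropped
    target_free_pages: int    # low-water mark the policy tries to restore


def reclaim_plan(state):
    """Return [(source, pages), ...], cheapest source first.

    Assumed cost ordering, following the argument in this thread:
      1. "uma":     idle UMA reservations; freeing them costs no I/O.
      2. "arc":     cached blocks; dropping one only risks a speculative read.
      3. "pageout": working set; a guaranteed I/O to page out, and possibly
                    a second one to page back in.
    """
    shortfall = max(0, state.target_free_pages - state.free_pages)
    plan = []
    for source, available in (("uma", state.uma_idle_pages),
                              ("arc", state.arc_evictable_pages)):
        if shortfall == 0:
            break
        take = min(shortfall, available)
        if take:
            plan.append((source, take))
            shortfall -= take
    if shortfall:
        # Only what the cheaper sources could not cover touches working set.
        plan.append(("pageout", shortfall))
    return plan


print(reclaim_plan(MemoryState(100, 50, 200, 400)))
```

With 100 free pages, a target of 400, 50 idle UMA pages and 200 evictable ARC pages, the plan is `[("uma", 50), ("arc", 200), ("pageout", 50)]`: working set is paged out only for the remainder, which is the opposite of the behaviour described in the first two issues above.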
-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/