Date: Tue, 14 Jul 2015 11:59:05 -0500
From: Karl Denninger <karl@denninger.net>
To: freebsd-fs@freebsd.org
Subject: Re: FreeBSD 10.1 Memory Exhaustion
Message-ID: <55A53FD9.90400@denninger.net>
In-Reply-To: <CACfj5vJvAz9StvjTrA1TzfS%2BMhi_qSrOc_qBNHr8qXbiAj81xw@mail.gmail.com>
References: <CAB2_NwCngPqFH4q-YZk00RO_aVF9JraeSsVX3xS0z5EV3YGa1Q@mail.gmail.com>
 <55A3A800.5060904@denninger.net>
 <55A4D5B7.2030603@freebsd.org>
 <55A4E5AB.8060909@netlabs.org>
 <CACfj5vJvAz9StvjTrA1TzfS%2BMhi_qSrOc_qBNHr8qXbiAj81xw@mail.gmail.com>

On 7/14/2015 10:10, Sean Chittenden wrote:
> I think the reason this is not seen more often is because people frequently
> throw limits on the arc in /boot/loader.conf:
>
> vfs.zfs.arc_min="18G"
> vfs.zfs.arc_max="149G"
>
> ZFS ARC *should* not require those settings, but does currently for mixed
> workloads (i.e. databases) in order to be "stable". By setting fixed sizes
> on the ARC, UMA and ARC are much more cooperative in that they have their
> own memory regions to manage so this behavior is not seen as often.

However, this is a false god unless you have very tight control over the
RSS requirements on your machine. For an NFS fileserver or similar you
might be able to get away with that, because the number of nfsds (for
example) is a nominally-known quantity and you can probably quantify RSS
requirements. For a server that accepts connections from outside sources
and is subject to burst loads, this strategy moves the wall but probably
doesn't prevent the problem entirely (what happens when a bunch of web
clients hit your machine at once, for example, spiking memory demand?)

The fundamental issue is that the base code will, under certain load
patterns (and surprisingly often), prefer to keep pages allocated to ARC
(disk cache) in memory over RSS, causing RSS to be paged. This is
exacerbated by UMA's "lazy" return of allocated kernel memory (which is a
good thing most of the time for performance reasons.)

The decision to page RSS rather than shrink ARC is almost always wrong.
Paging RSS requires one guaranteed I/O (to place the paged RSS on the
swap) and may require two (to later recover it if it is referenced).
Discarding cached disk data carries no I/O guarantee; one future I/O is
required only if the cached page is referenced again.
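
To make the I/O accounting above concrete, here is a minimal sketch. It
is not taken from the patch or from the kernel; the function names and
the re-reference probabilities are hypothetical, chosen only for
illustration, and it assumes the cached page being discarded is clean
(nothing has to be written before it is dropped).

```python
def expected_ios_page_out_rss(p_reref: float) -> float:
    """Expected I/Os when an RSS page is paged out.

    One write to swap always happens; a read back from swap happens
    only if the page is referenced again (probability p_reref).
    """
    return 1.0 + p_reref


def expected_ios_evict_arc(p_reref: float) -> float:
    """Expected I/Os when a clean cached (ARC) page is discarded.

    Nothing is written; one read from disk happens only if the cached
    data is referenced again (probability p_reref).
    """
    return p_reref


# Hypothetical (RSS, ARC) re-reference probabilities, illustration only.
for p_rss, p_arc in [(0.0, 1.0), (0.5, 0.5), (1.0, 1.0)]:
    print(
        f"p_rss={p_rss} p_arc={p_arc} "
        f"page-out RSS: {expected_ios_page_out_rss(p_rss)} "
        f"evict ARC: {expected_ios_evict_arc(p_arc)}"
    )
```

Under these assumptions the two choices only tie in the extreme case
where the paged-out RSS is never touched again and the cached page is
certain to be re-read; in every other case discarding the cached page
costs less I/O.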

The patch in question does not change the base code behavior until and
unless memory is constrained. It then pares back ARC instead of allowing
the system to page RSS. In addition, when under memory pressure, UMA is
patrolled to keep the lazily-held kernel memory in check, and the dmu_tx
write buffer size is cut back so as to prevent heavily burst-loading
memory during write-intensive operations.

The default dmu_tx write buffer size, IMHO, is a poorly chosen value in
the first instance. The ideal situation would be one where the dmu_tx
write buffer size is selected based on the performance of each vdev so as
to always have at least one full buffer available when the previous DMA'd
transfer completes, but not materially more than one (e.g. perhaps two
maximum.) As it stands there is only one setting for the entire system
(rather than one per vdev), and it is sized based not on I/O channel
performance but on system memory, with a cap of 4GB, which takes a hell
of a long time to drain to small-parallel (or non-parallel) spinning
media vdevs. Such a flush can implicate a sequencing lock (e.g. you wish
to modify something that is pending write in that buffer), which has the
potential to lead to further misbehavior in the form of long delays
before the system responds.

If you have all spinning media and few parallel channels on the vdevs (or
none) for writes, lowering the max_max will be of material benefit in
leveling out I/O performance, with no penalty on peak write rates.
However, if your machine is mixed SSD and spinning rust there's no "one
size fits all", nor is there if you have pools with varying degrees of
parallelism.

>
> To be clear, however, it should not be necessary to set parameters like
> these in /boot/loader.conf in order to obtain consistent operational
> behavior. I'd be curious to know if someone running 10.2 BETA without
> patches is able to trigger this behavior or not. There was work done that
> reported helped with this between 10.1 and now. To what extent it helped,
> however, I don't have any advice yet.
>
> -sc

I looked at the changes in 10.2-PRE and didn't see anything that I
believe would materially change behavior. I have not, however, had the
time to run an exhaustive test suite against unpatched 10.2-PRE.

-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/