From: Karl Denninger <karl@denninger.net>
Date: Mon, 13 Jul 2015 12:48:17 -0500
To: freebsd-stable@freebsd.org
Subject: Re: FreeBSD 10.1 Memory Exhaustion

On 7/13/2015 12:29, Adrian Chadd wrote:
> hi,
>
> With that much storage and that many snapshots, I do think you need
> more than 96GB of RAM in the box. I'm hoping someone doing active ZFS
> work can comment.
>
> I don't think the ZFS code is completely "memory usage" safe. The
> "old" Sun suggestion when I started using ZFS was "if your server
> panics due to out of memory with ZFS, buy more memory."
>
> That said, it doesn't look like there's a leak anywhere - those
> dumps show you're using at least 32 GiB on each just in ZFS data
> buffers. That's normal.
>
> Try tuning the ARC down a little?

The ARC is supposed to auto-size and use all available free memory. The
problem is that the VM system and the ARC each make assumptions that,
under certain load patterns, fight with one another; when that happens
and the ARC wins, the system gets into trouble FAST. The pattern is that
the system starts to page RSS out rather than evict ARC, the ARC fills
the freed space, more RSS gets paged out... you can see where this winds
up heading, yes?

UMA contributes to the problem substantially: when ZFS grabs chunks of
RAM of a given size and then frees them, UMA holds that RAM in reserve
against a subsequent allocation of the same size. For certain work
patterns this gets really ugly, because you can wind up with huge
amounts of RAM held by the UMA system and unavailable, yet not in actual
use.
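As a rough illustration (a sketch only; the exact vmstat -z column
layout can differ between releases), you can estimate how much RAM UMA
is holding in its free caches by totalling size * free across all zones:

  # sum (item size * free items) over every UMA zone, reported in MiB
  vmstat -z | awk -F '[:,] *' 'NF > 4 { cached += $2 * $5 }
      END { printf "UMA cached: %.1f MiB\n", cached / 1048576 }'

If that number is a large fraction of RAM while the ARC is also near its
configured maximum, the "held but unused" situation described above is
what you are looking at.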
The patch I posted the link to has addressed both of these issues on my
systems here (and a number of other people's as well); I continue to run
it in production and have been extremely happy with it.

> -adrian
>
>
> On 13 July 2015 at 04:48, Christopher Forgeron wrote:
>>
>> TL;DR summary: I can run FreeBSD out of memory quite consistently, and
>> it's not a TSO/mbuf exhaustion issue. It's quite possible that ZFS is
>> the culprit, but shouldn't the pager be able to handle aggressive
>> memory requests in a low-memory situation gracefully, without needing
>> custom tuning of ZFS / VM?
>>
>>
>> Hello,
>>
>> I've been dealing with some instability in my 10.1-RELEASE and
>> STABLEr282701M machines for the last few months.
>>
>> These machines are NFS/iSCSI storage machines, running on Dell M610x
>> or similar hardware: 96 GiB memory, 10 Gb network cards, dual Xeon
>> processors - fairly beefy stuff.
>>
>> Initially I thought it was more issues with TSO / jumbo mbufs, as I
>> had this problem last year. I had thought that this was properly
>> resolved, but setting my MTU to 1500 and turning off TSO did give me a
>> bit more stability. Currently all my machines are set this way.
>>
>> Crashes were usually represented by loss of network connectivity, and
>> the ctld daemon scrolling messages across the screen at full speed
>> about lost connections.
>>
>> All of this did seem like more network stack problems, but with each
>> crash I'd be able to learn a bit more.
>>
>> Usually there was nothing of any use in the logfile, but every now and
>> then I'd get this:
>>
>> Jun 3 13:02:04 san0 kernel: WARNING: 172.16.0.97
>> (iqn.1998-01.com.vmware:esx5a-3387a188): failed to allocate memory
>> Jun 3 13:02:04 san0 kernel: WARNING: icl_pdu_new: failed to allocate
>> 80 bytes
>> Jun 3 13:02:04 san0 kernel: WARNING: 172.16.0.97
>> (iqn.1998-01.com.vmware:esx5a-3387a188): failed to allocate memory
>> Jun 3 13:02:04 san0 kernel: WARNING: icl_pdu_new: failed to allocate
>> 80 bytes
>> Jun 3 13:02:04 san0 kernel: WARNING: 172.16.0.97
>> (iqn.1998-01.com.vmware:esx5a-3387a188): failed to allocate memory
>> ---------
>> Jun 4 03:03:09 san0 kernel: WARNING: icl_pdu_new: failed to allocate
>> 80 bytes
>> Jun 4 03:03:09 san0 kernel: WARNING: icl_pdu_new: failed to allocate
>> 80 bytes
>> Jun 4 03:03:09 san0 kernel: WARNING: 172.16.0.97
>> (iqn.1998-01.com.vmware:esx5a-3387a188): failed to allocate memory
>> Jun 4 03:03:09 san0 kernel: WARNING: 172.16.0.97
>> (iqn.1998-01.com.vmware:esx5a-3387a188): connection error; dropping
>> connection
>> Jun 4 03:03:09 san0 kernel: WARNING: 172.16.0.97
>> (iqn.1998-01.com.vmware:esx5a-3387a188): connection error; dropping
>> connection
>> Jun 4 03:03:10 san0 kernel: WARNING: 172.16.0.97
>> (iqn.1998-01.com.vmware:esx5a-3387a188): waiting for CTL to terminate
>> tasks, 1 remaining
>> Jun 4 06:04:27 san0 syslogd: kernel boot file is /boot/kernel/kernel
>>
>> So knowing that it seemed to be running out of memory, I started
>> leaving 'vmstat 5' running on a console, to see what it was displaying
>> during the crash.
>>
>> It was always the same thing:
>>
>> 0 0 0 1520M 4408M  15  0 0 0  25 19 0 0 21962 1667 91390  0 33  67
>> 0 0 0 1520M 4310M   9  0 0 0   2 15 3 0 21527 1385 95165  0 31  69
>> 0 0 0 1520M 4254M   7  0 0 0  14 19 0 0 17664 1739 72873  0 18  82
>> 0 0 0 1520M 4145M   2  0 0 0   0 19 0 0 23557 1447 96941  0 36  64
>> 0 0 0 1520M 4013M   4  0 0 0  14 19 0 0  4288  490 34685  0 72  28
>> 0 0 0 1520M 3885M   2  0 0 0   0 19 0 0 11141 1038 69242  0 52  48
>> 0 0 0 1520M 3803M  10  0 0 0  14 19 0 0 24102 1834 91050  0 33  67
>> 0 0 0 1520M 8192B   2  0 0 0   2 15 1 0 19037 1131 77470  0 45  55
>> 0 0 0 1520M 8192B   0 22 0 0   2  0 6 0   146   82   578  0  0 100
>> 0 0 0 1520M 8192B   1  0 0 0   0  0 0 0   130   40   510  0  0 100
>> 0 0 0 1520M 8192B   0  0 0 0   0  0 0 0   143   40   501  0  0 100
>> 0 0 0 1520M 8192B   0  0 0 0   0  0 0 0   201   62   660  0  0 100
>> 0 0 0 1520M 8192B   0  0 0 0   0  0 0 0   101   28   404  0  0 100
>> 0 0 0 1520M 8192B   0  0 0 0   0  0 0 0    97   27   398  0  0 100
>> 0 0 0 1520M 8192B   0  0 0 0   0  0 0 0    93   28   377  0  0 100
>> 0 0 0 1520M 8192B   0  0 0 0   0  0 0 0    92   27   373  0  0 100
>>
>>
>> I'd go from a decent amount of free memory to suddenly having none.
>> vmstat would stop outputting, console commands would hang, etc. The
>> whole system would be useless.
>>
>> Looking into this, I came across a similar issue:
>>
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199189
>>
>> I started increasing vm.v_free_min, and it helped - my crashes went
>> from being ~every 6 hours to every few days.
>>
>> Currently I'm running with vm.v_free_min=1254507 - that's
>> 1254507 * 4 KiB pages, or about 4.78 GiB of reserve. The vmstat above
>> is of a machine with that setting, still running down to 8192 B of
>> free memory.
>>
>> I have two issues here:
>>
>> 1) I don't think I should ever be able to run the system into the
>> ground on memory. Deny me new memory until the pager can free more.
>> 2) Setting 'min' doesn't really mean 'min', as the system can
>> obviously go below that threshold.
>>
>>
>> I have plenty of local UFS swap (non-ZFS drives).
>>
>> Adrian requested that I output a few more diagnostic items, and this
>> is what I'm running on a console now, in a loop:
>>
>> vmstat
>> netstat -m
>> vmstat -z
>> sleep 1
>>
>> The output of four crashes is attached here, as it can be a bit long.
>> Let me know if that's not a good way to report them. They will each
>> start mid-way through a vmstat -z output, as that's as far back as my
>> terminal buffer allows.
>>
>>
>>
>> Now, I have a good idea of the conditions that are causing this: ZFS
>> snapshots, run by cron, during times of high ZFS writes.
>>
>> The crashes are all nearly on the hour, as that's when crontab
>> triggers my Python scripts to make new snapshots and delete old ones.
>>
>> My average FreeBSD machine has ~30 ZFS datasets, with each pool having
>> ~20 TiB used. These all need to snapshot on the hour.
>>
>> By staggering the snapshots by a few minutes, I have been able to
>> reduce crashing from every other day to perhaps once a week if I'm
>> lucky - but if I start moving a lot of data around, I can cause daily
>> crashes again.
>>
>> It's looking to be the memory demand of snapshotting lots of ZFS
>> datasets at the same time while accepting a lot of write traffic.
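>>
>> For illustration only, the staggering can be as simple as offsetting
>> each pool's cron entry by a few minutes instead of firing everything
>> at minute 0 (the script path and pool names below are placeholders,
>> not my actual setup):
>>
>> # /etc/crontab - hypothetical staggered snapshot schedule
>> 0   *  *  *  *  root  /usr/local/bin/snapshot.py pool1
>> 5   *  *  *  *  root  /usr/local/bin/snapshot.py pool2
>> 10  *  *  *  *  root  /usr/local/bin/snapshot.py pool3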
>>
>> Now perhaps the answer is 'don't do that', but I feel that FreeBSD
>> should be robust enough to handle this. I don't mind tuning for now to
>> reduce/eliminate this, but others shouldn't run into this pain just
>> because they heavily load their machines - there must be a way of
>> avoiding this condition.
>>
>> Here are the contents of my /boot/loader.conf and sysctl.conf, to show
>> my minimal tuning to make this problem a little more bearable:
>>
>> /boot/loader.conf
>> vfs.zfs.arc_meta_limit=49656727553
>> vfs.zfs.arc_max=91489280512
>>
>> /etc/sysctl.conf
>> vm.v_free_min=1254507
>>
>>
>> Any suggestions/help is appreciated.
>>
>> Thank you.
>>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"

-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/