Date: Thu, 09 Apr 2015 09:20:40 -0500
From: Karl Denninger <karl@denninger.net>
To: freebsd-fs@freebsd.org
Subject: Re: FreeBSD/ZFS on [HEAD] chews up memory
Message-ID: <55268AB8.8010202@denninger.net>
In-Reply-To: <728627c71bbc88bc9a454eda3370e485@mailbox.ijs.si>
References: <CAD2Ti2_4S_yPgJdKxfb=_eQq5RezSTAa_M0V-EHf=y60k30RBQ@mail.gmail.com> <alpine.GSO.2.01.1504090814560.4186@freddy.simplesystems.org> <728627c71bbc88bc9a454eda3370e485@mailbox.ijs.si>
On 4/9/2015 08:53, Mark Martinec wrote:
> 2015-04-09 15:19, Bob Friesenhahn wrote:
>> On Thu, 9 Apr 2015, grarpamp wrote:
>>>> RAM amount might matter too. 12GB vs 32GB is a bit of a difference.
>>> Allow me to bitch hypothetically...
>>> We, and I, get that some FS need memory, just like kernel and
>>> userspace need memory to function. But to be honest, things
>>> should fail or slow gracefully. Why in the world, regardless of
>>> directory size, should I ever need to feed ZFS 10GB of RAM?
>>
>> From my reading of this list in the past month or so, I have seen
>> other complaints about memory usage, but also regarding UFS and NFS
>> and not just ZFS. One is led to think that the way the system uses
>> memory for filesystems has changed.
>>
>> As others have said, ZFS ARC should automatically diminish, but
>> perhaps ZFS ARC is not responsible for the observed memory issues.
>>
>> Bob
>
> I'd really like to see the:
>
> [Bug 187594] [zfs] [patch] ZFS ARC behavior problem and fix
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594
>
> find its way into 10-STABLE. Things behaved much more
> sanely some time in 9.*, before the great UMA change
> took place. Not everyone has dozens of gigabytes of memory.
> With 16 GB mem even when memory is tight (poudriere build),
> the wired size seems excessive, most of which is ARC.

There are a number of intertwined issues related to how the VM system
interacts with ZFS' use of memory for the ARC; the patch listed above
IMHO resolves most -- but not all -- of them.

The one big one remaining, which I do not have a patch for at present,
is the dmu_tx write cache (exposed in sysctl as
vfs.zfs.dirty_data_max*). It is sized at boot based on available RAM,
with both a minimum and a maximum, and is shared across all pools: it
initializes to allow up to 10% of RAM, with a cap of 4GB. That can be
a problem, because on a machine with a moderately large RAM
configuration and spinning rust it is entirely possible for that write
cache to represent tens of seconds -- or even more than a minute -- of
actual I/O time to flush. (The maximum full-track sequential I/O speed
of a 7200RPM 4TB drive is in the ~200MB/sec range; 10% of 32GB is
3.2GB, so that is ~15 seconds of flush time in a typical 4-unit RaidZ2
vdev -- and it gets worse, much worse, with smaller-capacity disks,
which have less areal density under the head and are thus slower due
to the basic physics of the matter.) The write cache is a very good
thing for performance in most circumstances, because it allows ZFS to
optimize writes to minimize the number of seeks and the latency
required, but there are some pathological cases where having it too
large is very bad for performance.

Specifically, it becomes a problem when the operation you wish to
perform on the filesystem requires coherency with something *in* that
cache, and thus the cache must flush and complete before that
operation can succeed. This manifests as something as benign as typing
"vi some-file" locking up your terminal session for tens of seconds --
or, in some cases, more than a minute!
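To put a number on that flush-time arithmetic, here is a minimal
sketch in C, assuming a FreeBSD box where the vfs.zfs.dirty_data_max
sysctl is present as described above; the ~200MB/sec figure is the
illustrative single-spindle number from the parenthetical, not
something the program measures:

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	uint64_t dirty_max = 0;
	size_t len = sizeof(dirty_max);

	/* Read the system-wide dmu_tx write cache cap. */
	if (sysctlbyname("vfs.zfs.dirty_data_max", &dirty_max, &len,
	    NULL, 0) == -1) {
		perror("sysctlbyname");
		return (1);
	}

	/* Assumed aggregate write bandwidth: one ~200MB/sec spindle. */
	const double bw = 200.0 * 1000 * 1000;

	printf("dirty_data_max: %ju bytes\n", (uintmax_t)dirty_max);
	printf("worst-case flush time: ~%.1f seconds\n",
	    (double)dirty_max / bw);
	return (0);
}

On the 32GB example above (a 3.2GB dirty_data_max) that works out to
roughly 16 seconds.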
If all of the disks on your machine are of a given type and reasonably
coherent in I/O throughput (e.g. all SSDs, or all rotating rust of
approximately the same size and throughput) then you can tune this as
the code stands and get good performance while avoiding the problem.
But if you have some volumes comprised of high-performance SSD storage
(say, for often-modified or frequently-accessed database tables) and
other volumes comprised of high-capacity spinning rust (because SSD
for storing that data makes no economic sense) then you've got a
problem, because dirty_data_max is system-wide, not per-pool.

The irony is that with the patch I developed, the pathology tends not
to happen under heavy load, because the dmu_tx cache gets cut back
automatically as part of the UMA reuse mitigation strategy I
implemented in that patch. But under light load it still can, and
sometimes does, bite you. The best (and I argue proper) means of
eliminating it is for the dmu_tx cache to be sized per-pool and
computed from the pool's actual write I/O performance; in other words,
it should be sized to represent a maximum latency-to-coherence time
that is acceptable (and that maximum should itself be tunable). Doing
so appears to be quite non-trivial, though, or I would have already
taken it on and addressed it.
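As a rough sketch of what that per-pool sizing could look like (purely
hypothetical -- pool_dirty_data_max is a name invented for
illustration, not anything in the ZFS code), the cap falls out of a
target latency-to-coherence time and the pool's measured write
bandwidth:

#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical: cap a pool's dirty data so that a full flush never
 * represents more than target_latency_ms of I/O time at the pool's
 * measured aggregate write bandwidth.
 */
static uint64_t
pool_dirty_data_max(uint64_t bw_bytes_per_sec, uint32_t target_latency_ms)
{
	return ((bw_bytes_per_sec / 1000) * target_latency_ms);
}

int
main(void)
{
	/* ~200MB/sec of rust vs. ~2GB/sec of SSD, 5-second target. */
	printf("rust pool cap: %ju bytes\n",
	    (uintmax_t)pool_dirty_data_max(200000000ULL, 5000));
	printf("ssd pool cap:  %ju bytes\n",
	    (uintmax_t)pool_dirty_data_max(2000000000ULL, 5000));
	return (0);
}

With a 5-second target the rust pool gets a ~1GB cap while the SSD
pool keeps ~10GB: the fast pool retains its large, seek-optimizing
cache and the slow pool stops holding everything else hostage.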
-- 
Karl Denninger
karl@denninger.net <mailto:karl@denninger.net>
/The Market Ticker/