Date:      Thu, 09 Apr 2015 09:20:40 -0500
From:      Karl Denninger <karl@denninger.net>
To:        freebsd-fs@freebsd.org
Subject:   Re: FreeBSD/ZFS on [HEAD] chews up memory
Message-ID:  <55268AB8.8010202@denninger.net>
In-Reply-To: <728627c71bbc88bc9a454eda3370e485@mailbox.ijs.si>
References:  <CAD2Ti2_4S_yPgJdKxfb=_eQq5RezSTAa_M0V-EHf=y60k30RBQ@mail.gmail.com> <alpine.GSO.2.01.1504090814560.4186@freddy.simplesystems.org> <728627c71bbc88bc9a454eda3370e485@mailbox.ijs.si>

On 4/9/2015 08:53, Mark Martinec wrote:
> 2015-04-09 15:19, Bob Friesenhahn wrote:
>> On Thu, 9 Apr 2015, grarpamp wrote:
>>>> RAM amount might matter too. 12GB vs 32GB is a bit of a difference.
>>> Allow me to bitch hypothetically...
>>> We, and I, get that some FS need memory, just like kernel and
>>> userspace need memory to function. But to be honest, things
>>> should fail or slow gracefully. Why in the world, regardless of
>>> directory size, should I ever need to feed ZFS 10GB of RAM?
>>
>> From my reading of this list in the past month or so, I have seen
>> other complaints about memory usage, but also regarding UFS and NFS
>> and not just ZFS.  One is led to think that the way the system uses
>> memory for filesystems has changed.
>>
>> As others have said, ZFS ARC should automatically diminish, but
>> perhaps ZFS ARC is not responsible for the observed memory issues.
>>
>> Bob
>
> I'd really like to see the:
>
>   [Bug 187594] [zfs] [patch] ZFS ARC behavior problem and fix
>     https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594
>
> find its way into 10-STABLE. Things behaved much more
> sanely some time in 9.*, before the great UMA change
> took place. Not everyone has dozens of gigabytes of memory.
> With 16 GB mem even when memory is tight (poudriere build),
> the wired size seems excessive, most of which is ARC.
>

There are a number of intertwined issues related to how the VM system
interacts with ZFS' use of memory for ARC; the patch listed above IMHO
resolves most -- but not all -- of them.

The one big one remaining, that I do not have a patch to fix at
present, is the dmu_tx write cache (exposed in sysctl as
vfs.zfs.dirty_data_max*).  It is sized based on available RAM at boot,
with both a minimum and a maximum size, and is shared across all
pools.  It initializes at boot to allow up to 10% of RAM to be used
for this purpose, with a cap of 4GB.  That can be a problem because on
a machine with a moderately large RAM configuration and spinning rust
it is entirely possible for that write cache to represent *tens of
seconds or even more than a minute* of actual I/O time to flush.  (The
maximum full-track sequential I/O speed of a 7200RPM 4TB drive is in
the ~200MB/sec range; 10% of 32GB is about 3GB, so this is ~15 seconds
of time in a typical 4-unit RaidZ2 vdev -- and it gets worse, much
worse, with smaller-capacity disks that have less areal density under
the head and thus are slower due to the basic physics of the matter.)
The write cache is a very good thing for performance in most
circumstances because it allows ZFS to optimize writes to minimize the
number of seeks and the latency required, but there are some
pathological cases where having it too large is very bad for
performance.
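
If you want to see what your own system computed at boot, something
along these lines will show it on a stock 10.x/HEAD build (the sysctl
names may differ slightly on other builds, so treat this as a sketch):

    # Show the computed dmu_tx dirty-data limits (bytes, hard cap, % of RAM)
    sysctl vfs.zfs.dirty_data_max vfs.zfs.dirty_data_max_max \
           vfs.zfs.dirty_data_max_percent

    # Back-of-envelope flush time: the limit divided by what you believe
    # the pool can actually stream (~200MB/sec here, i.e. one spindle)
    dirty=$(sysctl -n vfs.zfs.dirty_data_max)
    echo "worst-case flush: ~$((dirty / 200000000)) seconds"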

Specifically, it becomes a problem when the operation you wish to
perform on the filesystem requires coherency with something *in* that
cache, and thus the cache must flush and complete before that
operation can succeed.  This manifests as something as benign as
typing "vi some-file" locking up your terminal session for tens of
seconds to, in some cases, more than a minute!

If *all* the disks on your machine are of a given type and reasonably
coherent in I/O throughput (e.g. all SSDs, all rotating rust of the
same approximate size and throughput, etc.) then you can tune this as
the code stands to get good performance and avoid the problem.  But if
you have some volumes comprised of high-performance SSD storage (say,
for often-modified or accessed database tables) and other volumes
comprised of high-capacity spinning rust (because SSD for storage of
that data makes no economic sense) then you've got a problem, because
dirty_data_max is system-wide and not per-pool.
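
As a stopgap you can clamp the cache for the slowest storage you own,
since it is one knob for the whole system.  A minimal sketch (the
256MB figure is purely illustrative; pick a number your spinning rust
can flush in a second or two):

    # /boot/loader.conf -- cap the system-wide dmu_tx write cache
    vfs.zfs.dirty_data_max="268435456"

    # On builds where the sysctl is writable you can experiment live
    # before committing the value to loader.conf:
    sysctl vfs.zfs.dirty_data_max=268435456

The obvious cost is that your SSD pools are then held to a cache sized
for the spinning rust, which is exactly the system-wide limitation
described above.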

The irony is that with the patch I developed, the pathology tends not
to happen under heavy load, because the dmu_tx cache gets cut back
automatically as part of the UMA reuse mitigation strategy that I
implemented in that patch.  But under light load it still can, and
sometimes does, bite you.  The best (and I argue proper) means of
eliminating that is for the dmu_tx cache to be sized per-pool and to
be computed based on the pool's actual write I/O performance; in other
words, it should be sized to represent a maximum latency-to-coherence
time that is acceptable (and that should itself be tunable.)  Doing so
appears to be quite non-trivial, though, or I would have already taken
it on and addressed it.
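
Purely to illustrate the sizing rule I'm arguing for (nothing like
this exists in the code today; both inputs below are made-up example
numbers, not anything ZFS exposes per pool):

    # Hypothetical per-pool sizing: acceptable latency-to-coherence time
    # (seconds) multiplied by the pool's sustained write bandwidth (bytes/sec)
    target_latency=2
    pool_write_bw=400000000    # ~400MB/sec, e.g. a small raidz2 of spinners
    echo "per-pool dirty cache: $((target_latency * pool_write_bw)) bytes"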

-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
