Date:      Tue, 14 Jul 2015 11:59:05 -0500
From:      Karl Denninger <karl@denninger.net>
To:        freebsd-fs@freebsd.org
Subject:   Re: FreeBSD 10.1 Memory Exhaustion
Message-ID:  <55A53FD9.90400@denninger.net>
In-Reply-To: <CACfj5vJvAz9StvjTrA1TzfS%2BMhi_qSrOc_qBNHr8qXbiAj81xw@mail.gmail.com>
References:  <CAB2_NwCngPqFH4q-YZk00RO_aVF9JraeSsVX3xS0z5EV3YGa1Q@mail.gmail.com> <55A3A800.5060904@denninger.net> <55A4D5B7.2030603@freebsd.org> <55A4E5AB.8060909@netlabs.org> <CACfj5vJvAz9StvjTrA1TzfS%2BMhi_qSrOc_qBNHr8qXbiAj81xw@mail.gmail.com>

On 7/14/2015 10:10, Sean Chittenden wrote:
> I think the reason this is not seen more often is because people frequently
> throw limits on the arc in /boot/loader.conf:
>
> vfs.zfs.arc_min="18G"
> vfs.zfs.arc_max="149G"
>
> ZFS ARC *should* not require those settings, but does currently for mixed
> workloads (i.e. databases) in order to be "stable".  By setting fixed sizes
> on the ARC, UMA and ARC are much more cooperative in that they have their
> own memory regions to manage so this behavior is not seen as often.
However, this is a false god unless you have very tight control over the
RSS requirements on your machine.  For an NFS fileserver or similar, you
might be able to get away with that, because the number of nfsds (for
example) is a nominally known quantity and you can probably quantify the
RSS requirements.

For a server that accepts connections from outside sources and is
subject to burst loads, this strategy moves the wall but probably doesn't
prevent the problem entirely (what happens when a bunch of web clients
hit your machine at once, for example, spiking memory demand?)
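A back-of-the-envelope sketch of why a fixed ARC cap only moves the wall.  It assumes a deliberately simplified model in which, once the ARC has grown to arc_max, whatever is left after other kernel memory is all that process RSS can use.  The 149G arc_max comes from the quoted loader.conf; the 192 GiB of RAM, the 8 GiB of other kernel memory, and the RSS figures are hypothetical numbers chosen only for illustration.

```python
GiB = 1024 ** 3


def rss_headroom(physmem: int, arc_max: int, kernel_other: int) -> int:
    """Memory left for process RSS once the ARC has grown to arc_max.

    Simplified model (an assumption): ignores fragmentation, UMA caches
    beyond `kernel_other`, and the page daemon's free-page targets.
    """
    return physmem - arc_max - kernel_other


def must_page(physmem: int, arc_max: int, kernel_other: int, rss_demand: int) -> bool:
    """True if the RSS demand cannot fit alongside a full ARC."""
    return rss_demand > rss_headroom(physmem, arc_max, kernel_other)


# Hypothetical machine using the quoted arc_max of 149G.
phys, arc, kern = 192 * GiB, 149 * GiB, 8 * GiB
steady = 30 * GiB  # hypothetical steady-state RSS
burst = 12 * GiB   # hypothetical extra RSS when many web clients arrive at once

print(rss_headroom(phys, arc, kern) // GiB)        # 35
print(must_page(phys, arc, kern, steady))          # False
print(must_page(phys, arc, kern, steady + burst))  # True
```

The cap is fine for the steady state, but nothing about it adapts when the burst arrives, so the same paging happens, just at a different load level.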

The fundamental issue is that the base code will, under certain load
patterns (and surprisingly often), prefer to keep pages allocated to ARC
(disk cache) in memory over RSS, causing RSS to be paged out.  This is
exacerbated by UMA's "lazy" return of allocated kernel memory (which is a
good thing most of the time, for performance reasons).  Preferring ARC
over RSS is almost always wrong, because paging out RSS requires one
guaranteed I/O (to write the paged RSS to swap) and may require two (to
later recover it if it is referenced); discarding cached disk data
carries no guaranteed I/O, with one future I/O required only if the
cached page is referenced again.
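The I/O argument above can be written as an expected-cost comparison.  This is a sketch under stated assumptions, not the patch's logic: every I/O counts as one unit regardless of size or seek cost, the RSS page is dirty and swap-backed (so it must be written out), the ARC page is clean, and `p` is the probability that the evicted page is referenced again.  The function names are hypothetical.

```python
def _check_probability(p: float) -> None:
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")


def expected_io_page_out_rss(p_reref: float) -> float:
    """Expected I/Os to page out one dirty, swap-backed RSS page.

    One guaranteed write to swap, plus one read back if the page is
    referenced again (probability p_reref).
    """
    _check_probability(p_reref)
    return 1.0 + p_reref


def expected_io_evict_arc(p_reref: float) -> float:
    """Expected I/Os to discard one clean ARC (disk cache) page.

    No write is needed; one read only if the data is referenced again.
    """
    _check_probability(p_reref)
    return p_reref


def cheaper_victim(p_rss: float, p_arc: float) -> str:
    """Return "arc" or "rss", whichever is cheaper to give up (ties go to ARC)."""
    if expected_io_evict_arc(p_arc) <= expected_io_page_out_rss(p_rss):
        return "arc"
    return "rss"


# Worst case for the ARC side: a cache page certain to be reused (1.0)
# versus an RSS page that will never be touched again (0.0).
print(expected_io_page_out_rss(0.0), expected_io_evict_arc(1.0))  # 1.0 1.0
print(cheaper_victim(0.0, 1.0))  # arc
print(cheaper_victim(0.9, 0.1))  # arc
```

Because both probabilities lie between 0 and 1, evicting ARC never costs more than paging RSS under this model; any exceptions would have to come from what it ignores, such as I/O size and latency, or RSS pages that are clean and file-backed and so need no swap write.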

The patch in question does not change the base code behavior until and
unless memory is constrained.  It then pares back ARC instead of
allowing the system to page out RSS.  In addition, when under memory
pressure, it patrols UMA to keep the lazily-held kernel memory in check
and cuts back the dmu_tx write buffer size so as to prevent heavy
burst-loading of memory during write-intensive operations.
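A toy model of the ordering just described: do nothing while memory is plentiful, then take pages from the ARC (down to its floor) before anything is left for the pager to take from RSS, and shrink the write buffer at the same time.  This is a hypothetical illustration, not the patch's code: `MemState`, `respond_to_pressure`, `free_target`, and `dirty_floor` are invented names, halving the write-buffer limit is an arbitrary choice, and UMA patrolling is not modeled.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class MemState:
    free: int         # free pages
    free_target: int  # hypothetical "memory is constrained" threshold
    arc_size: int     # pages currently held by the ARC
    arc_min: int      # ARC floor (cf. vfs.zfs.arc_min)
    dirty_max: int    # current write-buffer (dirty data) limit, in pages
    dirty_floor: int  # hypothetical lower bound for that limit


def respond_to_pressure(s: MemState) -> Tuple[MemState, int]:
    """Toy model: reclaim from the ARC before RSS is touched.

    Returns the new state and the shortfall the ARC could not cover,
    i.e. what would still fall to the pager (and thus to RSS).
    """
    if s.free >= s.free_target:
        return s, 0  # not constrained: leave base behavior alone
    deficit = s.free_target - s.free
    reclaim = max(0, min(deficit, s.arc_size - s.arc_min))
    new_state = replace(
        s,
        free=s.free + reclaim,
        arc_size=s.arc_size - reclaim,
        # Shrink the write buffer too; halving is an arbitrary choice here.
        dirty_max=max(s.dirty_floor, s.dirty_max // 2),
    )
    return new_state, deficit - reclaim


s = MemState(free=100, free_target=1000, arc_size=5000, arc_min=1000,
             dirty_max=4096, dirty_floor=256)
new, shortfall = respond_to_pressure(s)
print(new.arc_size, new.free, new.dirty_max, shortfall)  # 4100 1000 2048 0
```

Only when the ARC is already at `arc_min` does a nonzero shortfall remain for the pager, which is the behavior the text argues for.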

The latter, IMHO, is a poorly chosen value in the first instance.  The
ideal situation would be one where the dmu_tx write buffer size is
selected based on the performance of each vdev, so as to always have at
least one full buffer available when the previous DMA'd transfer
completes, but not materially more than one (e.g. perhaps two
maximum).  As it stands there is only one setting for the entire system
(rather than one per vdev), and it's sized based not on I/O channel
performance but on system memory, with a cap of 4GB, which takes a hell
of a long time to drain to small-parallel (or non-parallel)
spinning-media vdevs.  Such a flush can implicate a sequencing lock (e.g.
you wish to modify something that is pending write in that buffer),
which has the potential to lead to further misbehavior in the form of
long delays before the system responds.  If you have all spinning media
and few (or no) parallel channels on the vdevs for writes, lowering the
max_max (vfs.zfs.dirty_data_max_max) will be of material benefit in
leveling out I/O performance with no penalty on peak write rates.
However, if your machine has mixed SSD and spinning rust there's no
"one size fits all", nor is there if you have pools with varying degrees
of parallelism.
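Some rough arithmetic on the single, memory-sized write buffer versus a per-vdev one.  Assumptions, all labeled: the system-wide limit follows the stock ZFS default of 10% of RAM capped at 4 GiB; the 64 GiB machine and the 150 MB/s (spinning) and 1.5 GB/s (SSD) sustained write rates are hypothetical; and `per_vdev_buffer` is a hypothetical sizing rule following the "one buffer, perhaps two" idea above, not an existing ZFS tunable.

```python
GiB = 1024 ** 3
MB = 1000 ** 2


def systemwide_dirty_max(physmem: int, percent: int = 10, cap: int = 4 * GiB) -> int:
    """One system-wide write-buffer limit, sized from RAM and capped.

    Assumes the stock 10%-of-RAM default with a 4 GiB cap.
    """
    return min(physmem * percent // 100, cap)


def drain_seconds(buffer_bytes: int, vdev_bytes_per_s: float) -> float:
    """Time for a vdev to write out a full buffer at its sustained rate."""
    return buffer_bytes / vdev_bytes_per_s


def per_vdev_buffer(vdev_bytes_per_s: float, interval_s: float, buffers: int = 2) -> int:
    """Hypothetical per-vdev sizing: room for `buffers` transfers of
    `interval_s` seconds each at this vdev's own rate (1, perhaps 2)."""
    return int(vdev_bytes_per_s * interval_s * buffers)


phys = 64 * GiB                      # hypothetical machine
shared = systemwide_dirty_max(phys)  # 10% would be 6.4 GiB, so the 4 GiB cap applies
print(round(drain_seconds(shared, 150 * MB), 1))   # 28.6  (spinning vdev)
print(round(drain_seconds(shared, 1500 * MB), 1))  # 2.9   (SSD vdev)
print(per_vdev_buffer(150 * MB, 1.0) // MB)        # 300
```

With one shared limit, the same full buffer takes roughly ten times longer to drain on the spinning vdev than on the SSD, which is why no single value suits a mixed machine; sizing per vdev from its own rate avoids that.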
>
> To be clear, however, it should not be necessary to set parameters like
> these in /boot/loader.conf in order to obtain consistent operational
> behavior.  I'd be curious to know if someone running 10.2 BETA without
> patches is able to trigger this behavior or not.  There was work done that
> reported helped with this between 10.1 and now.  To what extent it helped,
> however, I don't have any advice yet.
>
> -sc
I looked at the changes in 10.2-PRE and didn't see anything that I
believed would materially change behavior.  I have not, however, had
the time to run an exhaustive test suite against unpatched 10.2-PRE.

-- 
Karl Denninger
karl@denninger.net <mailto:karl@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/
