Date:      Fri, 19 Aug 2016 15:38:55 -0500
From:      Karl Denninger <karl@denninger.net>
To:        Slawa Olhovchenkov <slw@zxy.spb.ru>, freebsd-fs@freebsd.org
Subject:   Re: ZFS ARC under memory pressure
Message-ID:  <bcb14d0b-bd6d-cb93-ea71-3656cfce8b3b@denninger.net>
In-Reply-To: <20160819201840.GA12519@zxy.spb.ru>
References:  <20160816193416.GM8192@zxy.spb.ru> <8dbf2a3a-da64-f7f8-5463-bfa23462446e@FreeBSD.org> <20160818202657.GS8192@zxy.spb.ru> <c3bc6c5a-961c-e3a4-2302-f0f7417bc34f@denninger.net> <20160819201840.GA12519@zxy.spb.ru>

On 8/19/2016 15:18, Slawa Olhovchenkov wrote:
> On Thu, Aug 18, 2016 at 03:31:26PM -0500, Karl Denninger wrote:
>
>> On 8/18/2016 15:26, Slawa Olhovchenkov wrote:
>>> On Thu, Aug 18, 2016 at 11:00:28PM +0300, Andriy Gapon wrote:
>>>
>>>> On 16/08/2016 22:34, Slawa Olhovchenkov wrote:
>>>>> I see issues with the ZFS ARC under memory pressure.
>>>>> The ZFS ARC size can be dramatically reduced, down to arc_min.
>>>>>
>>>>> As I see it, a memory pressure event causes a call to arc_lowmem,
>>>>> which sets needfree:
>>>>>
>>>>> arc.c:arc_lowmem
>>>>>
>>>>>         needfree = btoc(arc_c >> arc_shrink_shift);
>>>>>
>>>>> After this, arc_available_memory returns a negative value (PAGESIZE *
>>>>> (-needfree)) until needfree is zero, no matter how much memory has
>>>>> already been freed.  needfree is only set back to 0 in
>>>>> arc_reclaim_thread(), and only once arc_size <= arc_c -- and arc_c is
>>>>> decreased on every loop iteration.
>>>>>
>>>>> arc_c drops to its minimum value unless arc_size drops fast enough.
>>>>>
>>>>> None of this is checked against the current or initial memory allocation.
>>>>>
>>>>> As a result, I see needless ARC reclaim, 10x to 100x more than necessary.
>>>>>
>>>>> Can someone check my analysis and comment on it?
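
(Aside: the loop described above is easy to model.  Here is a toy
userland version -- made-up sizes and eviction rate, not the arc.c
source -- showing that when actual eviction lags the per-iteration
decay of arc_c, the target rides all the way down to arc_min no matter
how much memory has already been freed:)

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	uint64_t arc_c    = 16ULL << 30;	/* 16 GiB target */
	uint64_t arc_min  =  2ULL << 30;	/*  2 GiB floor */
	uint64_t arc_size = 16ULL << 30;	/* current ARC size */
	uint64_t evicted  = 50ULL << 20;	/* 50 MiB freed per pass */
	int shift = 7;				/* arc_shrink_shift default */
	int needfree = 1, iter = 0;		/* armed by the lowmem event */

	while (needfree) {
		arc_c -= arc_c >> shift;	/* target decays every pass */
		if (arc_c < arc_min)
			arc_c = arc_min;
		if (arc_size > evicted)
			arc_size -= evicted;	/* eviction lags behind */
		if (arc_size <= arc_c)
			needfree = 0;		/* the ONLY place it clears */
		iter++;
	}
	printf("cleared after %d iterations, arc_c = %.1f GiB\n",
	    iter, (double)arc_c / (1ULL << 30));
	return (0);
}
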
>>>> You might have found a real problem here, but I am short of time
>>>> right now to properly analyze the issue.  I think that on illumos
>>>> 'needfree' is a variable that's managed by the virtual memory system
>>>> and it is akin to our vm_pageout_deficit.  But during the porting it
>>>> became an artificial value and its handling might be sub-optimal.
>>> As I see it, it is totally not optimal.
>>> I have written a patch to address the sub-optimal handling and am now
>>> testing it.
>> You might want to look at the code contained in here:
>>
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594
> In my case the arc.c issue is caused by revision r286625 in HEAD (and
> r288562 in STABLE) -- both from 2015; that code was not touched in 2014.
>
>> There are some ugly interactions with the VM system you can run into
>> if you're not careful; I've chased this issue before.  While I haven't
>> yet done the work to integrate my fixes into 11.x (and the underlying
>> code *has* changed since the 10.x patches I developed), if you wind up
>> driving the VM system to evict pages to swap rather than pare back the
>> ARC, you're probably making the wrong choice.
>>
>> In addition, UMA can come into the picture too and (at least
>> previously) was a severe contributor to pathological behavior.
> I only make the shrinking of the ARC size less aggressive (and more
> controlled).  Right now the ARC simply collapses.
>
> The PR you point to is really BIG; I can't read and understand all of
> it.  r286625 changed the behavior of the interaction between the ARC
> and the VM.  Does your problem still exist?  Can you explain (on the
> list)?
>

Essentially, ZFS is a "bolt-on": unlike UFS, it does not use the
unified buffer cache that the VM system manages.  The ARC is allocated
out of kernel memory and (by default) also uses UMA; the VM system is
not involved in its management.

When the VM system gets constrained (low memory) it thus cannot tell
the ARC to pare back, so when it gets low on RAM it will start to
page.  The problem with this is that if the VM system is low on RAM
because the ARC is consuming memory, you do NOT want to page -- you
want to evict some of the ARC.

Consider this: ARC data *at best* prevents one I/O.  That is, if the
data is in the cache when you go to read from disk, you avoid one I/O
per unit of cached data you didn't have to read.

Paging *always* requires one I/O (to write the page(s) to swap) and MAY
involve two (to later page them back in).  It is never a "win" to spend
a *guaranteed* I/O when you can instead act in a way that only *might*
cause you to execute one later.
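
To put rough numbers on it (a made-up reuse probability, purely
illustrative):

#include <stdio.h>

int
main(void)
{
	double p_reuse = 0.5;	/* chance the evicted item is needed again */

	/* Dropping ARC data: costs an I/O only if it is re-read later. */
	double evict_arc = p_reuse * 1.0;

	/* Paging out: one guaranteed write, plus a read if touched again. */
	double page_out = 1.0 + p_reuse * 1.0;

	printf("evict ARC: %.2f expected I/Os\n", evict_arc);	/* 0.50 */
	printf("page out:  %.2f expected I/Os\n", page_out);	/* 1.50 */
	return (0);
}

Whatever p_reuse is, evicting ARC data costs at most one I/O while
paging out always costs at least one.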

Unfortunately the VM system has another interaction that causes trouble
too: it will "demote" a page to inactive or cache status but not
actually free it.  It only starts to go through those pages and free
them when the pageout daemon wakes up, and that only happens when free
space gets low enough to trigger it.
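
You can watch those queues from userland; a quick sketch (sysctl names
from vm.stats.vm as on 10.x -- v_cache_count may read zero or be absent
on newer releases, so treat the list as version-dependent):

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

int
main(void)
{
	const char *names[] = {
		"vm.stats.vm.v_free_count",	/* actually free */
		"vm.stats.vm.v_inactive_count",	/* demoted, not yet freed */
		"vm.stats.vm.v_cache_count",	/* demoted, ready for reuse */
	};
	for (int i = 0; i < 3; i++) {
		u_int pages = 0;
		size_t len = sizeof(pages);
		if (sysctlbyname(names[i], &pages, &len, NULL, 0) == 0)
			printf("%-30s %u pages\n", names[i], pages);
	}
	return (0);
}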

Finally, there's another problem that comes into play: UMA.  Kernel
memory allocation is fairly expensive, so UMA grabs memory from the
kernel allocator in big chunks and manages it itself, which gains a
pretty significant performance boost.  But this means you can have
large amounts of RAM that are allocated yet not in use, and the VM
system cannot reclaim them on its own.  The ZFS code has to reap those
caches, but reaping is a moderately expensive operation too, so you
don't want to do it unnecessarily.
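
In-kernel, a sane reap policy has roughly this shape (a sketch only:
maybe_reap and the one-second cooldown are my illustration, not the
actual arc.c code, and uma_reclaim()'s signature changed after 11.x):

#include <sys/param.h>
#include <sys/kernel.h>
#include <vm/uma.h>

static int last_reap;			/* ticks value at the last reap */

static void
maybe_reap(int64_t free_memory)
{
	if (free_memory >= 0)
		return;			/* no pressure; skip the cost */
	if (ticks - last_reap < hz)
		return;			/* reaped within the last second */
	last_reap = ticks;
	uma_reclaim();			/* drain zone caches back to the VM */
}

The cooldown is exactly the trade-off above: reaping is the only way to
get UMA's idle memory back, but it walks and locks every zone, so it
must not run on every allocation hiccup.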

I've not yet gone through the 11.x code to see what changed from 10.x.
What I do know is that 11.x is materially better-behaved than it used
to be: prior to 11.x I would, by now, have been forced to roll that
patch forward and test it, because the misbehavior on one of my
production systems was severe enough to render the machine basically
unusable without the patch in that PR, the most serious symptom being
paging-induced stalls that could reach tens of seconds or more in
duration.

11.x hasn't exhibited the severe problems, unpatched, that 10.x was
known to show on my production systems -- but it is far less than great
in that it sure as heck does have UMA coherence issues:

ARC Size:                               38.58%  8.61    GiB
        Target Size: (Adaptive)         70.33%  15.70   GiB
        Min Size (Hard Limit):          12.50%  2.79    GiB
        Max Size (High Water):          8:1     22.32   GiB

I have 20GB out in kernel memory on this machine right now but only
8.6GB of it in the ARC; the rest is (mostly) sitting in UMA, allocated
but unused -- so despite the belief expressed by some that the 11.x
code is "better" at reaping UMA, I'm sure not seeing it here.

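You can measure that gap directly; a sketch (the kstat.zfs.misc.arcstats
sysctls are standard on FreeBSD; vm.kmem_map_size is assumed present as
it is on my systems):

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>
#include <stdint.h>

static uint64_t
rd(const char *name)
{
	uint64_t v = 0;
	size_t len = sizeof(v);
	if (sysctlbyname(name, &v, &len, NULL, 0) != 0)
		return (0);
	return (v);
}

int
main(void)
{
	uint64_t arc  = rd("kstat.zfs.misc.arcstats.size");
	uint64_t kmem = rd("vm.kmem_map_size");

	printf("kernel map:        %6.2f GiB\n", (double)kmem / (1ULL << 30));
	printf("ARC size:          %6.2f GiB\n", (double)arc / (1ULL << 30));
	printf("gap (UMA & other): %6.2f GiB\n",
	    (double)(kmem - arc) / (1ULL << 30));
	return (0);
}
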
I'll get around to rolling that code forward and updating the PR, since
that particular bit of jackassery with UMA is a definite performance
problem.  I suspect a big part of what you're seeing lies there as
well, and when I do get the code done and tested I suspect it may solve
your problem too.

-- 
Karl Denninger
karl@denninger.net <mailto:karl@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/
