Date: Wed, 22 May 2019 10:47:00 -0500
From: Karl Denninger <karl@denninger.net>
To: freebsd-fs@freebsd.org
Subject: Re: Commit r345200 (new ARC reclamation threads) looks suspicious to me - second potential problem
Message-ID: <89064e9c-251a-a065-3a72-ac65c884d51d@denninger.net>
In-Reply-To: <28c7430b-fb7c-6472-5c1b-fa3ff63a9e73@FreeBSD.org>
References: <369cb1e9-f36a-a558-6941-23b9b811825a@FreeBSD.org> <20190520164202.GA2130@spy> <28c7430b-fb7c-6472-5c1b-fa3ff63a9e73@FreeBSD.org>
On 5/22/2019 10:19 AM, Alexander Motin wrote:
> On 20.05.2019 12:42, Mark Johnston wrote:
>> On Mon, May 20, 2019 at 07:05:07PM +0300, Lev Serebryakov wrote:
>>> I'm looking at the last commit to
>>> 'sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c' (r345200) and
>>> have another question.
>>>
>>> Here is the code:
>>>
>>> 	/*
>>> 	 * Kick off asynchronous kmem_reap()'s of all our caches.
>>> 	 */
>>> 	arc_kmem_reap_soon();
>>>
>>> 	/*
>>> 	 * Wait at least arc_kmem_cache_reap_retry_ms between
>>> 	 * arc_kmem_reap_soon() calls. Without this check it is possible to
>>> 	 * end up in a situation where we spend lots of time reaping
>>> 	 * caches, while we're near arc_c_min. Waiting here also gives the
>>> 	 * subsequent free memory check a chance of finding that the
>>> 	 * asynchronous reap has already freed enough memory, and we don't
>>> 	 * need to call arc_reduce_target_size().
>>> 	 */
>>> 	delay((hz * arc_kmem_cache_reap_retry_ms + 999) / 1000);
>>>
>>>
>>> But it looks like `arc_kmem_reap_soon()` is synchronous on FreeBSD!
>>> So this `delay()` looks very wrong. Am I right?
> Why is it wrong?
>
>>> Looks like it should be `#ifdef illumos`.
>> See also r338142, which I believe was reverted by the update.
> My r345200 indeed reverted that value, but I don't see a problem there.
> When the OS needs more RAM, the pagedaemon will drain UMA caches by
> itself. I don't see a point in re-draining UMA caches in a tight loop
> without delay. If caches are not sufficient to sustain one second of
> workload, then the usefulness of such caches is not very clear, and
> shrinking ARC to free some space may be the right move. Also, making
> ZFS drain its caches more actively than anything else in the system
> looks unfair to me.
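
For reference, the delay() in the quoted hunk converts the millisecond
retry interval into scheduler ticks, rounding up so that a nonzero
interval never truncates to a zero-tick sleep. A minimal userland
sketch of just that arithmetic (the hz value, the assumed 1 s default,
and the ms_to_ticks name are illustrative, not kernel API):

    #include <stdio.h>

    static int hz = 1000;                           /* kern.hz, often 1000 */
    static int arc_kmem_cache_reap_retry_ms = 1000; /* assumed 1 s interval */

    /* Same expression as the quoted delay() argument: rounds up. */
    static int
    ms_to_ticks(int ms)
    {
            return ((hz * ms + 999) / 1000);
    }

    int
    main(void)
    {
            printf("%d ms -> %d ticks at hz=%d\n",
                arc_kmem_cache_reap_retry_ms,
                ms_to_ticks(arc_kmem_cache_reap_retry_ms), hz);
            return (0);
    }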
There is a long-standing pathology in the older implementation. The
short answer is that memory held in UMA caches but not allocated to the
current working set is completely wasted unless it is quickly re-used.
A small buffer between current use and total allocation is fine, but
the UMA system will leave large amounts outstanding and unused.
Reclaiming that memory after a reasonable amount of time is a very good
thing.
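
As a sketch of that policy (purely illustrative, not the actual UMA
code; all names here are made up): keep recently freed items around
for quick re-use, but reclaim anything that has sat idle past an age
threshold.

    #include <stdlib.h>
    #include <time.h>

    struct cached_item {
            struct cached_item *next;
            time_t freed_at;        /* when the item went back on the list */
    };

    static struct cached_item *free_list;

    /* Release items idle longer than max_idle seconds; keep the rest. */
    static void
    trim_idle(time_t now, time_t max_idle)
    {
            struct cached_item **pp = &free_list;

            while (*pp != NULL) {
                    if (now - (*pp)->freed_at > max_idle) {
                            struct cached_item *victim = *pp;

                            *pp = victim->next;
                            free(victim);
                    } else {
                            pp = &(*pp)->next;
                    }
            }
    }

    int
    main(void)
    {
            trim_idle(time(NULL), 5);   /* reclaim items idle > 5 seconds */
            return (0);
    }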
The other problem is that disk cache should NEVER be preferred over
working-set space. It is always wrong to do so, because paging out a
working-set page is 1 *guaranteed* I/O (to page it out) and possibly 2
I/Os (if the page is needed again and thus must be paged back in),
while a disk cache page is 1 *possible* I/O avoided (if the cached
block is requested again).
It is never the right move to intentionally take an I/O in order to
avoid a *possible* I/O. Under certain workloads that choice leads to
severe pathological behavior (~30-second "pauses" where the system is
doing I/O like crazy but a desired process -- such as a database or a
shell -- does nothing, waiting for its working set to be paged back in)
when there are gigabytes (or tens of gigabytes) of ARC outstanding.
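
To put rough numbers on that argument, here is a back-of-the-envelope
model with made-up probabilities, purely for illustration (nothing here
comes from the kernel):

    #include <stdio.h>

    int
    main(void)
    {
            double p_refault = 0.9;   /* working-set page: likely touched again */
            double p_cache_hit = 0.3; /* cached block: maybe read again */

            /* Evicting working set: one guaranteed write plus a likely read. */
            double cost_evict_ws = 1.0 + p_refault;
            /* Keeping a cache page saves at most one possible read. */
            double saved_by_cache = p_cache_hit;

            printf("evicting working set costs %.1f expected I/Os\n",
                cost_evict_ws);
            printf("keeping a cache page saves %.1f expected I/Os\n",
                saved_by_cache);
            return (0);
    }

With these assumed numbers the trade is 1.9 expected I/Os spent against
0.3 saved, which is the pathology described above.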
--
-- Karl Denninger
/The Market-Ticker/
S/MIME Email accepted and preferred
