Date: Mon, 13 Jul 2015 06:58:56 -0500
From: Karl Denninger <karl@denninger.net>
To: freebsd-fs@freebsd.org
Subject: Re: FreeBSD 10.1 Memory Exhaustion
Message-ID: <55A3A800.5060904@denninger.net>
In-Reply-To: <CAB2_NwCngPqFH4q-YZk00RO_aVF9JraeSsVX3xS0z5EV3YGa1Q@mail.gmail.com>

Put this on your box and see if the problem goes away.... :-)

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594

The 2015-02-10 refactor will apply against 10.1-STABLE and 10.2-PRE (the
latter will give you a 10-line fuzz in one block, but it applies and
works.) I've been unable to provoke misbehavior with this patch in, and
I run a cron job that does auto-snapshotting. There are others who have
run this patch with similarly positive results.
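Roughly, applying it looks like the following. The attachment ID below
is a placeholder (take the actual 2015-02-10 refactor attachment from
the PR page), and adjust the -p strip count to match how the diff paths
are rooted:

    # Fetch the refactor patch from the PR (placeholder attachment ID).
    fetch -o /tmp/arc-refactor.patch \
        "https://bugs.freebsd.org/bugzilla/attachment.cgi?id=NNNNN"

    cd /usr/src

    # Dry-run first; on 10.2-PRE expect ~10 lines of fuzz in one hunk.
    patch -C -p0 < /tmp/arc-refactor.patch
    patch -p0 < /tmp/arc-refactor.patch

    # Rebuild and install the kernel (substitute your KERNCONF), then reboot.
    make buildkernel KERNCONF=GENERIC
    make installkernel KERNCONF=GENERIC
    shutdown -r now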
On 7/13/2015 06:48, Christopher Forgeron wrote:
> TL;DR Summary: I can run FreeBSD out of memory quite consistently, and
> it's not a TSO/mbuf exhaustion issue. It's quite possible that ZFS is
> the culprit, but shouldn't the pager be able to handle aggressive
> memory requests in a low-memory situation gracefully, without needing
> custom tuning of ZFS / VM?
>
> Hello,
>
> I've been dealing with some instability in my 10.1-RELEASE and
> STABLE r282701M machines for the last few months.
>
> These machines are NFS/iSCSI storage machines, running on Dell M610x
> or similar hardware: 96 gigs of memory, 10 Gig network cards, dual
> Xeon processors. Fairly beefy stuff.
>
> Initially I thought it was more issues with TSO / jumbo mbufs, as I
> had this problem last year. I had thought that this was properly
> resolved, but setting my MTU to 1500 and turning off TSO did give me a
> bit more stability. Currently all my machines are set this way.
>
> Crashes were usually announced by a loss of network connectivity, and
> the ctld daemon scrolling messages across the screen at full speed
> about lost connections.
>
> All of this did seem like more network stack problems, but with each
> crash I'd be able to learn a bit more.
>
> Usually there was nothing of any use in the logfile, but every now and
> then I'd get this:
>
> Jun 3 13:02:04 san0 kernel: WARNING: 172.16.0.97
> (iqn.1998-01.com.vmware:esx5a-3387a188): failed to allocate memory
> Jun 3 13:02:04 san0 kernel: WARNING: icl_pdu_new: failed to allocate 80 bytes
> Jun 3 13:02:04 san0 kernel: WARNING: 172.16.0.97
> (iqn.1998-01.com.vmware:esx5a-3387a188): failed to allocate memory
> Jun 3 13:02:04 san0 kernel: WARNING: icl_pdu_new: failed to allocate 80 bytes
> Jun 3 13:02:04 san0 kernel: WARNING: 172.16.0.97
> (iqn.1998-01.com.vmware:esx5a-3387a188): failed to allocate memory
> ---------
> Jun 4 03:03:09 san0 kernel: WARNING: icl_pdu_new: failed to allocate 80 bytes
> Jun 4 03:03:09 san0 kernel: WARNING: icl_pdu_new: failed to allocate 80 bytes
> Jun 4 03:03:09 san0 kernel: WARNING: 172.16.0.97
> (iqn.1998-01.com.vmware:esx5a-3387a188): failed to allocate memory
> Jun 4 03:03:09 san0 kernel: WARNING: 172.16.0.97
> (iqn.1998-01.com.vmware:esx5a-3387a188): connection error; dropping connection
> Jun 4 03:03:09 san0 kernel: WARNING: 172.16.0.97
> (iqn.1998-01.com.vmware:esx5a-3387a188): connection error; dropping connection
> Jun 4 03:03:10 san0 kernel: WARNING: 172.16.0.97
> (iqn.1998-01.com.vmware:esx5a-3387a188): waiting for CTL to terminate
> tasks, 1 remaining
> Jun 4 06:04:27 san0 syslogd: kernel boot file is /boot/kernel/kernel
>
> So, knowing that it seemed to be running out of memory, I started
> leaving "vmstat 5" running on a console to see what it was displaying
> during the crash.
>
> It was always the same thing:
>
> 0 0 0 1520M 4408M 15  0 0 0 25 19 0 0 21962 1667 91390  0 33  67
> 0 0 0 1520M 4310M  9  0 0 0  2 15 3 0 21527 1385 95165  0 31  69
> 0 0 0 1520M 4254M  7  0 0 0 14 19 0 0 17664 1739 72873  0 18  82
> 0 0 0 1520M 4145M  2  0 0 0  0 19 0 0 23557 1447 96941  0 36  64
> 0 0 0 1520M 4013M  4  0 0 0 14 19 0 0  4288  490 34685  0 72  28
> 0 0 0 1520M 3885M  2  0 0 0  0 19 0 0 11141 1038 69242  0 52  48
> 0 0 0 1520M 3803M 10  0 0 0 14 19 0 0 24102 1834 91050  0 33  67
> 0 0 0 1520M 8192B  2  0 0 0  2 15 1 0 19037 1131 77470  0 45  55
> 0 0 0 1520M 8192B  0 22 0 0  2  0 6 0   146   82   578  0  0 100
> 0 0 0 1520M 8192B  1  0 0 0  0  0 0 0   130   40   510  0  0 100
> 0 0 0 1520M 8192B  0  0 0 0  0  0 0 0   143   40   501  0  0 100
> 0 0 0 1520M 8192B  0  0 0 0  0  0 0 0   201   62   660  0  0 100
> 0 0 0 1520M 8192B  0  0 0 0  0  0 0 0   101   28   404  0  0 100
> 0 0 0 1520M 8192B  0  0 0 0  0  0 0 0    97   27   398  0  0 100
> 0 0 0 1520M 8192B  0  0 0 0  0  0 0 0    93   28   377  0  0 100
> 0 0 0 1520M 8192B  0  0 0 0  0  0 0 0    92   27   373  0  0 100
>
> I'd go from a decent amount of free memory to suddenly having none.
> vmstat would stop outputting, console commands would hang, etc. The
> whole system would be useless.
>
> Looking into this, I came across a similar issue:
>
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199189
>
> I started increasing vm.v_free_min, and it helped; my crashes went
> from being ~every 6 hours to every few days.
>
> Currently I'm running with vm.v_free_min=1254507. That's 1254507 *
> 4 KiB, or 4.78 GiB of reserve. The vmstat above is from a machine with
> that setting, still running down to 8192 B of free memory.
>
> I have two issues here:
>
> 1) I don't think I should ever be able to run the system into the
> ground on memory. Deny me new memory until the pager can free more.
> 2) Setting "min" doesn't really mean "min", as it can obviously go
> below that threshold.
>
> I have plenty of local UFS swap (non-ZFS drives).
>
> Adrian requested that I output a few more diagnostic items, and this
> is what I'm running on a console now, in a loop:
>
> vmstat
> netstat -m
> vmstat -z
> sleep 1
>
> The outputs of four crashes are attached here, as they can be a bit
> long. Let me know if that's not a good way to report them. They will
> each start mid-way through a "vmstat -z" output, as that's as far back
> as my terminal buffer allows.
>
> Now, I have a good idea of the conditions that are causing this: ZFS
> snapshots, run by cron, during times of high ZFS writes.
>
> The crashes are all nearly on the hour, as that's when crontab
> triggers my python scripts to make new snapshots and delete old ones.
>
> My average FreeBSD machine has ~30 ZFS datasets, with each pool having
> ~20 TiB used. These all need to snapshot on the hour.
>
> By staggering the snapshots by a few minutes (a minimal sketch of the
> staggering follows below), I have been able to reduce crashing from
> every other day to perhaps once a week if I'm lucky. But if I start
> moving a lot of data around, I can cause daily crashes again.
>
> It's looking to be the memory demand of snapshotting lots of ZFS
> datasets at the same time while accepting a lot of write traffic.
>
> Now, perhaps the answer is "don't do that", but I feel that FreeBSD
> should be robust enough to handle this. I don't mind tuning for now to
> reduce/eliminate this, but others shouldn't run into this pain just
> because they heavily load their machines. There must be a way of
> avoiding this condition.
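> As an illustration, a minimal sh version of the staggering (the real
> python scripts also delete old snapshots; the pool name and 90-second
> spacing here are made up for the example):
>
>     #!/bin/sh
>     # Take the hourly snapshot one dataset at a time, pausing between
>     # datasets so the snapshot work is spread out instead of hitting
>     # every dataset at once on the hour.
>     STAMP=$(date "+%Y%m%d-%H%M")
>     for ds in $(zfs list -H -o name -t filesystem -r tank); do
>         zfs snapshot "${ds}@auto-${STAMP}"
>         sleep 90    # spacing between datasets; tune to taste
>     done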
>
> Here are the contents of my /boot/loader.conf and sysctl.conf, to show
> my minimal tuning to make this problem a little more bearable:
>
> /boot/loader.conf:
> vfs.zfs.arc_meta_limit=49656727553
> vfs.zfs.arc_max=91489280512
>
> /etc/sysctl.conf:
> vm.v_free_min=1254507
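>
> For anyone reproducing the arithmetic: vm.v_free_min is in pages, so a
> byte target has to be divided by the page size (1254507 pages * 4 KiB
> works out to the ~4.78 GiB above). A quick sketch, where the 5 GiB
> target is illustrative:
>
>     #!/bin/sh
>     # Derive vm.v_free_min (in pages) from a reserve target in GiB.
>     RESERVE_GIB=5                         # illustrative target
>     PAGESIZE=$(sysctl -n hw.pagesize)     # 4096 on amd64
>     PAGES=$((RESERVE_GIB * 1024 * 1024 * 1024 / PAGESIZE))
>     echo "vm.v_free_min=${PAGES}"         # line for /etc/sysctl.conf
>     sysctl vm.v_free_min="${PAGES}"       # and set it now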
>
> Any suggestions/help is appreciated.
>
> Thank you.
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

--
Karl Denninger
karl@denninger.net <mailto:karl@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/
