Date: Mon, 13 Jul 2015 10:29:40 -0700
From: Adrian Chadd <adrian@freebsd.org>
To: Christopher Forgeron <csforgeron@gmail.com>
Cc: FreeBSD Stable Mailing List <freebsd-stable@freebsd.org>, FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject: Re: FreeBSD 10.1 Memory Exhaustion
Message-ID: <CAJ-Vmom58SjgOG7HYPE4MVaB=XPaEkx_OTYgvOTHxwqGnTxtug@mail.gmail.com>
In-Reply-To: <CAB2_NwCngPqFH4q-YZk00RO_aVF9JraeSsVX3xS0z5EV3YGa1Q@mail.gmail.com>
References: <CAB2_NwCngPqFH4q-YZk00RO_aVF9JraeSsVX3xS0z5EV3YGa1Q@mail.gmail.com>
Hi,

With that much storage and that many snapshots, I do think you need more
than 96 GB of RAM in the box. I'm hoping someone doing active ZFS work can
comment; I don't think the ZFS code is completely "memory usage" safe. The
"old" Sun suggestion from when I started using ZFS was "if your server
panics due to running out of memory with ZFS, buy more memory."

That said, it doesn't look like there's a leak anywhere - those dumps show
you're using at least 32 GB on each box just in ZFS data buffers. Try
tuning the ARC down a little?

-adrian
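
For example (a minimal sketch only - the cap below is illustrative, not a
value from this thread), lowering vfs.zfs.arc_max in /boot/loader.conf
leaves more headroom for the rest of the kernel and for snapshot/write
bursts:

    # /boot/loader.conf - cap the ARC well below the 96 GB of physical RAM
    vfs.zfs.arc_max="68719476736"    # 64 GiB instead of the current ~85 GiB
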
On 13 July 2015 at 04:48, Christopher Forgeron <csforgeron@gmail.com> wrote:
>
> TL;DR summary: I can run FreeBSD out of memory quite consistently, and
> it's not a TSO/mbuf exhaustion issue. It's quite possible that ZFS is the
> culprit, but shouldn't the pager be able to handle aggressive memory
> requests in a low-memory situation gracefully, without needing custom
> tuning of ZFS / VM?
>
> Hello,
>
> I've been dealing with some instability in my 10.1-RELEASE and
> STABLE r282701M machines for the last few months.
>
> These machines are NFS/iSCSI storage machines, running on Dell M610x or
> similar hardware: 96 GB of memory, 10 Gb network cards, dual Xeon
> processors - fairly beefy stuff.
>
> Initially I thought it was more issues with TSO / jumbo mbufs, as I had
> this problem last year. I had thought that this was properly resolved,
> but setting my MTU to 1500 and turning off TSO did give me a bit more
> stability. Currently all my machines are set this way.
>
> Crashes were usually signalled by loss of network connectivity, and the
> ctld daemon scrolling messages across the screen at full speed about
> lost connections.
>
> All of this did seem like more network stack problems, but with each
> crash I'd be able to learn a bit more.
>
> Usually there was nothing of any use in the logfile, but every now and
> then I'd get this:
>
> Jun  3 13:02:04 san0 kernel: WARNING: 172.16.0.97 (iqn.1998-01.com.vmware:esx5a-3387a188): failed to allocate memory
> Jun  3 13:02:04 san0 kernel: WARNING: icl_pdu_new: failed to allocate 80 bytes
> Jun  3 13:02:04 san0 kernel: WARNING: 172.16.0.97 (iqn.1998-01.com.vmware:esx5a-3387a188): failed to allocate memory
> Jun  3 13:02:04 san0 kernel: WARNING: icl_pdu_new: failed to allocate 80 bytes
> Jun  3 13:02:04 san0 kernel: WARNING: 172.16.0.97 (iqn.1998-01.com.vmware:esx5a-3387a188): failed to allocate memory
> ---------
> Jun  4 03:03:09 san0 kernel: WARNING: icl_pdu_new: failed to allocate 80 bytes
> Jun  4 03:03:09 san0 kernel: WARNING: icl_pdu_new: failed to allocate 80 bytes
> Jun  4 03:03:09 san0 kernel: WARNING: 172.16.0.97 (iqn.1998-01.com.vmware:esx5a-3387a188): failed to allocate memory
> Jun  4 03:03:09 san0 kernel: WARNING: 172.16.0.97 (iqn.1998-01.com.vmware:esx5a-3387a188): connection error; dropping connection
> Jun  4 03:03:09 san0 kernel: WARNING: 172.16.0.97 (iqn.1998-01.com.vmware:esx5a-3387a188): connection error; dropping connection
> Jun  4 03:03:10 san0 kernel: WARNING: 172.16.0.97 (iqn.1998-01.com.vmware:esx5a-3387a188): waiting for CTL to terminate tasks, 1 remaining
> Jun  4 06:04:27 san0 syslogd: kernel boot file is /boot/kernel/kernel
>
> So, knowing that it seemed to be running out of memory, I started leaving
> 'vmstat 5' running on a console to see what it was displaying during the
> crash. It was always the same thing:
>
>  0 0 0   1520M   4408M   15   0   0   0   25  19   0   0  21962  1667  91390   0 33 67
>  0 0 0   1520M   4310M    9   0   0   0    2  15   3   0  21527  1385  95165   0 31 69
>  0 0 0   1520M   4254M    7   0   0   0   14  19   0   0  17664  1739  72873   0 18 82
>  0 0 0   1520M   4145M    2   0   0   0    0  19   0   0  23557  1447  96941   0 36 64
>  0 0 0   1520M   4013M    4   0   0   0   14  19   0   0   4288   490  34685   0 72 28
>  0 0 0   1520M   3885M    2   0   0   0    0  19   0   0  11141  1038  69242   0 52 48
>  0 0 0   1520M   3803M   10   0   0   0   14  19   0   0  24102  1834  91050   0 33 67
>  0 0 0   1520M   8192B    2   0   0   0    2  15   1   0  19037  1131  77470   0 45 55
>  0 0 0   1520M   8192B    0  22   0   0    2   0   6   0    146    82    578   0  0 100
>  0 0 0   1520M   8192B    1   0   0   0    0   0   0   0    130    40    510   0  0 100
>  0 0 0   1520M   8192B    0   0   0   0    0   0   0   0    143    40    501   0  0 100
>  0 0 0   1520M   8192B    0   0   0   0    0   0   0   0    201    62    660   0  0 100
>  0 0 0   1520M   8192B    0   0   0   0    0   0   0   0    101    28    404   0  0 100
>  0 0 0   1520M   8192B    0   0   0   0    0   0   0   0     97    27    398   0  0 100
>  0 0 0   1520M   8192B    0   0   0   0    0   0   0   0     93    28    377   0  0 100
>  0 0 0   1520M   8192B    0   0   0   0    0   0   0   0     92    27    373   0  0 100
>
> I'd go from a decent amount of free memory to suddenly having none.
> vmstat would stop outputting, console commands would hang, etc. The
> whole system would be useless.
>
> Looking into this, I came across a similar issue:
>
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199189
>
> I started increasing vm.v_free_min, and it helped - my crashes went from
> roughly every 6 hours to every few days.
>
> Currently I'm running with vm.v_free_min=1254507 - that's 1254507 * 4 KiB
> pages, or about 4.78 GiB of reserve. The vmstat above is from a machine
> with that setting that still ran down to 8192 B of free memory.
>
> I have two issues here:
>
> 1) I don't think I should ever be able to run the system into the ground
>    on memory. Deny me new memory until the pager can free more.
> 2) Setting 'min' doesn't really mean 'min', as free memory can obviously
>    drop below that threshold.
>
> I have plenty of local UFS swap (on non-ZFS drives).
>
> Adrian requested that I output a few more diagnostic items, and this is
> what I'm running on a console now, in a loop:
>
> vmstat
> netstat -m
> vmstat -z
> sleep 1
>
> The output from four crashes is attached here, as it can be a bit long.
> Let me know if that's not a good way to report them. Each one starts
> mid-way through a 'vmstat -z' output, as that's as far back as my
> terminal buffer allows.
>
> Now, I have a good idea of the conditions that are causing this: ZFS
> snapshots, run by cron, during times of high ZFS writes.
>
> The crashes are all nearly on the hour, as that's when crontab triggers
> my Python scripts to make new snapshots and delete old ones.
>
> My average FreeBSD machine has ~30 ZFS datasets, with each pool having
> ~20 TiB used. These all need to snapshot on the hour.
>
> By staggering the snapshots by a few minutes, I have been able to reduce
> crashing from every other day to perhaps once a week if I'm lucky - but
> if I start moving a lot of data around, I can cause daily crashes again.
>
> It looks to be the memory demand of snapshotting lots of ZFS datasets at
> the same time while accepting a lot of write traffic.
>
> Now perhaps the answer is 'don't do that', but I feel that FreeBSD should
> be robust enough to handle this.
> I don't mind tuning for now to reduce/eliminate this, but others
> shouldn't run into this pain just because they heavily load their
> machines - there must be a way of avoiding this condition.
>
> Here are the contents of my /boot/loader.conf and sysctl.conf, to show
> the minimal tuning I've done to make this problem a little more bearable:
>
> /boot/loader.conf
> vfs.zfs.arc_meta_limit=49656727553
> vfs.zfs.arc_max = 91489280512
>
> /etc/sysctl.conf
> vm.v_free_min=1254507
>
> Any suggestions/help is appreciated.
>
> Thank you.
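
The snapshot workload described above - hourly per-dataset snapshots,
staggered by a few minutes, with old ones pruned - boils down to something
like the sketch below. The actual tooling mentioned in the report is a set
of Python scripts; the script name, dataset names, offsets, and retention
here are made up purely to make the shape of the job concrete.

    #!/bin/sh
    # snap_hourly.sh <dataset> - take an hourly snapshot and prune old ones
    DS="$1"
    KEEP=24
    zfs snapshot "${DS}@hourly-$(date +%Y%m%d-%H%M)"
    # List this dataset's hourly snapshots newest-first, skip the newest
    # $KEEP, and destroy the remainder.
    for SNAP in $(zfs list -H -t snapshot -o name -S creation -d 1 "${DS}" |
                  grep "@hourly-" | sed "1,${KEEP}d"); do
        zfs destroy "${SNAP}"
    done

    # /etc/crontab - stagger the per-dataset jobs a few minutes apart so
    # the snapshot load does not all land exactly on the hour
    2   *   *   *   *   root   /usr/local/sbin/snap_hourly.sh tank/ds01
    7   *   *   *   *   root   /usr/local/sbin/snap_hourly.sh tank/ds02
    12  *   *   *   *   root   /usr/local/sbin/snap_hourly.sh tank/ds03
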