Date: Fri, 5 Jul 2024 11:47:39 +0300
From: Odhiambo Washington <odhiambo@gmail.com>
To: David Palma <david.palma@takinobori.com>
Cc: questions <questions@freebsd.org>
Subject: Re: Server became inaccessible because it ran out of swap space
Message-ID: <CAAdA2WOs9rV4eshpcsuDS4yUejNVAs2H_5LhXqLSjHMLck4QXg@mail.gmail.com>
In-Reply-To: <8d2a864b-a2ad-48b7-9c52-32b2af3ceb79@takinobori.com>
References: <CAAdA2WPSngEy4Dr4Yt8B7CHboHbxaYBaCpK2VZ%2BppB4fWYUX2g@mail.gmail.com> <8d2a864b-a2ad-48b7-9c52-32b2af3ceb79@takinobori.com>

On Fri, Jul 5, 2024 at 11:27 AM David Palma <david.palma@takinobori.com> wrote:

> Hi,
>
> On 05/07/2024 07:56, Odhiambo Washington wrote:
> > I have a server with 64GB RAM, 2 CPUs each with 16 cores. I have also
> > configured 13GB of swap space.
> >
> > ```
> > root@gw:/usr/local/bhyve-vms/scripts # swapinfo
> > Device          1K-blocks     Used    Avail Capacity
> > /dev/ada0p3       3163136   703316  2459820    22%
> > /dev/md0.eli     10485760   709352  9776408     7%
> > Total            13648896  1412668 12236228    10%
> > root@gw:/usr/local/bhyve-vms/scripts #
> > ```
> >
> > A number of times it has become inaccessible until I do a hard reboot and
> > this has been caused by what I believe is running out of swap.
> >
> > Below is what I have obtained from /var/log/messages after I rebooted.
> >
> > How do I identify the culprit? Arrest the situation?
> >
> >
> > ```
> > Jul 5 06:50:56 gw kernel: failed
> > Jul 5 06:52:11 gw kernel: failed
> > Jul 5 06:52:11 gw kernel: out of swap space
> > Jul 5 06:52:11 gw kernel: failed
> > Jul 5 06:52:11 gw kernel: failed
> > Jul 5 06:52:12 gw kernel: failed
> > Jul 5 06:52:12 gw kernel: failed
> > Jul 5 06:54:06 gw kernel: out of swap space
> > Jul 5 06:54:06 gw kernel: failed
> > Jul 5 07:16:30 gw kernel: pid 4076 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory
> > Jul 5 07:16:30 gw kernel: pid 4076 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory
> > Jul 5 07:16:30 gw kernel: tap4: link state changed to DOWN
> > Jul 5 07:16:30 gw kernel: out of swap space
> > Jul 5 07:16:30 gw kernel: failed
> > Jul 5 07:16:30 gw kernel: failed
> > Jul 5 07:16:30 gw kernel: failed
> > Jul 5 07:16:30 gw kernel: pid 20849 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory
> > Jul 5 07:16:30 gw kernel: pid 20849 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory
> > Jul 5 07:16:30 gw kernel: tap5: link state changed to DOWN
> > Jul 5 07:16:30 gw kernel: failed
> > Jul 5 07:16:30 gw kernel: failed
> > Jul 5 07:16:30 gw kernel: sonewconn: pcb 0xfffff8002866d100 (local:/var/run/wsgi.38620.0.1.sock): Listen queue overflow: 151 already in queue awaiting acceptance (1 occurrences), euid 0, rgid 0, jail 0
> > Jul 5 07:16:30 gw kernel: pid 3591 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory
> > Jul 5 07:16:30 gw kernel: pid 3591 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory
> > Jul 5 07:16:30 gw kernel: tap3: link state changed to DOWN
> > Jul 5 07:16:30 gw kernel: failed
> > Jul 5 07:16:30 gw kernel: out of swap space
> > Jul 5 07:16:30 gw kernel: failed
> > Jul 5 07:16:31 gw kernel: failed
> > Jul 5 07:16:31 gw kernel: failed
> > Jul 5 07:16:32 gw kernel: out of swap space
> > Jul 5 07:16:33 gw kernel: out of swap space
> > Jul 5 07:16:33 gw kernel: failed
> > Jul 5 07:16:33 gw kernel: failed
> > Jul 5 07:16:34 gw kernel: out of swap space
> > Jul 5 07:16:34 gw kernel: failed
> > Jul 5 07:16:36 gw kernel: failed
> > Jul 5 07:16:36 gw kernel: failed
> > Jul 5 07:16:36 gw kernel: failed
> > Jul 5 07:16:36 gw kernel: failed
> > Jul 5 07:16:36 gw kernel: failed
> > Jul 5 07:16:37 gw kernel: failed
> > Jul 5 07:16:37 gw kernel: failed
> > Jul 5 07:16:37 gw kernel: failed
> > Jul 5 07:16:37 gw kernel: failed
> > Jul 5 07:16:37 gw kernel: failed
> > Jul 5 07:16:37 gw kernel: failed
> > Jul 5 07:16:37 gw kernel: failed
> > Jul 5 07:16:38 gw kernel: failed
> > ```
> >
> >
>
> I'm not sure but looking at the bhyve processes being killed, it reminds
> me of an earlier issue that was solved with:
>
> `vm.disable_swapspace_pageouts=1`
>
> Cheers,
> David
>

Hello David,

Thank you for this.

Let me enable this and monitor.

-- 
Best regards,
Odhiambo WASHINGTON,
Nairobi,KE
+254 7 3200 0004/+254 7 2274 3223
In an Internet failure case, the #1 suspect is a constant: DNS.
"Oh, the cruft.", egrep -v '^$|^.*#' ¯\_(ツ)_/¯ :-)
[How to ask smart questions: http://www.catb.org/~esr/faqs/smart-questions.html]

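On the question of identifying the culprit, one possible starting point is to tally the kernel's kill notices and "out of swap space" messages from the log. The following is only a minimal sketch in Python, not something proposed in the thread. It assumes the message format shown in the excerpt above and assumes that each kill notice occupies a single line in the real /var/log/messages (in the email the lines are wrapped). The names `KILL_RE`, `summarize` and `sample` are hypothetical. Note that the processes the kernel killed are not necessarily the ones that caused the memory pressure.

```python
import re
from collections import Counter

# Assumed format, taken from the log excerpt in this message, e.g.
# "Jul 5 07:16:30 gw kernel: pid 4076 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory"
KILL_RE = re.compile(
    r"kernel: pid (\d+) \(([^)]+)\), jid \d+, uid \d+, was killed: (.+)$"
)


def summarize(lines):
    """Return (kills, swap_events).

    kills maps (command, pid, reason) to the number of kill notices seen;
    swap_events counts "out of swap space" kernel messages.
    """
    kills = Counter()
    swap_events = 0
    for line in lines:
        line = line.strip()
        match = KILL_RE.search(line)
        if match:
            pid, command, reason = match.groups()
            kills[(command, int(pid), reason)] += 1
        elif line.endswith("kernel: out of swap space"):
            swap_events += 1
    return kills, swap_events


# A few lines copied from the excerpt above, used as illustrative input.
sample = [
    "Jul 5 07:16:30 gw kernel: pid 4076 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory",
    "Jul 5 07:16:30 gw kernel: tap4: link state changed to DOWN",
    "Jul 5 07:16:30 gw kernel: out of swap space",
    "Jul 5 07:16:30 gw kernel: pid 20849 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory",
]

kills, swap_events = summarize(sample)
print(f"out of swap space messages: {swap_events}")
for (command, pid, reason), count in kills.most_common():
    print(f"{command} pid {pid}: {count} kill notice(s), reason: {reason}")
```

To run it against the real log rather than the sample, an open file object can be passed instead, for example `summarize(open("/var/log/messages", errors="replace"))`.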
Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAAdA2WOs9rV4eshpcsuDS4yUejNVAs2H_5LhXqLSjHMLck4QXg>
