Date:      Fri, 5 Jul 2024 11:47:39 +0300
From:      Odhiambo Washington <odhiambo@gmail.com>
To:        David Palma <david.palma@takinobori.com>
Cc:        questions <questions@freebsd.org>
Subject:   Re: Server became inaccessible because it ran out of swap space
Message-ID:  <CAAdA2WOs9rV4eshpcsuDS4yUejNVAs2H_5LhXqLSjHMLck4QXg@mail.gmail.com>
In-Reply-To: <8d2a864b-a2ad-48b7-9c52-32b2af3ceb79@takinobori.com>
References:  <CAAdA2WPSngEy4Dr4Yt8B7CHboHbxaYBaCpK2VZ+ppB4fWYUX2g@mail.gmail.com> <8d2a864b-a2ad-48b7-9c52-32b2af3ceb79@takinobori.com>

On Fri, Jul 5, 2024 at 11:27 AM David Palma <david.palma@takinobori.com> wrote:

> Hi,
>
> On 05/07/2024 07:56, Odhiambo Washington wrote:
> > I have a server with 64GB RAM, 2 CPUs each with 16 cores. I have also
> > configured 13GB of swap space.
> >
> > ```
> > root@gw:/usr/local/bhyve-vms/scripts # swapinfo
> > Device          1K-blocks     Used    Avail Capacity
> > /dev/ada0p3       3163136   703316  2459820    22%
> > /dev/md0.eli     10485760   709352  9776408     7%
> > Total            13648896  1412668 12236228    10%
> > root@gw:/usr/local/bhyve-vms/scripts #
> > ```
> >
> > A number of times it has become inaccessible until I do a hard reboot, and
> > I believe this has been caused by running out of swap.
> >
> > Below is what I have obtained from /var/log/messages after I rebooted.
> >
> > How do I identify the culprit and arrest the situation?
> >
> >
> > ```
> > Jul  5 06:50:56 gw kernel: failed
> > Jul  5 06:52:11 gw kernel: failed
> > Jul  5 06:52:11 gw kernel: out of swap space
> > Jul  5 06:52:11 gw kernel: failed
> > Jul  5 06:52:11 gw kernel: failed
> > Jul  5 06:52:12 gw kernel: failed
> > Jul  5 06:52:12 gw kernel: failed
> > Jul  5 06:54:06 gw kernel: out of swap space
> > Jul  5 06:54:06 gw kernel: failed
> > Jul  5 07:16:30 gw kernel: pid 4076 (bhyve), jid 0, uid 0, was killed:
> > failed to reclaim memory
> > Jul  5 07:16:30 gw kernel: pid 4076 (bhyve), jid 0, uid 0, was killed:
> > failed to reclaim memory
> > Jul  5 07:16:30 gw kernel: tap4: link state changed to DOWN
> > Jul  5 07:16:30 gw kernel: out of swap space
> > Jul  5 07:16:30 gw kernel: failed
> > Jul  5 07:16:30 gw kernel: failed
> > Jul  5 07:16:30 gw kernel: failed
> > Jul  5 07:16:30 gw kernel: pid 20849 (bhyve), jid 0, uid 0, was killed:
> > failed to reclaim memory
> > Jul  5 07:16:30 gw kernel: pid 20849 (bhyve), jid 0, uid 0, was killed:
> > failed to reclaim memory
> > Jul  5 07:16:30 gw kernel: tap5: link state changed to DOWN
> > Jul  5 07:16:30 gw kernel: failed
> > Jul  5 07:16:30 gw kernel: failed
> > Jul  5 07:16:30 gw kernel: sonewconn: pcb 0xfffff8002866d100
> > (local:/var/run/wsgi.38620.0.1.sock): Listen queue overflow: 151 already in
> > queue awaiting acceptance (1 occurrences), euid 0, rgid 0, jail 0
> > Jul  5 07:16:30 gw kernel: pid 3591 (bhyve), jid 0, uid 0, was killed:
> > failed to reclaim memory
> > Jul  5 07:16:30 gw kernel: pid 3591 (bhyve), jid 0, uid 0, was killed:
> > failed to reclaim memory
> > Jul  5 07:16:30 gw kernel: tap3: link state changed to DOWN
> > Jul  5 07:16:30 gw kernel: failed
> > Jul  5 07:16:30 gw kernel: out of swap space
> > Jul  5 07:16:30 gw kernel: failed
> > Jul  5 07:16:31 gw kernel: failed
> > Jul  5 07:16:31 gw kernel: failed
> > Jul  5 07:16:32 gw kernel: out of swap space
> > Jul  5 07:16:33 gw kernel: out of swap space
> > Jul  5 07:16:33 gw kernel: failed
> > Jul  5 07:16:33 gw kernel: failed
> > Jul  5 07:16:34 gw kernel: out of swap space
> > Jul  5 07:16:34 gw kernel: failed
> > Jul  5 07:16:36 gw kernel: failed
> > Jul  5 07:16:36 gw kernel: failed
> > Jul  5 07:16:36 gw kernel: failed
> > Jul  5 07:16:36 gw kernel: failed
> > Jul  5 07:16:36 gw kernel: failed
> > Jul  5 07:16:37 gw kernel: failed
> > Jul  5 07:16:37 gw kernel: failed
> > Jul  5 07:16:37 gw kernel: failed
> > Jul  5 07:16:37 gw kernel: failed
> > Jul  5 07:16:37 gw kernel: failed
> > Jul  5 07:16:37 gw kernel: failed
> > Jul  5 07:16:37 gw kernel: failed
> > Jul  5 07:16:38 gw kernel: failed
> > ```
> >
> >
>
> I'm not sure, but looking at the bhyve processes being killed, it reminds me
> of an earlier issue that was solved with:
>
> `vm.disable_swapspace_pageouts=1`
>
> Cheers,
> David
>

Hello David,

Thank you for this.

Let me enable this and monitor.
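
A minimal sketch of how the setting could be applied and kept across reboots, assuming a stock FreeBSD host where `vm.disable_swapspace_pageouts` can be changed at runtime and `/etc/sysctl.conf` is read at boot. The helper name `persist_sysctl` is made up for illustration and is not an existing tool.

```sh
#!/bin/sh
# persist_sysctl OID VALUE [FILE]
# Hypothetical helper: append "OID=VALUE" to a sysctl.conf-style file
# (default /etc/sysctl.conf) unless a line for that OID already exists.
persist_sysctl() {
    oid=$1
    value=$2
    conf=${3:-/etc/sysctl.conf}

    # index() does a literal prefix match, so the dots in the OID are
    # not treated as regex wildcards.
    if [ -f "$conf" ] && awk -v key="${oid}=" \
        'index($0, key) == 1 { found = 1 } END { exit !found }' "$conf"
    then
        echo "${oid} is already set in ${conf}; leaving it alone"
    else
        printf '%s=%s\n' "$oid" "$value" >> "$conf"
    fi
}

# On the FreeBSD host, as root (left commented out so that sourcing
# this file changes nothing by itself):
#   sysctl vm.disable_swapspace_pageouts=1            # apply now
#   persist_sysctl vm.disable_swapspace_pageouts 1    # keep after reboot
```

The existence check keeps repeated runs from stacking duplicate lines in the file; an existing line with a different value is reported rather than overwritten.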

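For the monitoring side, a small sketch that lists which processes the kernel killed and how often swap ran out, assuming log lines shaped like the /var/log/messages excerpt quoted above. The function name `summarize_oom` is hypothetical.

```sh
#!/bin/sh
# summarize_oom: read syslog text on stdin; print each killed
# "pid N (command)" once, in order of first appearance, then the
# number of "out of swap space" messages.
summarize_oom() {
    awk '
        /was killed:/ && match($0, /pid [0-9]+ \([^)]*\)/) {
            key = substr($0, RSTART, RLENGTH)    # e.g. "pid 4076 (bhyve)"
            if (!(key in seen)) {                # the log repeats each kill line
                seen[key] = 1
                print key
            }
        }
        /out of swap space/ { swap++ }
        END { printf "out of swap space: %d\n", swap + 0 }
    '
}

# Usage on the host:
#   summarize_oom < /var/log/messages
```

In the quoted excerpt every killed process is a bhyve guest (pids 4076, 20849 and 3591), so a summary like this would mainly confirm whether the kills stop after the sysctl change.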

-- 
Best regards,
Odhiambo WASHINGTON,
Nairobi,KE
+254 7 3200 0004/+254 7 2274 3223
In an Internet failure case, the #1 suspect is a constant: DNS.
"Oh, the cruft.", egrep -v '^$|^.*#' ¯\_(ツ)_/¯ :-)
[How to ask smart questions:
http://www.catb.org/~esr/faqs/smart-questions.html]
