Date: Tue, 14 Dec 2021 16:28:36 +0100
From: Fabian Keil <freebsd-listen@fabiankeil.de>
To: freebsd-hackers@freebsd.org
Subject: High mbuf count leading to processes getting killed for "lack of swap space"
Message-ID: <20211214162836.3d3b9b87@fabiankeil.de>

A while ago one of my physical systems running ElectroBSD based on
FreeBSD stable/11 occasionally became unresponsive to network input
and had to be power-cycled to get into a usable state again.

It's conceivable that console access still would have worked
but I didn't have console access for the system (and still don't).

The problem was always preceded by "[zone: mbuf_cluster]
kern.ipc.nmbclusters limit reached" messages and only occurred
when the system was busy reproducing ElectroBSD or building ports
with poudriere while additionally running the normal workload,
which includes serving web pages with nginx and relaying tor traffic.

Some additional log messages and munin graphs from March
are available at:
<https://www.fabiankeil.de/blog-surrogat/2021/03/14/website-ausfall-durch-mbuf-cluster-limit.html>

As a first step to diagnose the problem I added munin plugins
for the mbuf state but unfortunately munin needs the network
to work and thus is unreliable when mbufs are scarce ...

While I'm generally not a big fan of cargo-cult administration
I later on set kern.ipc.nmbclusters=1000000 while the auto-tuned
value was "only" 247178, which is more than enough for normal
operation.

This prevented the system, which is now running ElectroBSD amd64
based on FreeBSD stable/12, from becoming completely unresponsive
under load, but unfortunately it also results in important processes
getting killed with messages like:

2021-12-14T06:24:40.267263+01:00 elektrobier.fabiankeil.de kernel <3>1 - - - pid 63731 (tor), jid 33, uid 256, was killed: out of swap space
2021-12-14T06:24:41.000777+01:00 elektrobier.fabiankeil.de kernel <3>1 - - - pid 91800 (tor), jid 37, uid 256, was killed: out of swap space
2021-12-14T06:24:41.000862+01:00 elektrobier.fabiankeil.de kernel <3>1 - - - pid 81324 (c++), jid 56, uid 1001, was killed: out of swap space
2021-12-14T06:24:41.000903+01:00 elektrobier.fabiankeil.de kernel <5>1 - - - Limiting closed port RST response from 17233 to 200 packets/sec
2021-12-14T06:24:41.000917+01:00 elektrobier.fabiankeil.de kernel <3>1 - - - pid 19764 (c++), jid 56, uid 1001, was killed: out of swap space
2021-12-14T06:24:41.000954+01:00 elektrobier.fabiankeil.de kernel <5>1 - - - Limiting closed port RST response from 1635 to 200 packets/sec
2021-12-14T06:24:41.000967+01:00 elektrobier.fabiankeil.de kernel <5>1 - - - Limiting closed port RST response from 1192 to 200 packets/sec
2021-12-14T06:24:41.000980+01:00 elektrobier.fabiankeil.de kernel <3>1 - - - pid 974 (xz), jid 0, uid 0, was killed: out of swap space
2021-12-14T06:24:41.001016+01:00 elektrobier.fabiankeil.de kernel <5>1 - - - Limiting closed port RST response from 441 to 200 packets/sec
2021-12-14T06:24:41.001029+01:00 elektrobier.fabiankeil.de kernel <3>1 - - - pid 10872 (perl), jid 0, uid 842, was killed: out of swap space
2021-12-14T06:24:41.001065+01:00 elektrobier.fabiankeil.de kernel <5>1 - - - Limiting closed port RST response from 1052 to 200 packets/sec
2021-12-14T06:24:41.001078+01:00 elektrobier.fabiankeil.de kernel <3>1 - - - pid 62569 (tor), jid 35, uid 256, was killed: out of swap space
2021-12-14T06:24:41.001114+01:00 elektrobier.fabiankeil.de kernel <5>1 - - - Limiting closed port RST response from 269 to 200 packets/sec

My impression is that the system isn't actually out of swap space.
The system has 4 GB of RAM and I temporarily increased the swap space
from 8 GB to 16 GB, which didn't make a difference.

As far as munin is concerned the swap space isn't full
during the times when munin is working:
<https://www.fabiankeil.de/bilder/munin/mbuf-issues-2021-12-14/>

As munin isn't reliable under load, I additionally let the system
dump sysctls periodically and it looks like mbuf usage goes up
to over 800000:

[fk@elektrobier /var/log/sysctl-dumps]$ grep "mbufs in use" sysctl-dump-2021-12-14_0[56]\:*
[...]
sysctl-dump-2021-12-14_06:11:53.txt:829476/729/830205 mbufs in use (current/cache/total)
sysctl-dump-2021-12-14_06:12:57.txt:831954/831/832785 mbufs in use (current/cache/total)
sysctl-dump-2021-12-14_06:14:02.txt:834506/4/834510 mbufs in use (current/cache/total)
sysctl-dump-2021-12-14_06:15:11.txt:837446/814/838260 mbufs in use (current/cache/total)
sysctl-dump-2021-12-14_06:16:19.txt:840177/948/841125 mbufs in use (current/cache/total)
sysctl-dump-2021-12-14_06:17:26.txt:842652/603/843255 mbufs in use (current/cache/total)
sysctl-dump-2021-12-14_06:22:31.txt:657/1293/1950 mbufs in use (current/cache/total)
sysctl-dump-2021-12-14_06:25:40.txt:528/1422/1950 mbufs in use (current/cache/total)
sysctl-dump-2021-12-14_06:26:40.txt:518/1432/1950 mbufs in use (current/cache/total)
sysctl-dump-2021-12-14_06:27:40.txt:517/1433/1950 mbufs in use (current/cache/total)

The sysctls were supposed to be dumped once per minute but apparently
the system can't be trusted to do this under pressure either ...

It's interesting to me that in the case above the mbuf usage went up
but the mbuf cluster usage didn't go up as well. In the past both went
up together (as shown in the munin graphs for the week).

I'm wondering if killing processes is the best way to deal with
the problem. I would prefer it if the kernel simply stopped
allocating new mbufs and mbuf clusters before memory becomes
too scarce for the system to function.

I'm aware that this would affect applications as well and
would probably result in dropped connections, but my expectation
is that it would be less annoying than the whole system becoming
unresponsive or important applications getting killed and becoming
unavailable until I can restart them.

Has anyone already looked into this?

Is there a reason, not obvious to me, why refusing to allocate
more mbufs and mbuf clusters than the system can handle isn't
expected to work?

Fabian
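
The dump lines quoted above can be cross-checked with a short script. The following Python sketch is not part of the original message: it parses lines in the grep output format shown earlier, reports the longest gap between dumps, and gives a rough size estimate for the peak mbuf count. The 256-byte mbuf size (MSIZE_BYTES) is an assumption based on the usual default and may not match this kernel, and the function names are hypothetical.

```python
import re
from datetime import datetime

# Assumption: one mbuf is 256 bytes (the usual MSIZE default).
# Verify against the running kernel before trusting the estimate.
MSIZE_BYTES = 256

# Matches lines such as:
# sysctl-dump-2021-12-14_06:17:26.txt:842652/603/843255 mbufs in use (...)
LINE_RE = re.compile(
    r"sysctl-dump-(\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2})\.txt:"
    r"(\d+)/(\d+)/(\d+) mbufs in use"
)


def parse_dump_lines(text):
    """Return (timestamp, current, cache, total) tuples from grep output."""
    samples = []
    for match in LINE_RE.finditer(text):
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d_%H:%M:%S")
        current, cache, total = (int(match.group(i)) for i in (2, 3, 4))
        samples.append((stamp, current, cache, total))
    return samples


def summarize(samples):
    """Peak 'current' count, its estimated size, and the longest dump gap."""
    peak = max(sample[1] for sample in samples)
    gaps = [
        (later[0] - earlier[0]).total_seconds()
        for earlier, later in zip(samples, samples[1:])
    ]
    return {
        "peak_mbufs": peak,
        "peak_mib": peak * MSIZE_BYTES / 2**20,
        "longest_gap_s": max(gaps) if gaps else 0.0,
    }


# Three lines copied from the grep output in the message.
SAMPLE = """\
sysctl-dump-2021-12-14_06:16:19.txt:840177/948/841125 mbufs in use (current/cache/total)
sysctl-dump-2021-12-14_06:17:26.txt:842652/603/843255 mbufs in use (current/cache/total)
sysctl-dump-2021-12-14_06:22:31.txt:657/1293/1950 mbufs in use (current/cache/total)
"""

if __name__ == "__main__":
    print(summarize(parse_dump_lines(SAMPLE)))
```

For the three sample lines the longest gap is 305 seconds, which matches the observation that dumps were skipped under pressure. Under the assumed mbuf size the peak of 842652 mbufs corresponds to roughly 206 MiB; this counts only the mbufs themselves, not clusters or other kernel allocations.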