Date:      Tue, 12 Sep 2017 01:07:29 +0000
From:      Zhixin Wan <Zhixin.Wan@watchguard.com>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject:   RE: OOM-killer can't work on FreeBSD 11.0
Message-ID:  <DM5PR10MB17546B2183F0C958992D84E4EB690@DM5PR10MB1754.namprd10.prod.outlook.com>
In-Reply-To: <20170911080836.GB6477@kib.kiev.ua>
References:  <DM5PR10MB1754328DAF0A05B39D4B43FBEB680@DM5PR10MB1754.namprd10.prod.outlook.com> <20170911080836.GB6477@kib.kiev.ua>

Thanks!
I will try tuning this sysctl and see what happens.

-----Original Message-----
From: Konstantin Belousov [mailto:kostikbel@gmail.com]
Sent: Monday, September 11, 2017 16:09
To: Zhixin Wan <Zhixin.Wan@watchguard.com>
Cc: freebsd-hackers@freebsd.org
Subject: Re: OOM-killer can't work on FreeBSD 11.0

On Mon, Sep 11, 2017 at 03:56:45AM +0000, Zhixin Wan via freebsd-hackers wrote:
> Hi,
>
> I have a mail system running FreeBSD 9.3 as a virtual machine on VMware ESXi. It is assigned little memory (1G or 2G) and a reasonably sized swap disk (2 x memory size).
> The mail system ran for several years and never froze, even with a lot of mail traffic passing through it.
>
> Recently I upgraded this mail system from FreeBSD 9.3 to FreeBSD 11.0, and after running for a few days, it froze. The console does not respond and I cannot log in over SSH either; only ping still gets a response. I looked into the message log and found a lot of messages:
>
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(4): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(4): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(5): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(4): failed
> swap_pager: out of swap space
> swap_pager_getswapspace(1): failed
> swap_pager_getswapspace(16): failed
> swap_pager_getswapspace(12): failed
> swap_pager_getswapspace(9): failed
> swap_pager_getswapspace(16): failed
> ...
>
> It seems that running out of swap causes the system to freeze.
>
> To track down the problem, I restored the mail system to a previous backup snapshot running FreeBSD 9.3.
> I put mail traffic pressure on the system and observed memory and swap usage with a simple shell script:
>
> #!/bin/sh
> # Print memory (vmstat) and swap (pstat -s) usage once a minute.
> while true; do
>     vmstat
>     pstat -s
>     sleep 60
> done
>
> From the console, I saw memory and swap usage increase quickly. Once the swap space was used up,
> out-of-swap messages appeared in the message log:
>
> swap_pager_getswapspace(4): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(4): failed
> swap_pager_getswapspace(6): failed
> swap_pager_getswapspace(2): failed
> swap_pager_getswapspace(2): failed
> swap_pager_getswapspace(2): failed
> swap_pager_getswapspace(5): failed
> swap_pager_getswapspace(8): failed
> swap_pager_getswapspace(2): failed
> swap_pager_getswapspace(4): failed
> Sep 6 08:30:58 mail-system kernel: pid 92324 (bm_scanner), uid 5500, was killed: out of swap space
>
> As on FreeBSD 11.0, there are still a lot of "swap_pager_getswapspace failed" messages, except that FreeBSD 9.3 kills a process to free memory.
> This behavior lets the mail system keep running, but FreeBSD 11.0 can't. Observing memory and swap usage continuously, the OOM-killer works precisely: once swap usage reaches 100%, the OOM-killer is called to kill a process and free memory.
No, this is not the right behaviour. Filling up the swap space must not cause the OOM killer to trigger (in the default setup, with swap overcommit turned off).

>
> Digging into the FreeBSD 9.3 source, file vm_pageout.c, function vm_pageout_scan():
>                 /*
>                  * If we are critically low on one of RAM or swap and low on
>                  * the other, kill the largest process.  However, we avoid
>                  * doing this on the first pass in order to give ourselves a
>                  * chance to flush out dirty vnode-backed pages and to allow
>                  * active pages to be moved to the inactive queue and reclaimed.
>                  */
>                 if (pass != 0 &&
>                     ((swap_pager_avail < 64 && vm_page_count_min()) ||
>                      (swap_pager_full && vm_paging_target() > 0)))
>                         vm_pageout_oom(VM_OOM_MEM);
>
> The corresponding source code in FreeBSD 11.0, file vm_pageout.c, function vm_pageout_scan():
>         /*
>          * If the inactive queue scan fails repeatedly to meet its
>          * target, kill the largest process.
>          */
>         vm_pageout_mightbe_oom(vmd, page_shortage,
>             starting_page_shortage);
>
> The OOM-killer function vm_pageout_oom() is now wrapped by the function vm_pageout_mightbe_oom().
>
> To find out which commit changed this behavior, I searched the FreeBSD SVN history and found a clue.
> https://svnweb.freebsd.org/base?view=revision&revision=290920
> In SVN commit r290920, a new sysctl node called vm.pageout_oom_seq was added to control the sensitivity of the OOM-killer.
> The default value of vm.pageout_oom_seq is 12; the commit log says:
> The number of passes to trigger OOM was selected empirically and tested both on small (32M-64M i386 VM) and large (32G amd64) configurations.
>
> However, in my case, with vm.pageout_oom_seq at its default of 12, it didn't work as expected.
So lower the sysctl. The lower the value, the more sensitive the OOM trigger is to a lack of pagedaemon progress.

> I suspect it's a bug, but I'm not quite sure, since I can't fully understand this code.
> I just want the OOM-killer on FreeBSD 11.0 to behave like it does on FreeBSD 9.3.
FreeBSD 9 OOM behavior was buggy; it caused serious issues on small machines and on swap-less setups. The new OOM trigger might require some manual tuning for a specific combination of workload and machine config.

> Does anyone know how to solve it?
>
> Thanks!


