From: Zhixin Wan
To: Konstantin Belousov
CC: "freebsd-hackers@freebsd.org"
Subject: RE: OOM-killer can't work on FreeBSD 11.0
Date: Tue, 12 Sep 2017 01:07:29 +0000
In-Reply-To: <20170911080836.GB6477@kib.kiev.ua>

Thanks!

I will try tuning this sysctl and see what happens.

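A minimal sketch of that tuning, assuming the machine runs FreeBSD 11 and exposes vm.pageout_oom_seq. Halving the current value is only an illustrative starting point, not a figure recommended in this thread, and new_oom_seq is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch only: lower vm.pageout_oom_seq at runtime so the OOM killer
# reacts after fewer failed pagedaemon passes.
# ASSUMPTION: halving is an arbitrary first step; tune under real load.

# new_oom_seq CURRENT: print half of CURRENT, but never less than 1.
new_oom_seq() {
    half=$(( $1 / 2 ))
    [ "$half" -lt 1 ] && half=1
    echo "$half"
}

# Only touch the sysctl where it exists; on other systems this is a no-op.
if current=$(sysctl -n vm.pageout_oom_seq 2>/dev/null); then
    target=$(new_oom_seq "$current")
    echo "vm.pageout_oom_seq: $current -> $target"
    sysctl vm.pageout_oom_seq="$target"
fi
```

To keep the setting across reboots, the same assignment can be added to /etc/sysctl.conf.
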
-----Original Message-----
From: Konstantin Belousov [mailto:kostikbel@gmail.com]
Sent: Monday, September 11, 2017 16:09
To: Zhixin Wan
Cc: freebsd-hackers@freebsd.org
Subject: Re: OOM-killer can't work on FreeBSD 11.0

On Mon, Sep 11, 2017 at 03:56:45AM +0000, Zhixin Wan via freebsd-hackers wrote:
> Hi,
>
> I have a mail system running FreeBSD 9.3 on VMware ESXi. It is assigned
> little memory (1G or 2G) and a reasonable amount of swap (2 x memory size).
> The mail system ran for several years and never froze, even with a lot of
> mail traffic passing through it.
>
> Recently I upgraded this mail system from FreeBSD 9.3 to FreeBSD 11.0,
> and after running for a few days, the mail system froze. I couldn't get any
> response from the console and couldn't log in to the mail system over SSH
> either, although the system still answered ping. I looked into the message
> log and found a lot of messages:
>
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(4): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(4): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(5): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(4): failed
> swap_pager: out of swap space
> swap_pager_getswapspace(1): failed
> swap_pager_getswapspace(16): failed
> swap_pager_getswapspace(12): failed
> swap_pager_getswapspace(9): failed
> swap_pager_getswapspace(16): failed
> ...
>
> It seems that running out of swap causes the system to freeze.
>
> To investigate this problem, I restored the mail system from a previous
> backup snapshot, which runs FreeBSD 9.3.
> I put mail traffic pressure on the mail system and observed memory and
> swap space usage with a simple shell script:
>
> #!/bin/sh
> while true; do
>     vmstat
>     pstat -s
>     sleep 60
> done
>
> From the console, I saw memory and swap space usage increase
> quickly. Once the swap space was exhausted, out-of-swap messages
> appeared in the message log:
>
> swap_pager_getswapspace(4): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(4): failed
> swap_pager_getswapspace(6): failed
> swap_pager_getswapspace(2): failed
> swap_pager_getswapspace(2): failed
> swap_pager_getswapspace(2): failed
> swap_pager_getswapspace(5): failed
> swap_pager_getswapspace(8): failed
> swap_pager_getswapspace(2): failed
> swap_pager_getswapspace(4): failed
> Sep 6 08:30:58 mail-system kernel: pid 92324 (bm_scanner), uid 5500, was killed: out of swap space
>
> As on FreeBSD 11.0, there are still a lot of "swap_pager_getswapspace
> failed" messages, but FreeBSD 9.3 kills a process to free memory.
> This behavior lets the mail system keep running on FreeBSD 9.3, which
> FreeBSD 11.0 cannot do. Observing memory and swap space usage continuously
> shows that the OOM-killer works precisely: once swap space usage reaches
> 100%, the OOM-killer is called to kill a process and free memory.

No, this is not the right behaviour. Filling up the swap space must not
cause the OOM to trigger (in the default setup of swap overcommit turned off).

>
> Digging into the source code of FreeBSD 9.3, file vm_pageout.c, function
> vm_pageout_scan():
>         /*
>          * If we are critically low on one of RAM or swap and low on
>          * the other, kill the largest process.  However, we avoid
>          * doing this on the first pass in order to give ourselves a
>          * chance to flush out dirty vnode-backed pages and to allow
>          * active pages to be moved to the inactive queue and reclaimed.
>          */
>         if (pass != 0 &&
>             ((swap_pager_avail < 64 && vm_page_count_min()) ||
>              (swap_pager_full && vm_paging_target() > 0)))
>                 vm_pageout_oom(VM_OOM_MEM);
>
> The corresponding source code in FreeBSD 11.0, file vm_pageout.c, function
> vm_pageout_scan():
>         /*
>          * If the inactive queue scan fails repeatedly to meet its
>          * target, kill the largest process.
>          */
>         vm_pageout_mightbe_oom(vmd, page_shortage,
>             starting_page_shortage);
>
> The OOM-killer function vm_pageout_oom() is wrapped by the function
> vm_pageout_mightbe_oom().
>
> To find the commit that changed this behavior, I searched the FreeBSD
> SVN repository and found a clue.
> https://svnweb.freebsd.org/base?view=revision&revision=290920
> In SVN commit r290920, a new sysctl node called vm.pageout_oom_seq was
> added to control the sensitivity of the OOM-killer.
> The default value of pageout_oom_seq is 12; the commit log says:
> The number of passes to trigger OOM was selected empirically and
> tested both on small (32M-64M i386 VM) and large (32G amd64)
> configurations.
>
> However, in my case, with vm.pageout_oom_seq at its default of 12, it
> didn't work as expected.

So lower the sysctl. The lower the value, the more sensitive OOM is to the
lack of pagedaemon progress.

> I suspect it's a bug, but I'm not sure, since I can't fully understand
> this code.
> I just want the OOM-killer on FreeBSD 11.0 to behave as it does on
> FreeBSD 9.3.

FreeBSD 9 OOM behavior was buggy; it caused serious issues on small
machines and on swap-less setups. The new OOM trigger might require some
manual tuning for a specific combination of workload and machine config.

> Does anyone know how to solve it?
>
> Thanks!
>
>
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
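
To watch how close the machine gets to running out of swap while trying different vm.pageout_oom_seq values, the vmstat/pstat loop quoted above can be reduced to one number per sample. A rough sketch, assuming pstat -s prints a header row followed by one row per swap device, with the device size in column 2 and the used amount in column 3 (plus a "Total" row when there are several devices); swap_used_pct is a hypothetical helper name:

```shell
#!/bin/sh
# swap_used_pct: read `pstat -s` style output on stdin and print the
# overall used percentage as an integer. Sums every device row and
# skips the header line and any "Total" row to avoid double counting.
# ASSUMPTION: column 2 is the device size, column 3 the used amount.
swap_used_pct() {
    awk 'NR > 1 && $1 != "Total" { total += $2; used += $3 }
         END {
             if (total > 0) {
                 printf "%d\n", used * 100 / total
             } else {
                 print 0
             }
         }'
}

# Take one timestamped sample where pstat is available (FreeBSD).
if command -v pstat >/dev/null 2>&1; then
    echo "$(date '+%F %T') swap used: $(pstat -s | swap_used_pct)%"
fi
```

Running the sample line from cron or inside the original 60-second loop gives a compact log to compare against the time of the "was killed: out of swap space" message.
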