From: Zhixin Wan
To: "freebsd-hackers@freebsd.org"
Subject: OOM-killer can't work on FreeBSD 11.0
Date: Mon, 11 Sep 2017 03:56:45 +0000

Hi,

I have a mail system running FreeBSD 9.3 on VMware ESXi. It is assigned a
small amount of memory (1 GB or 2 GB) and a reasonably sized swap disk
(2 x memory size).
The mail system ran for several years and never froze, even with a lot of
mail traffic passing through it.

Recently I upgraded this mail system from FreeBSD 9.3 to FreeBSD 11.0, and
after running for a few days the mail system froze. I can't get any response
from the console and can't log in to the mail system over SSH either; the
only thing that still gets a response is ping.

I looked into the message log and found a lot of these messages:

    swap_pager_getswapspace(3): failed
    swap_pager_getswapspace(3): failed
    swap_pager_getswapspace(4): failed
    swap_pager_getswapspace(3): failed
    swap_pager_getswapspace(3): failed
    swap_pager_getswapspace(4): failed
    swap_pager_getswapspace(3): failed
    swap_pager_getswapspace(3): failed
    swap_pager_getswapspace(3): failed
    swap_pager_getswapspace(3): failed
    swap_pager_getswapspace(3): failed
    swap_pager_getswapspace(3): failed
    swap_pager_getswapspace(3): failed
    swap_pager_getswapspace(5): failed
    swap_pager_getswapspace(3): failed
    swap_pager_getswapspace(3): failed
    swap_pager_getswapspace(4): failed
    swap_pager: out of swap space
    swap_pager_getswapspace(1): failed
    swap_pager_getswapspace(16): failed
    swap_pager_getswapspace(12): failed
    swap_pager_getswapspace(9): failed
    swap_pager_getswapspace(16): failed
    ...

It seems that running out of swap space causes the system to freeze.

To investigate, I restored the mail system to a previous backup snapshot,
which runs FreeBSD 9.3. I then put mail traffic pressure on the mail system
and observed the memory and swap space usage with a simple shell script:

    #!/bin/sh
    # Print memory and swap usage once a minute.
    while true; do
        vmstat
        pstat -s
        sleep 60
    done

From the console, I saw the memory and swap space usage increase quickly.
Once the swap space was exhausted, out-of-swap messages appeared in the
message log:

    swap_pager_getswapspace(4): failed
    swap_pager_getswapspace(3): failed
    swap_pager_getswapspace(3): failed
    swap_pager_getswapspace(3): failed
    swap_pager_getswapspace(4): failed
    swap_pager_getswapspace(6): failed
    swap_pager_getswapspace(2): failed
    swap_pager_getswapspace(2): failed
    swap_pager_getswapspace(2): failed
    swap_pager_getswapspace(5): failed
    swap_pager_getswapspace(8): failed
    swap_pager_getswapspace(2): failed
    swap_pager_getswapspace(4): failed
    Sep 6 08:30:58 mail-system kernel: pid 92324 (bm_scanner), uid 5500, was killed: out of swap space

As on FreeBSD 11.0, there are still a lot of "swap_pager_getswapspace
failed" messages, but FreeBSD 9.3 kills a process to free memory.
This lets the mail system keep running, which it can't do on FreeBSD 11.0.

Watching the system memory and swap space usage continuously shows that the
OOM-killer works reliably on 9.3: once swap space usage reaches 100%, the
OOM-killer is called to kill a process and free memory.

I dug into the source code of FreeBSD 9.3, file vm_pageout.c, function
vm_pageout_scan():

    /*
     * If we are critically low on one of RAM or swap and low on
     * the other, kill the largest process.  However, we avoid
     * doing this on the first pass in order to give ourselves a
     * chance to flush out dirty vnode-backed pages and to allow
     * active pages to be moved to the inactive queue and reclaimed.
     */
    if (pass != 0 &&
        ((swap_pager_avail < 64 && vm_page_count_min()) ||
         (swap_pager_full && vm_paging_target() > 0)))
            vm_pageout_oom(VM_OOM_MEM);

The corresponding source code in FreeBSD 11.0, file vm_pageout.c, function
vm_pageout_scan():

    /*
     * If the inactive queue scan fails repeatedly to meet its
     * target, kill the largest process.
     */
    vm_pageout_mightbe_oom(vmd, page_shortage, starting_page_shortage);

In 11.0 the call to the OOM-killer function vm_pageout_oom() is wrapped in
the function vm_pageout_mightbe_oom().
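
To make the difference between the two checks easier to see, here is a
small user-space sketch of both trigger policies. It is not kernel code.
The struct and function names (vm_state93, oom_seq_model, oom_trigger_93,
oom_trigger_11) are made up for illustration. The 9.3 condition is copied
from the snippet quoted above. The 11.0 part is only my reading of the
"fails repeatedly" comment: I assume that a scan counts as failed when it
ends with the same page shortage it started with, and that any progress
resets the count. That assumption should be checked against the real
vm_pageout_mightbe_oom().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the values the 9.3 check looks at. */
struct vm_state93 {
    int  pass;            /* scan pass number; 0 is the first pass */
    long swap_avail;      /* stands in for swap_pager_avail */
    bool swap_full;       /* stands in for swap_pager_full */
    bool page_count_min;  /* stands in for vm_page_count_min() */
    long paging_target;   /* stands in for vm_paging_target() */
};

/* Same boolean condition as the quoted 9.3 vm_pageout_scan() code. */
static bool
oom_trigger_93(const struct vm_state93 *s)
{
    return s->pass != 0 &&
        ((s->swap_avail < 64 && s->page_count_min) ||
         (s->swap_full && s->paging_target > 0));
}

/* Hypothetical model of the 11.0 policy: a streak counter. */
struct oom_seq_model {
    int oom_seq;    /* consecutive scans that made no progress */
    int threshold;  /* stands in for vm.pageout_oom_seq (default 12) */
};

/*
 * Returns true when the model would call the OOM-killer.
 * ASSUMPTION: a scan that frees even one page resets the streak,
 * so OOM fires only after `threshold` scans in a row with no progress.
 */
static bool
oom_trigger_11(struct oom_seq_model *m, int starting_shortage,
    int ending_shortage)
{
    assert(m->threshold > 0);

    if (starting_shortage <= 0 || ending_shortage != starting_shortage)
        m->oom_seq = 0;     /* no shortage, or some progress was made */
    else
        m->oom_seq++;       /* another scan with no progress */

    if (m->oom_seq < m->threshold)
        return false;

    m->oom_seq = 0;         /* start a new streak after firing */
    return true;
}
```

If this model is close to the real code, it would explain what I see: on
9.3 a full swap plus a positive paging target is enough on any pass after
the first, while on 11.0 a scan that keeps reclaiming a few pages could
keep resetting the counter so that it never reaches 12.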

To find out which commit changed this behavior, I searched the FreeBSD SVN
pages and found a clue:

https://svnweb.freebsd.org/base?view=revision&revision=290920

In SVN commit r290920, a new sysctl node called vm.pageout_oom_seq was added
to control the sensitivity of the OOM-killer. The default value of
pageout_oom_seq is 12. The commit log says:

    The number of passes to trigger OOM was selected empirically and
    tested both on small (32M-64M i386 VM) and large (32G amd64)
    configurations.

However, in my case, even with vm.pageout_oom_seq at its default value of
12, it didn't work as expected.
I suspect it's a bug, but I'm not sure, since I can't fully understand this
code.

I just want the OOM-killer to behave on FreeBSD 11.0 the way it does on
FreeBSD 9.3. Does anyone know how to solve this?

Thanks!