From: Olivier Certner
To: Alan Somers, Konstantin Belousov
Cc: FreeBSD Hackers
Subject: Re: The out-of-swap killer makes poor choices
Date: Wed, 24 Feb 2021 13:02:14 +0100
Message-ID: <1984125.0OzZcVfBr4@ravel>

Hi,

> > Ok, I'll abandon this idea.

I hope you don't abandon the idea of improving the OOM killer in the long
term if you feel that something is wrong.

> I explained the reasoning for the current design, even if it actually
> evolved this way, instead being written as a whole with the stated goal.
> I do not object against adding something that would help to get it more
> fit with different goals as well, but the current idea of making the
> system survive should be kept.

So true. The main goal (system survival) does not preclude (not so)
secondary ones.

I'm sorry not to have any technical contribution to propose, but I do have
some testimony that may interest you, although it is old.

2 to 3 years ago, I ran into production problems on servers doing heavy
computations. Only a few processes (generally 2) were doing them, and most
of the time they consumed less than 1/4 of the available RAM (2 GiB).
Apart from that, no other process was allocating any significant amount of
memory. Only some base default daemons (syslogd, cron) and sshd were
running.

Occasionally, very big jobs would come in, and one or more of these
processes would start eating up all available memory, until FreeBSD
decided that it was time to take action. Sometimes it would kill one of
these processes, but more often than not sshd or cron was killed instead,
although they were consuming negligible amounts of memory.

I tried tweaks via vm.pageout_oom_seq (I think I set it to 120, as Mark
did) and vm.pfault_oom_attempts, without much change. In the end, I
decided to use 'protect', via rc.conf's '*_oomprotect="YES"' facility, to
work around this problem and save myself some headaches.

At some point, some of these machines had swap configured (separate AWS
disks), but later I removed swap entirely.
What I report occurred on the configuration without swap, but IIRC I
observed similar behavior when swap was configured. I have not had this
use case since then, so I can't say whether this has been fixed (by
commits such as r353734/d307bdcc2c473858) or not.

--
Olivier Certner