From owner-freebsd-hackers@freebsd.org  Wed Feb 24 17:34:35 2021
MIME-Version: 1.0
References: <CAOtMX2jYmrK7ftx62_NEfNCWS7O=giHKL1p9kXCqq1t5E1arxA@mail.gmail.com>
 <CAOtMX2i3Njo=KBP=99_G0+KuSa00CVgNvacmzhTaoZUYEhwPPA@mail.gmail.com>
 <YDYyQ1V/hEAGV+yJ@kib.kiev.ua> <1984125.0OzZcVfBr4@ravel>
In-Reply-To: <1984125.0OzZcVfBr4@ravel>
From: Alan Somers <asomers@freebsd.org>
Date: Wed, 24 Feb 2021 10:34:23 -0700
Message-ID: <CAOtMX2iYr4NDYE0xHSa_w1hA5XQ2m9cA28NzPoGbfzAKKox9aQ@mail.gmail.com>
Subject: Re: The out-of-swap killer makes poor choices
To: Olivier Certner <olivier.freebsd@free.fr>
Cc: Konstantin Belousov <kostikbel@gmail.com>,
 FreeBSD Hackers <freebsd-hackers@freebsd.org>
Content-Type: text/plain; charset="UTF-8"
X-Content-Filtered-By: Mailman/MimeDel 2.1.34
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id: Technical discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 24 Feb 2021 17:34:35 -0000

On Wed, Feb 24, 2021 at 5:02 AM Olivier Certner <olivier.freebsd@free.fr> wrote:

> Hi,
>
> > > Ok, I'll abandon this idea.
>
> I hope you don't abandon the idea of improving the OOM killer in the long
> term if you feel that something is wrong.
>
> > I explained the reasoning for the current design, even if it actually
> > evolved this way instead of being written as a whole with the stated goal.
> > I do not object to adding something that would help it better fit
> > different goals as well, but the current idea of making the
> > system survive should be kept.
>
> So true. The main goal (system survival) does not rule out (not so)
> secondary ones.
>
> I'm sorry I have no technical contribution to propose, but I do have some
> testimony that may interest you, although it is old.
>
> Two to three years ago, I ran into production problems on servers doing
> heavy computations. Only a few processes (generally 2) were doing them,
> and most of the time they consumed less than 1/4 of the available RAM
> (2 GiB). Apart from that, no other process was allocating any significant
> amount of memory. Only some base default daemons (syslogd, cron) and sshd
> were running. Occasionally, very big jobs would come in, and one or more
> of these processes would start eating up all available memory, until
> FreeBSD decided that it was time to take action.
>
> Sometimes it would decide to kill one of these processes, but more often
> than not sshd or cron were killed instead, even though they were consuming
> ridiculously small amounts of memory. I tried tweaks via
> vm.pageout_oom_seq (I think I set it to 120, as Mark did) and
> vm.pfault_oom_attempts, without much change.
>
> In the end, I decided to use 'protect', via rc.conf's '*_oomprotect="YES"'
> facility, to work around this problem and save myself some headaches.
>
> At some point, some of these machines had swap configured (separate AWS
> disks), but later I removed swap entirely. What I report occurred for the
> latter configuration, but IIRC I observed similar behavior in the former.
>
> I have not had this use case since then, so I can't say if this has been
> fixed (by commits such as r353734/d307bdcc2c473858) or not.
>
> --
> Olivier Certner
>
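The workaround and tunable mentioned in the quoted message might look like the configuration fragment below. It is a sketch that assumes the stock 'sshd' and 'cron' rc.d scripts; the value 120 is simply the one quoted in the thread, not a tested recommendation, and vm.pfault_oom_attempts is deliberately left at its default here.

```
# /etc/rc.conf: ask rc.subr to run protect(1) on these daemons after start.
# "YES" protects only the daemon itself; "ALL" would also cover its children.
sshd_oomprotect="YES"
cron_oomprotect="YES"

# /etc/sysctl.conf: let the pagedaemon make more passes before declaring OOM
# (120 is the value mentioned in the thread, used here only as an example).
vm.pageout_oom_seq=120
```

After editing rc.conf, 'service sshd restart' and 'service cron restart' let rc.subr apply the protection to the restarted daemons, and 'sysctl vm.pageout_oom_seq=120' changes the tunable on a running system without a reboot.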

There's another silly problem that I didn't mention in my original post.
The old rule of thumb is that the swap partition's size should be twice as
large as the amount of RAM.  However, that's no longer possible in many
cases.  The kernel imposes a hard limit of 64 GiB (on amd64 at least) on
the usable size of any swap partition, and many servers now have far more
than 64 GiB of RAM.  So the advice needs to change with the times.  I don't
know what the best size would be for a modern server, but I would guess
that it must be at least several times the RSS of your largest process, and
also at least one tenth of RAM (so that it can double as a dump device for
compressed kernel crash dumps).
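A rough way to turn the guess above into numbers is sketched below in Python. It is only an illustration: "several times" is modeled as a multiplier of 3 (an assumption, not a figure from this message), the 64 GiB per-partition cap is the amd64 figure stated above, and the helper names are hypothetical.

```python
# Hypothetical helpers illustrating the swap-sizing guess above.
# Assumption: "several times" the largest RSS is modeled as 3x by default.

GIB = 1024 ** 3
PER_PARTITION_CAP = 64 * GIB  # usable swap per partition (amd64 figure from the text)


def suggested_swap(ram_bytes: int, largest_rss_bytes: int,
                   rss_multiplier: int = 3) -> int:
    """Return a lower bound on swap size, in bytes.

    The bound is the larger of:
      * rss_multiplier times the RSS of the largest process, and
      * one tenth of RAM, so a swap partition can double as the dump
        device for a compressed kernel crash dump.
    """
    by_rss = rss_multiplier * largest_rss_bytes
    by_dump = ram_bytes // 10
    return max(by_rss, by_dump)


def partitions_needed(swap_bytes: int) -> int:
    """Number of swap partitions required when each is capped at 64 GiB."""
    # Integer ceiling division; always at least one partition.
    return max(1, (swap_bytes + PER_PARTITION_CAP - 1) // PER_PARTITION_CAP)


if __name__ == "__main__":
    ram, rss = 256 * GIB, 20 * GIB
    swap = suggested_swap(ram, rss)
    print(f"suggested swap: {swap / GIB:.1f} GiB "
          f"in {partitions_needed(swap)} partition(s)")
    old_rule = 2 * ram  # the old "twice RAM" rule of thumb
    print(f"old 2x-RAM rule: {old_rule // GIB} GiB "
          f"in {partitions_needed(old_rule)} partition(s)")
```

For a 256 GiB server whose largest process has a 20 GiB RSS, the sketch suggests 60 GiB of swap in a single partition, while the old twice-RAM rule would call for 512 GiB, i.e. eight partitions at the 64 GiB cap.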
-Alan