From: Alan Somers
Date: Tue, 23 Feb 2021 13:49:49 -0700
Subject: The out-of-swap killer makes poor choices
To: FreeBSD Hackers

To me it's always seemed like the out-of-swap killer kills the wrong process. Oh, it does the right thing with a trivial while(1) {malloc()} test program, but not with real workloads.

To summarize the logic in vm_pageout_oom:

* Don't kill system, protected, or killed processes
* Don't kill processes with a thread that isn't running or suspended
* Kill whichever process is using the most swap or swap + RAM, depending on the shortage variable. On ties, kill the newest one.

This algorithm probably made sense in the days when computers had much more swap than RAM. But now it leads to several problems:

* It's almost guaranteed to do the wrong thing when shortage == VM_OOM_SWAPZ and there is little or no swap configured. If no swap is configured, it will kill the newest running or suspended process. If a little bit is configured, it will probably kill some idle process, like zfsd, that is swapped out because it doesn't run very often.
* Even if multiple GB of swap are configured, the OOM killer is still biased towards killing idle processes when shortage == VM_OOM_SWAPZ. Most often, the process responsible for an out-of-memory condition is not idle, and is consuming large amounts of RAM.
* It ignores RLIMIT_RSS. We consider that rlimit when deciding whether to move a process from RAM to swap.
* The "out of swap space" kernel message doesn't specify whether the process was killed because of insufficient swap or RAM (the shortage variable).

I propose the following changes:

* Incorporate shortage into the "out of swap space" message.
* When walking the process list, if any process exceeds its RLIMIT_RSS, choose it immediately, without bothering to compare it to older processes.
* Always consider the sum of a process's RAM + swap, regardless of the shortage variable.

Does this make sense? Am I missing something about shortage == VM_OOM_SWAPZ? I don't understand why you would ever want to exclude processes' RAM usage. That logic was added in revision 2025d69ba7a68a5af173007a8072c45ad797ea23, but I don't understand the rationale.

-Alan
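
P.S. To make the proposal concrete, here is a rough userland sketch of the victim selection I have in mind. It is not kernel code: struct oom_cand, oom_pick(), and every field name are hypothetical stand-ins for the state vm_pageout_oom inspects, and it assumes the candidates arrive newest first so that ties still go to the newest process.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified per-process state; not a kernel structure. */
struct oom_cand {
	int		pid;
	bool		exempt;		/* system, protected, or already killed */
	bool		eligible;	/* every thread is running or suspended */
	uint64_t	rss_pages;	/* resident pages */
	uint64_t	swap_pages;	/* swapped-out pages */
	uint64_t	rss_limit_pages;	/* RLIMIT_RSS in pages, UINT64_MAX if unlimited */
};

/*
 * Return the index of the proposed victim, or -1 if nothing qualifies.
 * Assumption: c[] is ordered newest first, so the strict comparison
 * below leaves ties with the newest process.
 */
int
oom_pick(const struct oom_cand *c, size_t n)
{
	int best = -1;
	uint64_t best_size = 0;

	for (size_t i = 0; i < n; i++) {
		if (c[i].exempt || !c[i].eligible)
			continue;

		/* Proposed: a process over its RLIMIT_RSS is chosen immediately. */
		if (c[i].rss_pages > c[i].rss_limit_pages)
			return ((int)i);

		/* Proposed: always weigh RAM + swap, whatever the shortage is. */
		uint64_t size = c[i].rss_pages + c[i].swap_pages;
		if (best == -1 || size > best_size) {
			best = (int)i;
			best_size = size;
		}
	}
	return (best);
}
```

With this ordering an idle, mostly swapped-out daemon only loses to a RAM hog when the hog's RAM + swap total is actually larger, and a process that has blown through its RLIMIT_RSS goes first regardless of size.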