Date: Sun, 1 Apr 2018 12:27:03 -0400
From: Mark Johnston
To: Tijl Coosemans
Cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r331732 - head/sys/vm
Message-ID: <20180401162703.GD1440@raichu>
In-Reply-To: <20180401172021.27852803@kalimero.tijl.coosemans.org>

On Sun, Apr 01, 2018 at 05:20:21PM +0200, Tijl Coosemans wrote:
> On Sat, 31 Mar 2018 18:54:32 -0400 Mark Johnston wrote:
> > On Sat, Mar 31, 2018 at 08:21:18PM +0200, Tijl Coosemans wrote:
> > > On Thu, 29 Mar 2018 14:27:40 +0000 (UTC) Mark Johnston wrote:
> > > > Author: markj
> > > > Date: Thu Mar 29 14:27:40 2018
> > > > New Revision: 331732
> > > > URL: https://svnweb.freebsd.org/changeset/base/331732
> > > >
> > > > Log:
> > > >   Fix the background laundering mechanism after r329882.
> > > >
> > > >   Rather than using the number of inactive queue scans as a metric for
> > > >   how many clean pages are being freed by the page daemon, have the
> > > >   page daemon keep a running counter of the number of pages it has freed,
> > > >   and have the laundry thread use that when computing the background
> > > >   laundering threshold.
> > > > [...]
> > >
> > > I'm seeing big processes being killed with an "out of swap space" message
> > > even though there's still plenty of swap available. It seems to be fixed
> > > by making this division round upwards:
> > >
> > >     if (target == 0 && ndirty * isqrt((nfreed +
> > >         (vmd->vmd_free_target - vmd->vmd_free_min) - 1) /
> > >         (vmd->vmd_free_target - vmd->vmd_free_min)) >= nclean) {
> > >
> > > I don't know where this formula comes from, so I don't know if this
> > > change is correct.
> >
> > Hm, that's somewhat surprising. This code shouldn't be executing in
> > situations where the OOM kill logic is invoked (i.e., memory pressure
> > plus a shortage of clean pages in the inactive queue).
> >
> > How much RAM does the system have? Could you collect "sysctl vm" output
> > around the time of an OOM kill?
>
> 1GiB RAM. I've sampled sysctl vm every 5s from the moment the process
> starts swapping until it is killed and uploaded that to
> https://people.freebsd.org/~tijl/sysctl/

Thank you. Now I agree with your change. Would you like to commit it?
I can take care of it if you prefer.

There is still a deeper problem here after r329882 in that the shortfall
laundering mechanism is not kicking in before the OOM killer. The problem
is that the PID controller may produce a positive output even when the
error is negative; this is seen in 22.txt, the last capture before the
OOM kill.
Because the error is negative (i.e., v_free_count > v_free_target), we
will not attempt to launder pages in shortfall mode. The positive output
means that the page daemon is repeatedly scanning the (completely
depleted) inactive queue, but since the laundering mechanism is failing
to produce clean pages, there is nothing to reclaim and so we eventually
invoke the OOM killer. I'm not yet sure how best to address this, but
your change is probably sufficient to mitigate the problem in general
and also corrects an unintentional behaviour change in my commit.