Subject: Re: OOM problem?
From: Mark Millard
To: Larry McVoy
Cc: freebsd-hackers@freebsd.org
Date: Thu, 7 Dec 2017 20:12:36 -0800
In-Reply-To: <20171208011430.GA16016@mcvoy.com>
Message-Id: <80D1ECE3-D983-4DFB-9B28-3F716F73CD47@dsl-only.net>

[Just a pointer to a potential example report on the lists.]

On 2017-Dec-7, at 5:14 PM, Larry McVoy wrote:

> . . .
> It's sort of an ugly problem in that
> when it happens your only recourse is to power cycle the machine, you
> can't kill off the processes causing the problem.

If there is a serial console, can something like, say,
CR TILDE CTRL-B get to the db> prompt? (options
ALT_BREAK_TO_DEBUGGER example.)

> . . .
>
> Here is the problem. All of these "misbehaved" (by using lots of ram)
> processes go to sleep, I believe in vm_wait(). They are all waiting
> for more ram so the pageout daemon is kicked but to no avail, all the
> ram is tied up in the processes that want more ram. The pageout daemon
> kicks out what it can but it quickly gets to the point that it scans
> everything and finds nothing (I know this because I added debugging to
> show that's what it is doing).
>
> The OOM code kicks in and it behaves poorly. It doesn't kill any of
> the big processes, those are all sleeping without PCATCH on so they are
> skipped. The OOM code starts killing off anything it can find, it was
> killing getty, ssh, bash, dhclient. One buglet is that, in my opinion,
> it finds stuff to kill that it probably shouldn't. Anything that init
> will respawn is fine, anything that would not be respawned should be
> run as not killable. Seems like an audit of those processes might be
> in order.
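
[Illustration only: a toy model of the victim selection described in the
quoted text, not the kernel's actual OOM code. It assumes the killer
prefers the largest eligible process and skips anything that is sleeping
without PCATCH. The Proc class, the catches_signals flag, the process
names, and the sizes are all made up for the sketch.]

```python
from dataclasses import dataclass


@dataclass
class Proc:
    name: str
    rss_mb: int            # resident size, invented numbers
    catches_signals: bool  # hypothetical flag; False models "sleeping without PCATCH"


def pick_victim(procs):
    """Return the largest process that is not skipped, or None if none qualify."""
    eligible = [p for p in procs if p.catches_signals]
    return max(eligible, key=lambda p: p.rss_mb, default=None)


# Two big memory users stuck waiting for ram, plus small system processes.
remaining = [
    Proc("hog1", 120_000, catches_signals=False),
    Proc("hog2", 110_000, catches_signals=False),
    Proc("sshd", 8, catches_signals=True),
    Proc("bash", 4, catches_signals=True),
    Proc("dhclient", 3, catches_signals=True),
    Proc("getty", 2, catches_signals=True),
]

victims = []
while (victim := pick_victim(remaining)) is not None:
    victims.append(victim.name)
    remaining.remove(victim)

print("killed:", victims)
print("still holding ram:", [p.name for p in remaining])
```

In this model every small process gets killed while the two processes
holding nearly all of the ram are never considered, which is the shape of
the failure described above: the kills free almost nothing.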

https://lists.freebsd.org/pipermail/freebsd-hackers/2017-December/051890.html

may be an example of the problem on an rpi2 but with a swap
partition in use. I was able to get to the db> prompt and
included some basic information from there. It was
head -r326192 based. (I did eventually reboot the rpi2 so I
no longer have that specific context available to examine.)

> I know that you'll ask why no swap? Just add swap and the problem
> goes away. Does it? I don't think so, that's just kicking the can
> down the road. If we add 256GB of swap now we have a 512GB bag to fill,
> fill that and I think we're right back to where we started.
>
> . . .

===
Mark Millard
markmi at dsl-only.net