Date: Tue, 28 Jan 2020 12:11:53 -0800
From: bob prohaska
To: Mark Millard
Cc: Konstantin Belousov, freebsd-arm, FreeBSD Current, bob prohaska
Subject: Re: OOMA kill with vm.pfault_oom_attempts="-1" on RPi3 at r357147 (a vm_pfault_oom_attempts < 0 handling bug as of head -r357026)
Message-ID: <20200128201152.GA15110@www.zefox.net>
In-Reply-To: <94E68249-7751-4B27-AE95-E9C2776D730B@yahoo.com>
List-Id: Discussions about the use of FreeBSD-current

On Tue, Jan 28, 2020 at 11:28:14AM -0800, Mark Millard wrote:
>
>
> On 2020-Jan-28, at 11:02, bob prohaska wrote:
>
> > On Tue, Jan 28, 2020 at 09:42:17AM -0800, Mark Millard wrote:
> >>
> >>
> >>
> > The (partly)modified kernel compiled and booted without
> > obvious trouble. It's trying to finish buildworld now.
> >

Stopped already, with

Jan 28 11:41:59 www kernel: pid 29909 (cc), jid 0, uid 0, was killed: fault's page allocation failed

> >> If you are testing with vm.pfault_oom_attempts="-1" then
> >> the vm_fault printf message should never happen anyway.
> >>
> > Would it not be interesting if the message appeared in that
> > case?
>
> Thanks for the question: looking at the new code found a bug
> causing oom where it used to be avoided in head -r357025 and
> before.

Glad to be of service, even if inadvertently 8-)

> After vm_waitpfault(dset, vm_pfault_oom_wait * hz)
> the -r357026 code does a vm_pageout_oom(VM_OOM_MEM_PF) no
> matter what, even when vm_pfault_oom_attempts < 0 ||
> fs->oom < vm_pfault_oom_attempts :
>
> New code in head -r357026
> ( nothing to avoid the vm_pageout_oom(VM_OOM_MEM_PF)
>   for vm_pfault_oom_attempts < 0 ||
>   fs->oom < vm_pfault_oom_attempts ):
>
>         if (fs->m == NULL) {
>                 unlock_and_deallocate(fs);
>                 if (vm_pfault_oom_attempts < 0 ||
>                     fs->oom < vm_pfault_oom_attempts) {
>                         fs->oom++;
>                         vm_waitpfault(dset, vm_pfault_oom_wait * hz);
>                 }
>                 if (bootverbose)
>                         printf(
>         "proc %d (%s) failed to alloc page on fault, starting OOM\n",
>                             curproc->p_pid, curproc->p_comm);
>                 vm_pageout_oom(VM_OOM_MEM_PF);
>                 return (KERN_RESOURCE_SHORTAGE);
>         }
>
> Old code in head -r357025
> ( has the goto RetryFault_oom after vm_waitpfault(. . .),
>   thereby avoiding the vm_pageout_oom(VM_OOM_MEM_PF) for
>   vm_pfault_oom_attempts < 0 || fs->oom < vm_pfault_oom_attempts ) :
>
>         if (fs.m == NULL) {
>                 unlock_and_deallocate(&fs);
>                 if (vm_pfault_oom_attempts < 0 ||
>                     oom < vm_pfault_oom_attempts) {
>                         oom++;
>                         vm_waitpfault(dset,
>                             vm_pfault_oom_wait * hz);
>                         goto RetryFault_oom;
>                 }
>                 if (bootverbose)
>                         printf(
>         "proc %d (%s) failed to alloc page on fault, starting OOM\n",
>                             curproc->p_pid, curproc->p_comm);
>                 vm_pageout_oom(VM_OOM_MEM_PF);
>                 goto RetryFault;
>         }
>
> I expect this is the source of the behavioral
> difference folks have been seeing for OOM kills.
>
>
> As for "gather evidence" messages . . .
>
> >> You may be able to just look and manually delete or
> >> comment out the bootverbose line in the more modern
> >> source that currently looks like:
> >>
> >>         if (bootverbose)
> >>                 printf(
> >>         "proc %d (%s) failed to alloc page on fault, starting OOM\n",
> >>                     curproc->p_pid, curproc->p_comm);
> >>         vm_pageout_oom(VM_OOM_MEM_PF);
> >>         return (KERN_RESOURCE_SHORTAGE);
> >>
> >
> > I can find those lines in /usr/src/sys/vm/vm_fault.c, but
> > unclear on the motivation to comment the lines out. Perhaps
> > to eliminate the return(...) ? Anyway, is it sufficient
> > to insert /* before and */ after?
>
> The only line to delete or comment out in that
> code block is:
>
>         if (bootverbose)
>
> Disabling that line makes the following printf
> always happen, even when a verbose boot was not
> done.

Oops, it's commented out now and the kernel is rebuilding.

>
> Based on the above reported code change, having
> a message before vm_pageout_oom(VM_OOM_MEM_PF) is
> important to getting a report of the kill being
> via that code.
>

Thank you!

bob prohaska
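
The difference between the two quoted code paths can be sketched in user
space. This is only an illustration under stated assumptions, not kernel
code: struct sim_result, sim_old() and sim_new() are hypothetical names,
"failures" is an assumed count of consecutive failed page allocations, and
the simulation stops at the first point where vm_pageout_oom(VM_OOM_MEM_PF)
would have been called.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of one simulated fault; not a kernel structure. */
struct sim_result {
	int waits;	/* times vm_waitpfault() would have been called */
	bool oom_kill;	/* whether vm_pageout_oom(VM_OOM_MEM_PF) would run */
};

/*
 * Flow quoted from head -r357025: after waiting, jump back to
 * RetryFault_oom, so the OOM call is skipped while retries remain.
 */
static struct sim_result
sim_old(int oom_attempts, int failures)
{
	struct sim_result r = { 0, false };
	int oom = 0;

	assert(failures >= 0);
	for (int i = 0; i < failures; i++) {
		if (oom_attempts < 0 || oom < oom_attempts) {
			oom++;
			r.waits++;
			continue;	/* models "goto RetryFault_oom" */
		}
		r.oom_kill = true;	/* retries exhausted */
		break;
	}
	return (r);
}

/*
 * Flow quoted from head -r357026: after waiting, execution falls
 * through to the OOM call on the very first failed allocation.
 */
static struct sim_result
sim_new(int oom_attempts, int failures)
{
	struct sim_result r = { 0, false };
	int oom = 0;

	assert(failures >= 0);
	if (failures == 0)
		return (r);	/* allocation succeeded, nothing to do */
	if (oom_attempts < 0 || oom < oom_attempts) {
		oom++;
		r.waits++;
	}
	r.oom_kill = true;	/* no retry jump, so OOM always follows */
	return (r);
}
```

With oom_attempts = -1 and five failed allocations, sim_old() reports five
waits and no OOM call, while sim_new() reports a single wait followed by an
OOM call. That would be consistent with the kill reported above despite
vm.pfault_oom_attempts="-1".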