Date: Tue, 28 Jan 2020 11:28:14 -0800
From: Mark Millard <marklmi@yahoo.com>
To: bob prohaska <fbsd@www.zefox.net>, Konstantin Belousov <kib@freebsd.org>
Cc: freebsd-arm <freebsd-arm@freebsd.org>, FreeBSD Current <freebsd-current@freebsd.org>
Subject: Re: OOMA kill with vm.pfault_oom_attempts="-1" on RPi3 at r357147 (a vm_pfault_oom_attempts < 0 handling bug as of head -r357026)
Message-ID: <94E68249-7751-4B27-AE95-E9C2776D730B@yahoo.com>
In-Reply-To: <20200128190210.GA14784@www.zefox.net>
References: <20200127190709.GA11328@www.zefox.net>
 <D20DCE73-29D6-4184-80BF-7698EC907B60@yahoo.com>
 <20200128035317.GA12644@www.zefox.net>
 <18150258-6210-451E-A5B9-528129A05974@yahoo.com>
 <9BF68EF1-F83A-473B-9A7B-B3956D6A5EFD@yahoo.com>
 <20200128170518.GA14654@www.zefox.net>
 <5A3CE2DA-C5B8-4CC1-BEEA-8B9649A20B8B@yahoo.com>
 <20200128190210.GA14784@www.zefox.net>

On 2020-Jan-28, at 11:02, bob prohaska <fbsd at www.zefox.net> wrote:

> On Tue, Jan 28, 2020 at 09:42:17AM -0800, Mark Millard wrote:
>>
> The (partly)modified kernel compiled and booted without
> obvious trouble. It's trying to finish buildworld now.
>
>> If you are testing with vm.pfault_oom_attempts="-1" then
>> the vm_fault printf message should never happen anyway.
>>
> Would it not be interesting if the message appeared in that
> case?

Thanks for the question: looking at the new code found
a bug causing oom where it used to be avoided in head -r357025
and before.

After vm_waitpfault(dset, vm_pfault_oom_wait * hz)
the -r357026 code does a vm_pageout_oom(VM_OOM_MEM_PF)
no matter what, even when
vm_pfault_oom_attempts < 0 || fs->oom < vm_pfault_oom_attempts :

New code in head -r357026
( nothing to avoid the vm_pageout_oom(VM_OOM_MEM_PF)
for vm_pfault_oom_attempts < 0 || fs->oom < vm_pfault_oom_attempts ):

	if (fs->m == NULL) {
		unlock_and_deallocate(fs);
		if (vm_pfault_oom_attempts < 0 ||
		    fs->oom < vm_pfault_oom_attempts) {
			fs->oom++;
			vm_waitpfault(dset, vm_pfault_oom_wait * hz);
		}
		if (bootverbose)
			printf(
"proc %d (%s) failed to alloc page on fault, starting OOM\n",
			    curproc->p_pid, curproc->p_comm);
		vm_pageout_oom(VM_OOM_MEM_PF);
		return (KERN_RESOURCE_SHORTAGE);
	}

Old code in head -r357025
( has the goto RetryFault_oom after vm_waitpfault(. . .),
thereby avoiding the vm_pageout_oom(VM_OOM_MEM_PF) for
vm_pfault_oom_attempts < 0 || fs->oom < vm_pfault_oom_attempts ) :

	if (fs.m == NULL) {
		unlock_and_deallocate(&fs);
		if (vm_pfault_oom_attempts < 0 ||
		    oom < vm_pfault_oom_attempts) {
			oom++;
			vm_waitpfault(dset,
			    vm_pfault_oom_wait * hz);
			goto RetryFault_oom;
		}
		if (bootverbose)
			printf(
"proc %d (%s) failed to alloc page on fault, starting OOM\n",
			    curproc->p_pid, curproc->p_comm);
		vm_pageout_oom(VM_OOM_MEM_PF);
		goto RetryFault;
	}

I expect this is the source of the behavioral
difference folks have been seeing for OOM kills.

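The difference between the two blocks can be seen in isolation
with a small user-space sketch. This is not kernel code: the
names failed_alloc_old, failed_alloc_new and enum fault_outcome
are hypothetical, the sleep and the OOM call are reduced to
comments, and only the control flow of the quoted blocks is
modeled (one pass through the "allocation failed" case).

```c
/*
 * User-space model of the "page allocation failed" block.
 * All names here are illustrative stand-ins, not vm_fault.c symbols.
 */
enum fault_outcome {
	RETRY_WITHOUT_OOM,	/* caller retries the fault, no OOM kill */
	OOM_INVOKED		/* vm_pageout_oom(VM_OOM_MEM_PF) would run */
};

/* Shape of the -r357025 block: the wait is followed by a retry. */
enum fault_outcome
failed_alloc_old(int oom_attempts, int *oom)
{
	if (oom_attempts < 0 || *oom < oom_attempts) {
		(*oom)++;
		/* vm_waitpfault(...) would sleep here */
		return (RETRY_WITHOUT_OOM);	/* goto RetryFault_oom */
	}
	/* vm_pageout_oom(VM_OOM_MEM_PF) would run here */
	return (OOM_INVOKED);			/* goto RetryFault */
}

/* Shape of the -r357026 block as quoted: nothing leaves after the wait. */
enum fault_outcome
failed_alloc_new(int oom_attempts, int *oom)
{
	if (oom_attempts < 0 || *oom < oom_attempts) {
		(*oom)++;
		/* vm_waitpfault(...) would sleep here */
	}
	/* falls through: vm_pageout_oom(VM_OOM_MEM_PF) runs regardless */
	return (OOM_INVOKED);
}
```

With oom_attempts set to -1 the old shape always reports
RETRY_WITHOUT_OOM, while the new shape reports OOM_INVOKED on the
first failed allocation. Once the attempt count is used up
(for example oom_attempts of 3 with *oom already at 3) both
shapes report OOM_INVOKED, so the two only differ in the
wait-and-retry case.
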
As for "gather evidence" messages . . .

>> You may be able to just look and manually delete or
>> comment out the bootverbose line in the more modern
>> source that currently looks like:
>>
>>		if (bootverbose)
>>			printf(
>> "proc %d (%s) failed to alloc page on fault, starting OOM\n",
>>			    curproc->p_pid, curproc->p_comm);
>>		vm_pageout_oom(VM_OOM_MEM_PF);
>>		return (KERN_RESOURCE_SHORTAGE);
>>
>
> I can find those lines in /usr/src/sys/vm/vm_fault.c, but
> unclear on the motivation to comment the lines out. Perhaps
> to eliminate the return(...) ? Anyway, is it sufficient
> to insert /* before and */ after?

The only line to delete or comment out in that code
block is:

	if (bootverbose)

Disabling that line makes the following printf always
happen, even when a verbose boot was not done.

Based on the above reported code change, having a
message before vm_pageout_oom(VM_OOM_MEM_PF) is
important to getting a report of the kill being via
that code.

>> and is now in vm_fault_allocate(. . .). (That file has
>> had a reorganization since where I'm synchronized.)
>>
>> Having the message indicate vm_fault_allocate is
>> optional but would look like:
>>
>> "vm_fault_allocate: proc %d (%s) failed to alloc page on fault, starting OOM\n",
>>
>> Doing the delete/comment-out would avoid waiting for me.
>>
> I'll do it after the next stoppage.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)