From: Mark Millard
To: bob prohaska
Cc: freebsd-arm, FreeBSD Current
Date: Tue, 28 Jan 2020 13:31:45 -0800
Subject: Re: OOMA kill with vm.pfault_oom_attempts="-1" on RPi3 at r357147 (a vm_pfault_oom_attempts < 0 handling bug as of head -r357026)
Message-Id: <56B3107B-E515-428D-A837-8AF076BADE9B@yahoo.com>
In-Reply-To: <20200128201152.GA15110@www.zefox.net>

[I recommend not sending kib our other exchanges: that he
has been notified is enough. I also sent the material to
2 folks that I forgot at the time and replied to one more
related message with the information. For most of these
folks our general exchange is likely viewed as noise
after the basic, important point for them. Note: kib was
the reviewer, not the creator or submitter of -r357026 .]

On 2020-Jan-28, at 12:11, bob prohaska wrote:

> On Tue, Jan 28, 2020 at 11:28:14AM -0800, Mark Millard wrote:
>>
>>
>> On 2020-Jan-28, at 11:02, bob prohaska wrote:
>>
>>> On Tue, Jan 28, 2020 at 09:42:17AM -0800, Mark Millard wrote:
>>>>
>>>>
>>>>
>>> The (partly)modified kernel compiled and booted without
>>> obvious trouble. It's trying to finish buildworld now.
>>>
> Stopped already, with
> Jan 28 11:41:59 www kernel: pid 29909 (cc), jid 0, uid 0, was killed: fault's page allocation failed
>

Yea, what I did in vm_pageout_oom for its messages does
by itself indicate when the vm_pageout_oom(VM_OOM_MEM_PF)
happens. So the printf before that call is not essential.

With the bug that we have identified, this is the
expected notification until things are fixed.

>
>>>> If you are testing with vm.pfault_oom_attempts="-1" then
>>>> the vm_fault printf message should never happen anyway.
>>>>
>>> Would it not be interesting if the message appeared in that
>>> case?
>>
>> Thanks for the question: looking at the new code found a bug
>> causing oom where it used to be avoided in head -r357025 and
>> before.
>
>
> Glad to be of service, even if inadvertently 8-)
>
>
>> After vm_waitpfault(dset, vm_pfault_oom_wait * hz)
>> the -r357026 code does a vm_pageout_oom(VM_OOM_MEM_PF) no
>> matter what, even when vm_pfault_oom_attempts < 0 ||
>> fs->oom < vm_pfault_oom_attempts :
>>
>> New code in head -r357026
>> ( nothing to avoid the vm_pageout_oom(VM_OOM_MEM_PF)
>> for vm_pfault_oom_attempts < 0 ||
>> fs->oom < vm_pfault_oom_attempts ):
>>
>>         if (fs->m == NULL) {
>>                 unlock_and_deallocate(fs);
>>                 if (vm_pfault_oom_attempts < 0 ||
>>                     fs->oom < vm_pfault_oom_attempts) {
>>                         fs->oom++;
>>                         vm_waitpfault(dset, vm_pfault_oom_wait * hz);
>>                 }
>>                 if (bootverbose)
>>                         printf(
>>         "proc %d (%s) failed to alloc page on fault, starting OOM\n",
>>                             curproc->p_pid, curproc->p_comm);
>>                 vm_pageout_oom(VM_OOM_MEM_PF);
>>                 return (KERN_RESOURCE_SHORTAGE);
>>         }
>>
>> Old code in head -r357025
>> ( has the goto RetryFault_oom after vm_waitpfault(. . .),
>> thereby avoiding the vm_pageout_oom(VM_OOM_MEM_PF) for
>> vm_pfault_oom_attempts < 0 || fs->oom < vm_pfault_oom_attempts ) :
>>
>>         if (fs.m == NULL) {
>>                 unlock_and_deallocate(&fs);
>>                 if (vm_pfault_oom_attempts < 0 ||
>>                     oom < vm_pfault_oom_attempts) {
>>                         oom++;
>>                         vm_waitpfault(dset,
>>                             vm_pfault_oom_wait * hz);
>>                         goto RetryFault_oom;
>>                 }
>>                 if (bootverbose)
>>                         printf(
>>         "proc %d (%s) failed to alloc page on fault, starting OOM\n",
>>                             curproc->p_pid, curproc->p_comm);
>>                 vm_pageout_oom(VM_OOM_MEM_PF);
>>                 goto RetryFault;
>>         }
>>
>> I expect this is the source of the behavioral
>> difference folks have been seeing for OOM kills.
>>
>>
>> As for "gather evidence" messages . . .
>>
>>>> You may be able to just look and manually delete or
>>>> comment out the bootverbose line in the more modern
>>>> source that currently looks like:
>>>>
>>>>         if (bootverbose)
>>>>                 printf(
>>>> "proc %d (%s) failed to alloc page on fault, starting OOM\n",
>>>>                     curproc->p_pid, curproc->p_comm);
>>>>         vm_pageout_oom(VM_OOM_MEM_PF);
>>>>         return (KERN_RESOURCE_SHORTAGE);
>>>>
>>>
>>> I can find those lines in /usr/src/sys/vm/vm_fault.c, but
>>> unclear on the motivation to comment the lines out. Perhaps
>>> to eliminate the return(...) ? Anyway, is it sufficient
>>> to insert /* before and */ after?
>>
>> The only line to delete or comment out in that
>> code block is:
>>
>>         if (bootverbose)
>>
>> Disabling that line makes the following printf
>> always happen, even when a verbose boot was not
>> done.
> Oops, it's commented out now and the kernel is rebuilding.

Not a big deal, given the "was killed: fault's page
allocation failed" message that is separately
generated.

>>
>> Based on the above reported code change, having
>> a message before vm_pageout_oom(VM_OOM_MEM_PF) is
>> important to getting a report of the kill being
>> via that code.
>>

I did not think of what I'd done in vm_pageout_oom
when I wrote that.

My hope is that at least something like what I did in
vm_pageout_oom for message content will be adopted so
the notices are accurate to context and more traceable.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)