From owner-freebsd-arm@freebsd.org Tue Mar 24 20:22:08 2020
Subject: Re: Belated out of swap kill on rpi3 at r359216
From: Mark Millard <marklmi@yahoo.com>
In-Reply-To: <20200324185518.GA92311@www.zefox.net>
Date: Tue, 24 Mar 2020 13:21:42 -0700
Cc: freebsd-arm@freebsd.org
Message-Id: <75CE3C07-8A0A-4D32-84C6-24BEA967447E@yahoo.com>
References: <20200324155753.GA91922@www.zefox.net> <83E41A13-6C24-4B56-A837-779044038FBC@yahoo.com> <20200324185518.GA92311@www.zefox.net>
To: bob prohaska
On 2020-Mar-24, at 11:55, bob prohaska wrote:

> On Tue, Mar 24, 2020 at 10:09:28AM -0700, Mark Millard wrote:
>> 
>> 
>> On 2020-Mar-24, at 08:57, bob prohaska wrote:
>> 
>>> An attempt to buildworld on an rpi3 running r359216 stopped with an
>>> OOMA kill. Sources were at 359264, loader.conf contained
>>> vm.pageout_oom_seq="4096" .
>> 
>> What of the value of vm.pfault_oom_attempts ?
>> 
>> If vm.pfault_oom_attempts was not -1,
>> what of the value of vm.pfault_oom_wait ?
>> 
> bob@www:/usr/src % sysctl vm.pfault_oom_wait
> vm.pfault_oom_wait: 10
> I didn't change it, this must be the default.
> Would it be useful to add something like
> vm.pfault_oom_wait="20"
> to /boot/loader.conf?

I assume that this answer means vm.pfault_oom_attempts was not -1.
But you did not list what it was.

vm.pfault_oom_attempts=-1 is the means of keeping the
vm.pfault_oom_attempts/vm.pfault_oom_wait pair from causing OOM
at all. (vm.pfault_oom_wait becomes irrelevant.)

When vm.pfault_oom_attempts != -1, there are potential time-outs
that overall involve:

vm.pfault_oom_attempts * vm.pfault_oom_wait

but there are tradeoffs in how that product is partitioned between
the two factors:

# sysctl -d vm.pfault_oom_attempts
vm.pfault_oom_attempts: Number of page allocation attempts in page fault handler before it triggers OOM handling

# sysctl -d vm.pfault_oom_wait
vm.pfault_oom_wait: Number of seconds to wait for free pages before retrying the page fault handler

As I understand it, the following cases could each have very
different results, depending on what the latencies are like and
such:

vm.pfault_oom_attempts==20 && vm.pfault_oom_wait==1
vs.
vm.pfault_oom_attempts==1 && vm.pfault_oom_wait==20
vs.
vm.pfault_oom_attempts==4 && vm.pfault_oom_wait==5

As I remember, Konstantin Belousov once asked someone who was having
a repeatable problem to try some alternatives that explored this,
but got no reply to the request. The person might have just used
vm.pfault_oom_attempts=-1 and continued with their primary activity,
for all I know.

vm.pfault_oom_attempts=-1 is only recommended when one is sure that
the system will not run out of swap/paging space, if I understand
right.

For reference, for other settings:

# sysctl -d vm.pageout_oom_seq
vm.pageout_oom_seq: back-to-back calls to oom detector to start OOM

There is also:

# sysctl -d vm.oom_pf_secs
vm.oom_pf_secs: 

but it seems to have an empty description. Maybe it reports
vm.pfault_oom_attempts * vm.pfault_oom_wait when
vm.pfault_oom_attempts != -1. (I've not looked.)
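For a quick look at what overall window the current settings imply,
a minimal sketch along the following lines would do (the arithmetic
just mirrors the attempts * wait product described above; the little
script is only illustrative, not something from the base system):

#!/bin/sh
# Illustrative only: report the page-fault OOM tunables and, when
# vm.pfault_oom_attempts is not -1, the worst-case window they imply.
attempts=$(sysctl -n vm.pfault_oom_attempts)
wait_secs=$(sysctl -n vm.pfault_oom_wait)
echo "vm.pfault_oom_attempts=${attempts} vm.pfault_oom_wait=${wait_secs}"
if [ "${attempts}" != "-1" ]; then
    echo "approximate page-fault OOM window: $((attempts * wait_secs)) seconds"
fi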
>>> A snippet of gstat log suggests the worst congestion in the storage I/O
>>> happened at Tue Mar 24 04:52:26 PDT 2020 with an L(q) of 37, but the
>>> OOMA kill happened at Tue Mar 24 04:53:04 PDT 2020, by which time the
>>> L(q) had dropped to one, half a minute later.
>>> 
>>> Is the delay in OOMA action to be expected?
>>> 
>>> Here's the relevant part of the log, I hope the columns display readably:
>>> 
>>> 0/2/2/19177 mbuf clusters in use (current/cache/total/max)
>>> procs     memory        page                         disks     faults        cpu
>>> r b  w     avm    fre   flt  re  pi  po    fr    sr mm0 da0    in    sy   cs us sy id
>>> 0 0 29 1897320  47312 12851   9   4   5 13330  1669   0   0 14172  8020 3034 65 29  6
>>> dT: 1.056s  w: 1.000s
>>>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d  %busy Name
>>>    37    367    323   5463    6.4     45   1511   76.8      0      0    0.0   91.7 mmcsd0
>>>    37    367    323   5463    6.5     45   1511   76.9      0      0    0.0   93.3 mmcsd0s2
>>>    34    133    111   3209    7.6     22    697  134.7      0      0    0.0   74.0 mmcsd0s2a
>>>     3    235    212   2254    5.9     23    814   21.5      0      0    0.0   70.0 mmcsd0s2b
>>>    34    133    111   3209    7.6     22    697  134.7      0      0    0.0   74.2 ufs/rootfs
>>> Tue Mar 24 04:52:26 PDT 2020
>>> Device          1K-blocks     Used    Avail Capacity
>>> /dev/mmcsd0s2b    4404252   224484  4179768     5%
>>> Mar 24 04:20:50 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 21842, size: 12288
>>> Mar 24 04:20:50 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 25477, size: 4096
>> 
>> How bad were things back when the "indefinite wait buffer" notices
>> were generated (Mar 24 04:20:50 time frame)?
>> 
> It looks like _new_ indefinite wait messages started coming at around Mon Mar 23 23:00:28 PDT 2020:
> procs     memory        page                         disks     faults        cpu
> r b  w     avm    fre   flt  re  pi  po    fr    sr mm0 da0    in    sy   cs us sy id
> 2 0 18 1588824 197676 14108   2   1   0 14759   238   0   0 14055  9263 2668 62 32  5
> dT: 1.027s  w: 1.000s
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d  %busy Name
>     9      1      0      0    0.0      1     31  13451      0      0    0.0  326.9 mmcsd0
>     9      1      0      0    0.0      1     31  13451      0      0    0.0  326.9 mmcsd0s2
>     6      1      0      0    0.0      1     31  13451      0      0    0.0  326.9 mmcsd0s2a
>     6      1      0      0    0.0      1     31  13451      0      0    0.0  326.9 ufs/rootfs
> Mon Mar 23 23:00:28 PDT 2020
> Device          1K-blocks     Used    Avail Capacity
> /dev/mmcsd0s2b    4404252   245288  4158964     6%
> Mar 23 23:00:28 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 280324, size: 8192
> Mar 23 23:00:28 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 285903, size: 16384
> 
> Admittedly, 300% busy looks pretty bad. Still the machine didn't quit.....
> On the next sample %busy went down to 18% for swap, less for all else.

As I understand it, the ms/w value of 13451 means that, for a time,
writes took (as a mean?) somewhat under 14 seconds from being queued
to being completed. If a read could be blocked for such time frames
by such writes, it suggests that vm.pfault_oom_wait might need to be
larger than 13 if vm.pfault_oom_attempts=-1 is not in use. Or
vm.pfault_oom_attempts=-1 could be used to keep large time frames
from directly leading to OOM activity.
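For concreteness, the kind of /boot/loader.conf content involved
would look something like the below. (The figures are only
placeholders to show the form, not recommendations; they would need
to be picked based on the stalls actually observed.)

# Illustrative /boot/loader.conf fragment (placeholder values):
vm.pageout_oom_seq="4096"       # more back-to-back passes before OOM kills start
vm.pfault_oom_attempts="4"      # with vm.pfault_oom_wait, bounds the page-fault wait
vm.pfault_oom_wait="20"         # seconds per attempt; larger than the ~14 s stalls seen above
# Or, only if swap/paging space can not run out:
#vm.pfault_oom_attempts="-1"    # disables page-fault-driven OOM entirely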
>> 
>> Below I show code changes to be more explicit in the
>> output messaging about what contributes to initiating
>> OOM kills, without needing boot verbose or the like.
>> There are also some messages from Mark J.'s old code
>> for reporting on things related to initiating OOM
>> kills, or some minor variations of them.
>> 
> 
>> You may want to try such changes to provide more
>> context for your OOM kills when they happen. Below
>> the 4 reasons reported are (showing the most
>> detailed of the related messages from different
>> stages):
>> 
>> swp_pager_meta_build: swap blk uma zone exhausted
>> 
>> swp_pager_meta_build: swap pctrie uma zone exhausted
>> 
>> vm_fault_allocate: proc %d (%s) failed to alloc page on fault, starting OOM
>> 
>> vm_pageout_mightbe_oom: kill context: v_free_count: %u, v_inactive_count: %u
>> 
> 
> Could inquiries instead be added to the logging script?
> Right now it's
> 
> #!/bin/sh
> while true
> do vmstat ; gstat -abd -I 1s ; date ; swapinfo ; tail -n 2 /var/log/messages ; netstat -m | grep "mbuf clusters"
> done
> 
> I'd much rather tamper with the logging script than the kernel 8-)

Unfortunately, the reason I made the kernel messaging changes is that,
as far as I know, the normal kernel simply does not publish the
information anywhere when its internal criteria lead to OOM activity:
no inquiry is available without kernel changes.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)