From owner-freebsd-arm@freebsd.org Tue Mar 24 20:22:08 2020
Subject: Re: Belated out of swap kill on rpi3 at r359216
From: Mark Millard <marklmi@yahoo.com>
In-Reply-To: <20200324185518.GA92311@www.zefox.net>
Date: Tue, 24 Mar 2020 13:21:42 -0700
Cc: freebsd-arm@freebsd.org
Message-Id: <75CE3C07-8A0A-4D32-84C6-24BEA967447E@yahoo.com>
References: <20200324155753.GA91922@www.zefox.net> <83E41A13-6C24-4B56-A837-779044038FBC@yahoo.com> <20200324185518.GA92311@www.zefox.net>
To: bob prohaska
On 2020-Mar-24, at 11:55, bob prohaska wrote:

> On Tue, Mar 24, 2020 at 10:09:28AM -0700, Mark Millard wrote:
>> 
>> 
>> On 2020-Mar-24, at 08:57, bob prohaska wrote:
>> 
>>> An attempt to buildworld on an rpi3 running r359216 stopped with an
>>> OOMA kill. Sources were at 359264, loader.conf contained
>>> vm.pageout_oom_seq="4096" .
>> 
>> What of the value of vm.pfault_oom_attempts ?
>> 
>> If vm.pfault_oom_attempts was not -1,
>> what of the value of vm.pfault_oom_wait ?
>> 
> bob@www:/usr/src % sysctl vm.pfault_oom_wait
> vm.pfault_oom_wait: 10
> I didn't change it, this must be the default.
> Would it be useful to add something like
> vm.pfault_oom_wait="20"
> to /boot/loader.conf?

I assume that this answer means vm.pfault_oom_attempts was not -1.
But you did not list what it was.

vm.pfault_oom_attempts=-1 is the means of keeping the
vm.pfault_oom_attempts/vm.pfault_oom_wait pair from causing OOM
at all. (vm.pfault_oom_wait becomes irrelevant.)

When vm.pfault_oom_attempts != -1, there are potential time-outs
that overall involve:

vm.pfault_oom_attempts * vm.pfault_oom_wait

but there are tradeoffs in how that product is partitioned between
the two factors:

# sysctl -d vm.pfault_oom_attempts
vm.pfault_oom_attempts: Number of page allocation attempts in page fault handler before it triggers OOM handling

# sysctl -d vm.pfault_oom_wait
vm.pfault_oom_wait: Number of seconds to wait for free pages before retrying the page fault handler

As I understand it, the following cases could each have very
different results, depending on what the latencies are like and
such:

vm.pfault_oom_attempts==20 && vm.pfault_oom_wait==1
vs.
vm.pfault_oom_attempts==1 && vm.pfault_oom_wait==20
vs.
vm.pfault_oom_attempts==4 && vm.pfault_oom_wait==5

As I remember, Konstantin Belousov once asked someone who was having
a repeatable problem to try some alternatives that explored this,
but got no reply to the request. The person might have just used
vm.pfault_oom_attempts=-1 and continued with their primary activity,
for all I know.

vm.pfault_oom_attempts=-1 is only recommended when one is sure that
the system will not run out of swap/paging space, if I understand
right.

For reference, for other settings:

# sysctl -d vm.pageout_oom_seq
vm.pageout_oom_seq: back-to-back calls to oom detector to start OOM

There is also:

# sysctl -d vm.oom_pf_secs
vm.oom_pf_secs: 

but it seems to have an empty description. Maybe it reports
vm.pfault_oom_attempts * vm.pfault_oom_wait when
vm.pfault_oom_attempts != -1. (I've not looked.)
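For a quick look at what overall window the current settings imply,
a minimal sketch along the following lines would do (the arithmetic
just mirrors the attempts * wait product described above; the little
script is only illustrative, not something from the base system):

#!/bin/sh
# Illustrative only: report the page-fault OOM tunables and, when
# vm.pfault_oom_attempts is not -1, the worst-case window they imply.
attempts=$(sysctl -n vm.pfault_oom_attempts)
wait_secs=$(sysctl -n vm.pfault_oom_wait)
echo "vm.pfault_oom_attempts=${attempts} vm.pfault_oom_wait=${wait_secs}"
if [ "${attempts}" != "-1" ]; then
    echo "approximate page-fault OOM window: $((attempts * wait_secs)) seconds"
fi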
>>> A snippet of gstat log suggests the worst congestion in the storage I/O
>>> happened at Tue Mar 24 04:52:26 PDT 2020 with an L(q) of 37, but the
>>> OOMA kill happened at Tue Mar 24 04:53:04 PDT 2020, by which time the
>>> L(q) had dropped to one, half a minute later.
>>> 
>>> Is the delay in OOMA action to be expected?
>>> 
>>> Here's the relevant part of the log, I hope the columns display readably:
>>> 
>>> 0/2/2/19177 mbuf clusters in use (current/cache/total/max)
>>> procs     memory        page                         disks     faults        cpu
>>> r b  w     avm    fre   flt  re  pi  po    fr    sr mm0 da0    in    sy   cs us sy id
>>> 0 0 29 1897320  47312 12851   9   4   5 13330  1669   0   0 14172  8020 3034 65 29  6
>>> dT: 1.056s  w: 1.000s
>>>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d  %busy Name
>>>    37    367    323   5463    6.4     45   1511   76.8      0      0    0.0   91.7 mmcsd0
>>>    37    367    323   5463    6.5     45   1511   76.9      0      0    0.0   93.3 mmcsd0s2
>>>    34    133    111   3209    7.6     22    697  134.7      0      0    0.0   74.0 mmcsd0s2a
>>>     3    235    212   2254    5.9     23    814   21.5      0      0    0.0   70.0 mmcsd0s2b
>>>    34    133    111   3209    7.6     22    697  134.7      0      0    0.0   74.2 ufs/rootfs
>>> Tue Mar 24 04:52:26 PDT 2020
>>> Device          1K-blocks     Used    Avail Capacity
>>> /dev/mmcsd0s2b    4404252   224484  4179768     5%
>>> Mar 24 04:20:50 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 21842, size: 12288
>>> Mar 24 04:20:50 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 25477, size: 4096
>> 
>> How bad were things back when the "indefinite wait buffer" notices
>> were generated (Mar 24 04:20:50 time frame)?
>> 
> It looks like _new_ indefinite wait messages started coming at around Mon Mar 23 23:00:28 PDT 2020:
> procs     memory        page                         disks     faults        cpu
> r b  w     avm    fre   flt  re  pi  po    fr    sr mm0 da0    in    sy   cs us sy id
> 2 0 18 1588824 197676 14108   2   1   0 14759   238   0   0 14055  9263 2668 62 32  5
> dT: 1.027s  w: 1.000s
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d  %busy Name
>     9      1      0      0    0.0      1     31  13451      0      0    0.0  326.9 mmcsd0
>     9      1      0      0    0.0      1     31  13451      0      0    0.0  326.9 mmcsd0s2
>     6      1      0      0    0.0      1     31  13451      0      0    0.0  326.9 mmcsd0s2a
>     6      1      0      0    0.0      1     31  13451      0      0    0.0  326.9 ufs/rootfs
> Mon Mar 23 23:00:28 PDT 2020
> Device          1K-blocks     Used    Avail Capacity
> /dev/mmcsd0s2b    4404252   245288  4158964     6%
> Mar 23 23:00:28 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 280324, size: 8192
> Mar 23 23:00:28 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 285903, size: 16384
> 
> Admittedly, 300% busy looks pretty bad. Still the machine didn't quit.....
> On the next sample %busy went down to 18% for swap, less for all else.

As I understand it, the ms/w value of 13451 means that, for a time,
writes took (as a mean?) somewhat under 14 seconds from being queued
to being completed. If a read could be blocked for such time frames
by such writes, it suggests that vm.pfault_oom_wait might need to be
larger than 13 if vm.pfault_oom_attempts=-1 is not in use. Or
vm.pfault_oom_attempts=-1 could be used to keep large time frames
from directly leading to OOM activity.
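For concreteness, the kind of /boot/loader.conf content involved
would look something like the below. (The figures are only
placeholders to show the form, not recommendations; they would need
to be picked based on the stalls actually observed.)

# Illustrative /boot/loader.conf fragment (placeholder values):
vm.pageout_oom_seq="4096"       # more back-to-back passes before OOM kills start
vm.pfault_oom_attempts="4"      # with vm.pfault_oom_wait, bounds the page-fault wait
vm.pfault_oom_wait="20"         # seconds per attempt; larger than the ~14 s stalls seen above
# Or, only if swap/paging space can not run out:
#vm.pfault_oom_attempts="-1"    # disables page-fault-driven OOM entirely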
>> 
>> Below I show code changes to be more explicit in the
>> output messaging about what contributes to initiating
>> OOM kills, without needing boot verbose or the like.
>> There are also some messages from Mark J.'s old code
>> for reporting on things related to initiating OOM
>> kills, or some minor variations of them.
>> 
> 
>> You may want to try such changes to provide more
>> context for your OOM kills when they happen. Below
>> the 4 reasons reported are (showing the most
>> detailed of the related messages from different
>> stages):
>> 
>> swp_pager_meta_build: swap blk uma zone exhausted
>> 
>> swp_pager_meta_build: swap pctrie uma zone exhausted
>> 
>> vm_fault_allocate: proc %d (%s) failed to alloc page on fault, starting OOM
>> 
>> vm_pageout_mightbe_oom: kill context: v_free_count: %u, v_inactive_count: %u
>> 
> 
> Could inquiries instead be added to the logging script?
> Right now it's
> 
> #!/bin/sh
> while true
> do vmstat ; gstat -abd -I 1s ; date ; swapinfo ; tail -n 2 /var/log/messages ; netstat -m | grep "mbuf clusters"
> done
> 
> I'd much rather tamper with the logging script than the kernel 8-)

Unfortunately, the reason I made the kernel messaging changes is that,
as far as I know, the normal kernel simply does not publish the
information anywhere when its internal criteria lead to OOM activity:
no inquiry is available without kernel changes.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)