From nobody Tue Jul  4 21:22:18 2023
X-Original-To: freebsd-arm@mlmmj.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
	by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4QwbNX1ns6z4lWV1
	for <freebsd-arm@mlmmj.nyi.freebsd.org>; Tue,  4 Jul 2023 21:22:36 +0000 (UTC)
	(envelope-from marklmi@yahoo.com)
Received: from sonic307-55.consmr.mail.gq1.yahoo.com (sonic307-55.consmr.mail.gq1.yahoo.com [98.137.64.31])
	(using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
	 key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256)
	(Client did not present a certificate)
	by mx1.freebsd.org (Postfix) with ESMTPS id 4QwbNW6XR7z3x4V
	for <freebsd-arm@freebsd.org>; Tue,  4 Jul 2023 21:22:35 +0000 (UTC)
	(envelope-from marklmi@yahoo.com)
Authentication-Results: mx1.freebsd.org;
	none
X-Sonic-MF: <marklmi@yahoo.com>
X-Sonic-ID: c6173247-b1e1-49d0-9876-4ca3b53cd9b0
Received: from sonic.gate.mail.ne1.yahoo.com by sonic307.consmr.mail.gq1.yahoo.com with HTTP; Tue, 4 Jul 2023 21:22:33 +0000
Received: by hermes--production-gq1-5748b5bccb-wqjgs (Yahoo Inc. Hermes SMTP Server) with ESMTPA ID 6da0e11086878af3e0dda2cad82fd6ad;
          Tue, 04 Jul 2023 21:22:29 +0000 (UTC)
Content-Type: text/plain;
	charset=us-ascii
List-Id: Porting FreeBSD to ARM processors <freebsd-arm.freebsd.org>
List-Archive: https://lists.freebsd.org/archives/freebsd-arm
List-Help: <mailto:freebsd-arm+help@freebsd.org>
List-Post: <mailto:freebsd-arm@freebsd.org>
List-Subscribe: <mailto:freebsd-arm+subscribe@freebsd.org>
List-Unsubscribe: <mailto:freebsd-arm+unsubscribe@freebsd.org>
Sender: owner-freebsd-arm@freebsd.org
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.600.7\))
Subject: Re: More swap trouble with armv7, was Re: -current on armv7 stuck
 with flashing disk light
From: Mark Millard <marklmi@yahoo.com>
In-Reply-To: <ZKRt4ryCGyv9n+Q/@www.zefox.net>
Date: Tue, 4 Jul 2023 14:22:18 -0700
Cc: freebsd-ports@freebsd.org,
 freebsd-arm@freebsd.org
Content-Transfer-Encoding: quoted-printable
Message-Id: <9A15D619-3274-44AC-B7E1-A1D6C7D334F2@yahoo.com>
References: <ZJpFqAnnKPq/XmxJ@www.zefox.net>
 <A91FF89C-2BAA-4E93-96FA-C75C6FA4A0A0@yahoo.com>
 <ZJsOTzp+b7O2+bhQ@www.zefox.net>
 <E1670A16-2F8E-4E94-A44C-DF7886233F62@yahoo.com>
 <066FD282-1637-448C-99FF-BA62718386F0@yahoo.com>
 <ZJsZiQGs0QlHhzTV@www.zefox.net> <ZKRt4ryCGyv9n+Q/@www.zefox.net>
To: bob prohaska <fbsd@www.zefox.net>
X-Mailer: Apple Mail (2.3731.600.7)
X-Rspamd-Queue-Id: 4QwbNW6XR7z3x4V
X-Spamd-Bar: ----
X-Spamd-Result: default: False [-4.00 / 15.00];
	REPLY(-4.00)[];
	ASN(0.00)[asn:36647, ipnet:98.137.64.0/20, country:US]
X-Rspamd-Pre-Result: action=no action;
	module=replies;
	Message is reply to one we originated
X-ThisMailContainsUnwantedMimeParts: N

On Jul 4, 2023, at 12:07, bob prohaska <fbsd@www.zefox.net> wrote:

> On Tue, Jun 27, 2023 at 10:16:57AM -0700, bob prohaska wrote:
>> On Tue, Jun 27, 2023 at 09:59:40AM -0700, Mark Millard wrote:
>>>>
>>>> If you want to identify system hangs, please
>>>> put back:
>>>>
>>>> vm.swap_enabled=0
>>>> vm.swap_idle_enabled=0
>>>>
>>
>> They're reinstated now, but I don't want to disturb the system
>> while it seems to be building world acceptably.
>>
> Reinstating
> vm.swap_enabled=0
> vm.swap_idle_enabled=0
>
> and limiting buildworld to -j3 allows buildworld to complete successfully in 1 GB of swap.
>
> Meanwhile, attempts to compile sysutils/usbtop using poudriere still cause swap exhaustion
> while compiling /devel/llvm15 even with 2 GB of swap allocated.

What parallelism settings did poudriere use for the
devel/llvm15 build attempt? Have you tried allowing
less parallelism (if anything less than what you have
already tried is possible)?

What options are enabled vs. disabled for devel/llvm15 ?

BE_STANDARD vs. BE_FREEBSD vs. BE_NATIVE ?

BE_NATIVE probably helps limit resource use the most, if it
happens to be sufficient for your needs. BE_FREEBSD would be
in the middle of the 3 options for this issue.

Is MLIR enabled? If having it disabled is sufficient for
your needs, disabling it should help avoid some resource use.
Similarly for FLANG. (Building FLANG requires MLIR, so
having MLIR disabled implies FLANG also needing to be
disabled.)

> The messages are
> Jul  4 11:18:48 www kernel: pid 1074 (getty), jid 0, uid 0, was killed: out of swap space

In my view "out of swap space" is still a misnomer
for this context, but at least the following
messages are more specific about the actual internal
data-structure problem(s). My understanding is that
these data structures can have fragmentation issues.

For fragmentation issues, prior history since booting
might contribute, and building just after a reboot may
end up with less fragmentation. (Unknown if sufficiently
less.)

Also, configuring more total swap than kern.maxswzone
is sized for likely makes "swap blk zone exhausted"
more likely. That is one of the reasons I avoid a
total swap size that generates the message about
possible mistuning.

> swap blk zone exhausted, increase kern.maxswzone

Have you ever gotten the above line before? I was
unaware of any examples of it showing up.

> swblk zone ok

I'll note that there is another potential message
pair for "swap pctrie zone exhausted"/"swpctrie zone ok"
that you have not reported getting.

Have you ever seen the "swap pctrie zone exhausted"
notice? (Just curiosity on my part.)

> IIRC the "increase kern.maxswzone" is unhelpful, if not impossible. The
> "swblk zone ok" seems new.

Are you using the default kern.maxswzone for your context?
What is its value?

Did you get the notice about possible mistuning for your
combination of swap partition sizing and kern.maxswzone
value? Or did "swap blk zone" happen even without that
notice happening?
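
A minimal sketch of how the values asked about above could be
gathered on the FreeBSD system being diagnosed. The function name
show_swap_tuning is made up for illustration, and the zone names
in the last grep are an assumption about how vmstat -z labels the
swap metadata zones, so adjust the pattern if nothing matches.

```shell
# Sketch only: meant to be run on the FreeBSD host being diagnosed.
# show_swap_tuning is a hypothetical helper name, not an existing tool.
show_swap_tuning() {
    # Current value of the swap metadata limit.
    sysctl kern.maxswzone
    # Configured swap devices and how much of each is in use.
    swapinfo -h
    # Warning that mentions kern.maxswzone, if one was logged.
    dmesg | grep -i maxswzone
    # Usage of the swap metadata zones (zone names are assumed).
    vmstat -z | grep -E 'ITEM|swblk|swpctrie'
}
```

Calling show_swap_tuning as root or as a normal user should be
enough for these read-only queries.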

> From the gstat output near peak swap use the system wasn't I/O bound,

The "swap blk zone" contains an in-kernel-RAM data
structure that is involved in managing the swap space
usage.

> the disk was less than 25% busy at the time of the first OOMA kill.

"swap blk zone" can end up with fragmentation issues, where
the total available is only made up of a bunch of tiny chunks
and nothing large can be handled as a unit any more. (A general
description of "fragmented".)

> Eventually it was possible to log in on the serial console and run top:
>
> 33 processes:  1 running, 29 sleeping, 3 zombie
> CPU:  0.0% user,  0.0% nice, 10.6% system,  0.2% interrupt, 89.2% idle
> Mem: 139M Active, 8256K Inact, 252M Laundry, 221M Wired, 98M Buf, 292M Free
> Swap: 2048M Total, 1291M Used, 756M Free, 63% Inuse
>
>  PID   JID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
> 40719     0 root          1  20  -20     0B  8192B swzonx   0   0:12   9.15% cron
> 40717     0 root          1  20  -20     0B  8192B swzonx   0   0:34   9.08% sh
> 40709     0 root          1  20  -20     0B  8192B swzonx   0   0:38   9.01% sshd
> 40720     0 root          1  20  -20     0B  8192B swzonx   3   0:13   7.47% sh

Unfortunately the swzonx text is truncated. There is
actually:

pause("swzonxb", 10); for swblk zone
and:
pause("swzonxp", 10); for swap pctrie zone

top's display leaves it unclear which was involved.

> 40721     0 bob           1  20    0  6608K  2600K CPU1     1   0:00   0.32% top
> 25761     0 bob           1  20    0    14M  6136K select   0   0:02   0.03% sshd
> 25852     0 root          1  20    0  4668K  1648K ttyin    1   0:01   0.03% tip
> 1237     0 root          1  20    0  5820K  1540K wait     1   0:12   0.00% sh
> 25381     0 root          1  23    0    14M  5868K select   1   0:01   0.00% sshd
> 1030     0 root          1  24    0    13M  2416K vmbckw   1   0:00   0.00% sshd
> 12715     0 root          1  68    0  5820K  1660K wait     0   0:00   0.00% sh
> 12710     0 root          1  20    0  5820K  1556K piperd   1   0:00   0.00% sh
>  929     0 root          1  20    0  5356K  1256K select   3   0:00   0.00% syslogd
> 1014     0 root          1  20    0  5124K  1356K nanslp   2   0:00   0.00% cron
> 25770     0 bob           1  36    0  6844K  3116K pause    1   0:00   0.00% tcsh
> 25794     0 bob           1  24    0  5380K  2188K wait     2   0:00   0.00% su
> 39626     0 root          1  20    0  5424K  2404K wait     2   0:00   0.00% login
> 40635     0 bob           1  20    0  6824K  3272K pause    1   0:00   0.00% tcsh
> 25820     0 root          1  21    0  5608K  2204K wait     0   0:00   0.00% sh
> 25851     0 root          1  20    0  4668K  1656K ttyin    3   0:00   0.00% tip
> 40454     0 root          1  24    0  4636K  1780K ttyin    3   0:00   0.00% getty
>
> I'll let it go for a while to see if poudriere notices it's failed and cleans up.
>
> At the moment /boot/loader.conf contains
>
> # Configure USB OTG; see usb_template(4).
> hw.usb.template=3
> umodem_load="YES"
> # Disable the beastie menu and color
> beastie_disable="YES"
> loader_color="NO"
> vm.pageout_oom_seq="4096"
> vm.pfault_oom_attempts="3"
> vm.pfault_oom_attempts="120"

2 assignments to the same thing in a row?
The 2nd ends up controlling the value.

> vm.pfault_oom_wait="20"

So you are allowing it 120 * 20 sec == 2400 sec
(in other words, 40 minutes of retrying every 20
seconds) to handle a page fault.
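
The arithmetic behind that figure, as a small sketch. The two
numbers are copied from the quoted loader.conf lines; the shell
variable names are just for illustration.

```shell
# Values copied from the quoted /boot/loader.conf lines.
attempts=120   # vm.pfault_oom_attempts (the later of the two assignments)
wait_sec=20    # vm.pfault_oom_wait, in seconds

# Total time one page fault may spend retrying before giving up.
total_sec=$((attempts * wait_sec))
total_min=$((total_sec / 60))
echo "${total_sec} sec = ${total_min} min"
```

With the earlier assignment (3 attempts) in effect instead, the
same calculation would give 60 sec.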

That time scale may have contributed to why it
failed first with "swap blk zone exhausted"
instead of a more usual type of OOM cause:
how many page faults had active 40 minute
intervals at the time?

You may just be moving around where the problem
shows up, rather than avoiding a failure
overall.

> kern.cam.boot_delay="20000"
> vfs.ffs.dotrimcons="1"
> vfs.root_mount_always_wait="1"
> filemon_load="YES"
>
> /usr/local/etc/poudriere.conf contains
> USE_TMPFS=no
> NOHANG_TIME=28800
> MAX_EXECUTION_TIME_EXTRACT=14400
> MAX_EXECUTION_TIME_INSTALL=14400
> MAX_EXECUTION_TIME_PACKAGE=432000
> ALLOW_MAKE_JOBS=yes
> MAX_JOBS_NUMBER=2

I do not remember there being a MAX_JOBS_NUMBER in
the infrastructure. So I will ignore that line. It
probably should be deleted.

> MAKE_JOBS_NUMBER=2
>
> Do these settings look reasonable?

ALLOW_MAKE_JOBS/MAKE_JOBS_NUMBER is not independent
of what is being built. There is no global, single
answer to "looks reasonable" for them.

However, MAKE_JOBS_NUMBER is in the wrong file.
It is from/for make, not from/for poudriere
directly. (But there is a way for poudriere
to contribute such to make.)

For example (from a grep):

/usr/local/etc/poudriere.d/make.conf:MAKE_JOBS_NUMBER=2

( MAKE_JOBS_NUMBER_LIMIT is the same for where it
goes. )

You might need to use MAKE_JOBS_NUMBER=1 or
to not assign ALLOW_MAKE_JOBS at all to have a
chance of the devel/llvm15 build fitting,
if you have already turned off the options
that use resources for building what you
do not need.
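
As a rough sketch of what that could look like, the following
prints a candidate fragment for the make.conf file from the grep
example above, using the MAKE_JOBS_NUMBER spelling shown there.
It assumes the usual ports per-port option syntax
(<category>_<port>_UNSET) and assumes MLIR and FLANG are not
needed for what you are building; check both before relying on it.

```shell
# Candidate lines for /usr/local/etc/poudriere.d/make.conf.
# Assumptions: MLIR and FLANG are not needed, and one make job is acceptable.
fragment='MAKE_JOBS_NUMBER=1
devel_llvm15_UNSET+=MLIR FLANG'

# Print the fragment for review; add it to the file by hand if it fits.
printf '%s\n' "$fragment"
```

Selecting among BE_STANDARD, BE_FREEBSD and BE_NATIVE is left out
of the sketch, since only one of the three can be set at a time.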

===
Mark Millard
marklmi at yahoo.com