Subject: Re: More swap trouble with armv7, was Re: -current on armv7 stuck with flashing disk light
From: Mark Millard
Date: Tue, 4 Jul 2023 14:22:18 -0700
To: bob prohaska
Cc: freebsd-ports@freebsd.org, freebsd-arm@freebsd.org
Message-Id: <9A15D619-3274-44AC-B7E1-A1D6C7D334F2@yahoo.com>
List-Id: Porting FreeBSD to ARM processors
List-Archive: https://lists.freebsd.org/archives/freebsd-arm

On Jul 4, 2023, at 12:07, bob prohaska wrote:

> On Tue, Jun 27, 2023 at 10:16:57AM -0700, bob prohaska wrote:
>> On Tue, Jun 27, 2023 at 09:59:40AM -0700, Mark Millard wrote:
>>>>
>>>> If you want to identify system hangs, please
>>>> put back:
>>>>
>>>> vm.swap_enabled=0
>>>> vm.swap_idle_enabled=0
>>>>
>>
>> They're reinstated now, but I don't want to disturb the system
>> while it seems to be building world acceptably.
>>
> Reinstating
> vm.swap_enabled=0
> vm.swap_idle_enabled=0
>
> and limiting buildworld to -j3 allows buildworld to complete successfully in 1 GB of swap.
>
> Meanwhile, attempts to compile sysutils/usbtop using poudriere still cause swap exhaustion
> while compiling /devel/llvm15 even with 2 GB of swap allocated.

What parallelism settings did poudriere use for the
devel/llvm15 build attempt? Have you tried allowing
less parallelism (if anything less than what you have
tried is possible)?

What options are enabled vs. disabled for devel/llvm15?
BE_STANDARD vs. BE_FREEBSD vs. BE_NATIVE? BE_NATIVE
probably helps limit resource use the most, if it happens
to be sufficient. BE_FREEBSD would be in the middle of the
3 options for this issue.

Is MLIR enabled? If having it disabled is sufficient for
your purposes, disabling it should help reduce resource
use. Similarly for FLANG. (Building FLANG requires MLIR,
so having MLIR disabled implies FLANG also needing to be
disabled.)

> The messages are
> Jul 4 11:18:48 www kernel: pid 1074 (getty), jid 0, uid 0, was killed: out of swap space

In my view "out of swap space" is still a misnomer for
this context, but at least the following messages are
more specific to the actual internal data-structure(s)
problem(s).

My understanding is that the data structures can have
fragmentation issues. For fragmentation issues, prior
history since booting might contribute, and building
just after a reboot may end up with less fragmentation.
(Unknown if sufficiently less.)

Also, over-allocating the swap partition (by not having
kern.maxswzone appropriately matching) probably makes
"swap blk zone exhausted" more likely.
It is one of the reasons I avoid using swap partitioning
with a total size that generates the message about
possible mistuning.

> swap blk zone exhausted, increase kern.maxswzone

Have you ever gotten the above line before? I was
unaware of any examples of it showing up.

> swblk zone ok

I'll note that there is another potential message pair,
"swap pctrie zone exhausted"/"swpctrie zone ok", that
you have not reported getting. Have you ever seen the
"swap pctrie zone exhausted" notice? (Just curiosity on
my part.)

> IIRC the "increase kern.maxswzone" is unhelpful, if not impossible. The
> "swblk zone ok" seems new.

Are you using the default kern.maxswzone for your context?
What is its value? Did you get the notice about possible
mistuning for your combination of swap partition sizing
and kern.maxswzone value? Or did "swap blk zone" happen
even without that notice happening?

> From the gstat output near peak swap use the system wasn't I/O bound,

The "swap blk zone" contains an in-kernel-RAM data structure
that is involved in managing the swap space usage.

> the disk was less than 25% busy at the time of the first OOMA kill.

"swap blk zone" can end up with fragmentation issues, where
the total available is only made up of a bunch of tiny chunks
and nothing large can be handled as a unit any more. (A
general description of "fragmented".)

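To make the general description of "fragmented" concrete,
here is a toy sketch in Python. It is not the kernel's swblk
zone code and says nothing about how that zone is really laid
out; the slot model and the names free_runs and can_allocate
are made up purely for illustration. It only shows how the
same total amount free can either satisfy a larger request
or not, depending on how the free pieces are arranged.

```python
# Toy model only: NOT the FreeBSD kernel's swblk allocator.
# Each list element is one slot: True = in use, False = free.

def free_runs(slots):
    """Return the length of each maximal run of consecutive free slots."""
    runs, current = [], 0
    for used in slots:
        if used:
            if current:
                runs.append(current)
            current = 0
        else:
            current += 1
    if current:
        runs.append(current)
    return runs

def can_allocate(slots, size):
    """True if some contiguous free run can hold a request of `size` slots."""
    return any(run >= size for run in free_runs(slots))

# 16 slots with every other slot in use: 8 free slots, all isolated.
fragmented = [i % 2 == 0 for i in range(16)]
# 16 slots with the first 8 in use: 8 free slots in a single run.
compact = [i < 8 for i in range(16)]

total_free_fragmented = sum(free_runs(fragmented))
total_free_compact = sum(free_runs(compact))
```

Both layouts have the same total free, but in the fragmented
one can_allocate(fragmented, 2) is false while
can_allocate(compact, 8) is true.
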
> Eventually it was possible to log in on the serial console and run top:
>
> 33 processes: 1 running, 29 sleeping, 3 zombie
> CPU: 0.0% user, 0.0% nice, 10.6% system, 0.2% interrupt, 89.2% idle
> Mem: 139M Active, 8256K Inact, 252M Laundry, 221M Wired, 98M Buf, 292M Free
> Swap: 2048M Total, 1291M Used, 756M Free, 63% Inuse
>
>   PID JID USERNAME THR PRI NICE   SIZE    RES STATE   C  TIME   WCPU COMMAND
> 40719   0 root       1  20  -20     0B  8192B swzonx  0  0:12  9.15% cron
> 40717   0 root       1  20  -20     0B  8192B swzonx  0  0:34  9.08% sh
> 40709   0 root       1  20  -20     0B  8192B swzonx  0  0:38  9.01% sshd
> 40720   0 root       1  20  -20     0B  8192B swzonx  3  0:13  7.47% sh

Unfortunately the swzonx text is truncated. There is actually:

        pause("swzonxb", 10);

for swblk zone and:

        pause("swzonxp", 10);

for swap pctrie zone. top's display leaves it unclear which
was involved.

> 40721   0 bob        1  20    0  6608K  2600K CPU1    1  0:00  0.32% top
> 25761   0 bob        1  20    0    14M  6136K select  0  0:02  0.03% sshd
> 25852   0 root       1  20    0  4668K  1648K ttyin   1  0:01  0.03% tip
>  1237   0 root       1  20    0  5820K  1540K wait    1  0:12  0.00% sh
> 25381   0 root       1  23    0    14M  5868K select  1  0:01  0.00% sshd
>  1030   0 root       1  24    0    13M  2416K vmbckw  1  0:00  0.00% sshd
> 12715   0 root       1  68    0  5820K  1660K wait    0  0:00  0.00% sh
> 12710   0 root       1  20    0  5820K  1556K piperd  1  0:00  0.00% sh
>   929   0 root       1  20    0  5356K  1256K select  3  0:00  0.00% syslogd
>  1014   0 root       1  20    0  5124K  1356K nanslp  2  0:00  0.00% cron
> 25770   0 bob        1  36    0  6844K  3116K pause   1  0:00  0.00% tcsh
> 25794   0 bob        1  24    0  5380K  2188K wait    2  0:00  0.00% su
> 39626   0 root       1  20    0  5424K  2404K wait    2  0:00  0.00% login
> 40635   0 bob        1  20    0  6824K  3272K pause   1  0:00  0.00% tcsh
> 25820   0 root       1  21    0  5608K  2204K wait    0  0:00  0.00% sh
> 25851   0 root       1  20    0  4668K  1656K ttyin   3  0:00  0.00% tip
> 40454   0 root       1  24    0  4636K  1780K ttyin   3  0:00  0.00% getty
>
> I'll let it go for a while to see if poudriere notices it's failed and cleans up.
>
> At the moment /boot/loader.conf contains
>
> # Configure USB OTG; see usb_template(4).
> hw.usb.template=3
> umodem_load="YES"
> # Disable the beastie menu and color
> beastie_disable="YES"
> loader_color="NO"
> vm.pageout_oom_seq="4096"
> vm.pfault_oom_attempts="3"
> vm.pfault_oom_attempts="120"

Two assignments to the same thing in a row? The second
ends up controlling the value.

> vm.pfault_oom_wait="20"

So you are allowing it 120 * 20 sec == 2400 sec (in
other words, 40 minutes of retrying every 20 seconds)
to handle a page fault. That time scale may have
contributed to why it failed first for "swap blk zone
exhausted" instead of the more usual types of OOM cause:
how many page faults had active 40 minute intervals at
the time?

You may just be moving around where a problem shows up,
not eliminating the failure overall.

> kern.cam.boot_delay="20000"
> vfs.ffs.dotrimcons="1"
> vfs.root_mount_always_wait="1"
> filemon_load="YES"
>
> /usr/local/etc/poudriere.conf contains
> USE_TMPFS=no
> NOHANG_TIME=28800
> MAX_EXECUTION_TIME_EXTRACT=14400
> MAX_EXECUTION_TIME_INSTALL=14400
> MAX_EXECUTION_TIME_PACKAGE=432000
> ALLOW_MAKE_JOBS=yes
> MAX_JOBS_NUMBER=2

I do not remember there being a MAX_JOBS_NUMBER in the
infrastructure. So I will ignore that line. It probably
should be deleted.

> MAKE_JOBS_NUMBER=2
>
> Do these settings look reasonable?

ALLOW_MAKE_JOBS/MAKE_JOBS_NUMBER is not independent
of what is being built. There is no global, single
answer to "looks reasonable" for them.

However, MAKE_JOBS_NUMBER is in the wrong file. It is
from/for make, not from/for poudriere directly. (But
there is a way for poudriere to contribute such to
make.) For example (from a grep):

/usr/local/etc/poudriere.d/make.conf:MAKE_JOBS_NUMBER=2

(MAKE_JOBS_NUMBER_LIMIT is the same for where it goes.)

You might need to use MAKE_JOBS_NUMBER=1, or to not
assign to ALLOW_MAKE_JOBS, to have a chance of the
devel/llvm15 build fitting, if you have already turned
off the options that use resources for building what
you do not need.

===
Mark Millard
marklmi at yahoo.com