From: Mark Millard
To: bob prohaska
Cc: Current FreeBSD , freebsd-arm@freebsd.org
Date: Sun, 17 Jan 2021 19:19:33 -0800
Subject: Re: Silent hang in buildworld, was Re: Invoking -v for clang during buildworld
Message-Id: <60CCCDE8-E3D3-4920-9FC0-A945330F6830@yahoo.com>
In-Reply-To: <20210118015009.GA31353@www.zefox.net>
References: <20210116043740.GA19523@www.zefox.net> <20210116155538.GA24259@www.zefox.net> <20210116220334.GA26756@www.zefox.net> <20210117174006.GA30728@www.zefox.net> <85889EAE-F579-4220-9185-944D9AA5075A@yahoo.com> <20210118015009.GA31353@www.zefox.net>
List-Id: "Porting FreeBSD to ARM processors."

On 2021-Jan-17, at 17:50, bob prohaska wrote:

> On Sun, Jan 17, 2021 at 12:30:51PM -0800, Mark Millard wrote:
>>
>>
>> On 2021-Jan-17, at 09:40, bob prohaska wrote:
>>
>>> On Sat, Jan 16, 2021 at 03:04:04PM -0800, Mark Millard wrote:
>>>>
>>>> Other than -j1 style builds (or equivalent), one pretty much
>>>> always needs to go looking around for a non-panic failure. It
>>>> is uncommon for all the material to be together in the build
>>>> log in such contexts.
>>>
>>> Running make cleandir twice and restarting -j4 buildworld brought
>>> the process full circle: A silent hang, no debugger response, no
>>> console warnings. That's what sent me down the rabbit hole of make
>>> without clean, which worked at least once...
>>
>> Unfortunately, such a hang tends to mean that log files and such
>> were not completely written out to media. We do not get to see
>> evidence of the actual failure time frame, just somewhat before.
>> (compiler/linker output and such can have the same issues of
>> ending up with incomplete updates.)
>>
>> So, pretty much my notes are unlikely to be strongly tied to
>> any solid evidence: more like alternatives to possibly explore
>> that could be far off the mark.
>>
>> It is not clear if you were using:
>>
>> LDFLAGS.lld+= -Wl,--threads=1
>>
>> or some such to limit the multi-thread linking and its memory.
> No, I wasn't trying to limit ld.lld thread number.

You might want to try a significant change in the memory use just
to see if it makes a difference or not, despite the extra time.
This could include limiting lld thread usage and limiting to a
smaller -jN .

>> I'll note that if -j4 gets 4 links running in parallel it used
>> to be each could have something like 5 threads active on a 4
>> core machine, so 20 or so threads. (I've not checked llvm11's
>> lld behavior. It might avoid such for defaults.)
>>
>> You have not reported any testing of -j2 or -j3 so far, just
>> -j4 . (Another way of limiting memory use, power use, temperature,
>> etc. .)
>>
> Not recently, simply because it's so slow to build. On my "production"
> armv7 machines running stable/12 I do use -j2. But, they get updated
> only a couple times per year, when there's a security issue.

See the earlier note about possibly doing a deliberate test that
uses less memory space. (And more later, below.)

>> You have not reported if your boot complained about the swap
>> space size or if you have adjusted related settings to make
>> non-default tradeoffs for swap management for these specific
>> tests. I recommend not tailoring and using a swap size total
>> that is somewhat under what starts to complain when there is
>> no tailoring.
>>
> Both Pi2 and Pi3 have been complaining about too much swap
> since I first got them. Near as can be told it's never been
> a demonstrated problem, thus far. Now, as things like LLVM
> get bigger and bigger, it seems possible excess swap might
> cause, or obscure, other problems. For the Pi2 I picked 2
> GB from the old "2x physical RAM" rule.

I'd take those warnings as FreeBSD reporting that the system is
expected to be in a mistuned configuration by "normal" criteria.
Doing the tuning to allow more swap has the documented tradeoffs:

     kern.maxswzone
. . .
               Note that swap metadata can be fragmented, which means that
               the system can run out of space before it reaches the
               theoretical limit.  Therefore, care should be taken to not
               configure more swap than approximately half of the
               theoretical maximum.

(The above is what the warning is about.
But then there is . . .)

               Running out of space for swap metadata can leave the system
               in an unrecoverable state.  Therefore, you should only
               change this parameter if you need to greatly extend the KVM
               reservation for other resources such as the buffer cache or
               kern.ipc.nmbclusters.  Modifies kernel option
               VM_SWZONE_SIZE_MAX.

(NOTE: That last paragraph is talking about *decreasing*
kern.maxswzone to get more room for non-swap-management things
in KVM. [Too bad the wording says "change".] But increasing
kern.maxswzone to allow more swap leaves less space for the
buffer cache or the like, so there are tradeoffs involved.)

>>> The residue of the top screen shows
>>>
>>> last pid: 63377;  load averages:  4.29,  4.18,  4.15    up 1+07:11:07  04:46:46
>>> 60 processes:  5 running, 55 sleeping
>>> CPU: 70.7% user,  0.0% nice, 26.5% system,  2.8% interrupt,  0.0% idle
>>> Mem: 631M Active, 4932K Inact, 92M Laundry, 166M Wired, 98M Buf, 18M Free
>>> Swap: 2048M Total, 119M Used, 1928M Free, 5% Inuse, 16K In, 3180K Out
>>> packet_write_wait: Connection to 50.1.20.26 port 22: Broken pipe
>>> bob@raspberrypi:~ $ ssh www.zefox.com    RES STATE    C   TIME    WCPU COMMAND
>>> ssh: connect to host www.zefox.com port 22: Connection timed out86.17% c++
>>> bob@raspberrypi:~ $    1  99    0   277M   231M RUN      0   3:26  75.00% c++
>>> 63245 bob          1  99    0   219M   173M CPU0     0   2:10  73.12% c++
>>> 62690 bob          1  98    0   354M   234M RUN      3   9:42  47.06% c++
>>> 63377 bob          1  30    0  5856K  2808K nanslp   0   0:00   3.13% gstat
>>> 38283 bob          1  24    0  5208K   608K wait     2   2:00   0.61% sh
>>>   995 bob          1  20    0  6668K  1184K CPU3     3   8:46   0.47% top
>>>   990 bob          1  20    0    12M  1060K select   2   0:48   0.05% sshd
>>> ....
>>
>> This does not look like ld was in use as of the last top
>> display update's content. But the time between reasonable
>> display updates is fairly long relative to CPU activity
>> so it is only suggestive.
>>
>>> [apologies for typing over the remnants]
>>>
>>> I've put copies of the build and swap logs at
>>>
>>> http://www.zefox.net/~fbsd/rpi2/buildworld/
>>>
>>> The last vmstat entry (10 second repeat time) reports:
>>>  procs     memory        page                      disks     faults       cpu
>>>  r b w     avm     fre  flt  re  pi  po    fr   sr da0 sd0    in   sy   cs us sy id
>>>  4 0 14  969160   91960  685   2   2   1   707  304   0   0 11418  692 1273 45  5 50
>>>
>>> Does that point to the memory exhaustion suggested earlier in the thread?
>>> At this point /boot/loader.conf contains vm.pfault_oom_attempts="-1", but
>>> that's a relic of long-ago attempts to use USB flash for root and swap.
>>> Might removing it stimulate more warning messages?
>>>
>>
>> vm.pfault_oom_attempts="-1" should only be used in contexts where
>> running out of swap will not happen. Otherwise a deadlocked system
>> can result if it does run out of swap. (Run-out has more senses than
>> just the swap partition being fully used: other internal resources
>> for keeping track of the swap can run into their limits.) I've no
>> evidence that the -1 was actually a problem.
>>
>> I do not find any 1000+ ms/w or ms/r figures in swapscript.log .
>> I found 3 examples of a little under 405 (on sdda0*), 3 between
>> 340 and 345 (da0*), 4 in the 200s (da0*), under 60 in the 100s
>> (da0*). It does not look to me like the recorded part had problems
>> with the long latencies that you used to have happen.
>>
>> So I've not found any specific evidence about what led to the
>> hangup. So my earlier questions/suggestions are basically
>> arbitrary and I would not know what to do with any answers
>> to the questions.
>>
>> The only notes that are fairly solid are about the hangup leading
>> to there being some files that were likely incompletely updated
>> (logs, compiler output files, etc.).
>>
>
> The notion that log files might be truncated didn't register
> until you brought it up.
>
> The obvious things to try seem to be:
> Disable vm.pfault_oom_attempts="-1"
> Decrease swap partition size
> Try using WITH_META_MODE

I'll note that you may well have already spent more time not
getting complete builds than doing one "use less memory" test
would have taken (linker thread count limits, smaller -jN).
To find fairly optimal settings for building on the RPi2 V1.1,
you may have to explore on both sides of the optimal settings
range, just to identify the optimal range for the settings as
being between known bounds.

Of course, for all I know, the "use less memory" tests might
also fail. If that happened, it would at least tend to stop you
from spending time testing configurations that are even less
likely to finish building.

> WITH_META_MODE seems relatively easy; just add -DWITH_META_MODE to the make
> command line for buildworld and add filemon_load="YES" to boot/loader.conf,
> but I wonder about the burdens it'll impose on CPU and disk space/throughput.
> There's nothing to lose at this stage, but I'm all ears if there's a better
> approach.

You could trade off some I/O by not generating swapscript.log
since it seems to not be of much help for the current issue.

I'll also note that META_MODE by default does not generate as
much stdout text but stores more across the various .meta files.
So, by default, buildworld.log (by itself) would be much smaller.
(So disabling recording of buildworld.log would not make much
of a difference.)

As a reminder, some things are environment-only variables,
including WITH_META_MODE :

QUOTE
     The environment of make(1) for the build can be controlled via the
     SRC_ENV_CONF variable, which defaults to /etc/src-env.conf.  Some
     examples that may only be set in this file are WITH_DIRDEPS_BUILD, and
     WITH_META_MODE, and MAKEOBJDIRPREFIX as they are environment-only
     variables.
END QUOTE

"may only be set in this file" is false relative to setting the
environment on the command line but true compared to the likes
of putting the text in /etc/src.conf or the like.

My script for building for armv7 in an armv7 context looks
something like:

script ~/sys_typescripts/typescript_make_armv7_nodebug_clang_bootstrap-armv7-host-$(date +%Y-%m-%d:%H:%M:%S) \
env __MAKE_CONF="/root/src.configs/make.conf" SRCCONF="/dev/null" SRC_ENV_CONF="/root/src.configs/src.conf.armv7-clang-bootstrap.armv7-host" \
WITH_META_MODE=yes \
MAKEOBJDIRPREFIX="/usr/obj/armv7_clang/arm.armv7" \
make $*

(So I happened to not use -DWITH_META_MODE . The context is
/bin/sh based, by the way.)

There are contexts for which I control UBLDR_LOADADDR via
an additional line in the above, such as:

WORLD_FLAGS="${WORLD_FLAGS} UBLDR_LOADADDR=0x42000000" \

(But such is old material that I've not validated as being
needed in even remotely modern times.)

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
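
A "use less memory" test along the lines discussed above could be
set up roughly as in the following sketch. It is only an
illustration: the -j2 default and the build_cmd helper name are
assumptions, not validated settings for an RPi2, and the
LDFLAGS.lld line quoted earlier would still go in the file that
__MAKE_CONF names rather than here. The sketch prints the command
instead of running it, so it can be checked first.

```shell
#!/bin/sh
# Sketch only: compose a lower-memory buildworld command line.
# JOBS defaults to 2 purely as an illustration (smaller than -j4).
JOBS="${JOBS:-2}"

# build_cmd is a hypothetical helper name, not part of the FreeBSD tree.
build_cmd() {
    # WITH_META_MODE is environment-only, so it is passed via env(1)
    # rather than being placed in /etc/src.conf.
    # META_MODE also expects filemon to be loaded, per the quoted
    # filemon_load="YES" note above.
    echo "env WITH_META_MODE=yes make -j${JOBS} buildworld"
}

# Print the command for inspection rather than executing it.
build_cmd
```

Running the printed line under script(1), as in the build script
above, would keep a typescript of the attempt.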