From: Robert Clausecker
To: ports@freebsd.org
Date: Tue, 24 Oct 2023 21:12:13 +0200
Subject: We need to do something about build times
List-Id: Porting software to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-ports

Build times have gone up to the point where they are unsustainable. Frequent updates to key ports (like llvm*, rust, gcc*) mean that basically every time I prepare a new batch of commits, I have to rebuild a variety of toolchain ports across 8 jails (amd64/i386/arm64/armv7, each for FreeBSD 12.4 and 13.2). This takes multiple days, and I'm working with hardware that's quite recent (for x86, an 8-thread Skylake box; for arm, an 8-thread Windows Dev Kit 2023). By the time the builds are done, some random update has usually caused the ports to be out of date again, so if I were to rebase, I would have to do all of this again. And again. And again.

Particularly bad offenders are gcc and rust. Ccache is ineffective for these: gcc has LTO turned on, which seems to more than triple the regular build time, to more than 24 hours even on a fast Skylake box. This is single-threaded as I build multiple ports at once; if I were to build multi-threaded, the same total number of CPU hours would be spent, so that would not fix my problem. Ccache is also ineffective for rust, of course.

There's another issue in that ccache doesn't scale to large cache sizes (my experiments show that anything larger than 20 GB seems to cause problems as ccache repeatedly tries to scan the whole thing for evictions), and the sizes that work are just not enough to be effective. What would help is being able to have one cache for each combination of ports tree and jail, but Poudriere has no support for that.

Another bad offender is texlive. For some reason, texlive-texmf needs to be rebuilt frequently, despite mostly comprising data that is just unpacked and repacked. This takes forever and pegs the disk at 100% for more than an hour as the texlive source tarball is repeatedly extracted and then compressed into packages. I don't get why texlive is not split so that the parts that are just repacked live in their own port with no dependencies, which would then only need to be rebuilt on the rare texlive updates.

And it seems I'm slowly killing my build SSD like that. After just about 9 months, it is already at 100 TB of writes just from port builds. Building with workdirs in memory is no longer an option as that frequently kills my build server by filling all its RAM with build files until no processes can be started anymore. Poudriere does not have an effective mechanism to prevent this (tmpfs limits don't work as the ports in question require very large workdirs, tend to take very long to build, and tend to be built all at the same time for multiple jails).

Using prebuilt packages is not an option as they lag behind by several days or weeks and lead to an inconsistent testing environment. Choosing non-default build options for these ports is not a good solution either, as it is not clear whether that would affect the validity of the test builds.

How can we fix this problem and make ports development sustainable again? Some ideas:

- disable LTO and other options by default that increase build times by such a ridiculous degree.
  This would really make a huge impact with very little work. I don't think LTO on toolchain ports speeds up the builds that use them enough to make up for the extra time it takes to build the toolchains themselves.

- for gcc, switch to single or no bootstrap by default. We have known good toolchains we use to build gcc. There's really no reason to build it multiple times just out of paranoia. The maintainer is supposed to check that gcc is built correctly without bootstrapping so consumers don't need to build it multiple times.

- untangle some of the dependencies so that fewer ports can trigger rebuilds of critical ports. For example, llvm docs could be moved to separate ports so that updates in the documentation toolchain do not trigger an LLVM rebuild.

- have USES choose lighter dependencies by default. E.g. USES=llvm could depend on the light flavour by default. I'm sure only very few ports need all of LLVM, and the light flavour is faster to build.

- rework Poudriere's rebuild detection to not rebuild every port for every random bullshit thing. For example, I don't see why ports need to be rebuilt for transitive changes in build dependencies. E.g. if port A build-depends on port B, which build-depends on port C, and C is updated, then A has to be rebuilt despite its direct dependencies being unchanged. This does not appear to be reasonable.

- unbundle libraries more thoroughly. We currently have dozens of copies of LLVM, skia, webkit, and others in the tree, as ports just bundle them instead of even making an attempt at unbundling. This means that every time they need to be patched, it's a game of whack-a-mole to find all copies. Plus build times suffer a lot. I know it's hard, but perhaps something can be done. For example, I have given up on trying to make electron work on armv7 as with every major version update, my patches are randomly being dropped and I have to do it all again. Like all chromium ports, electron takes over two days to build on my arm box and I don't have the time for that.

- stop bulk bumping RUN_DEPENDS consumers when dependencies are updated, or at least think carefully before doing so. RUN_DEPENDS are only installed after the build and should not affect the build. For example, sysutils/cdrtools uses the command line opus encoder and thus depends on audio/opus. There is absolutely no reason to bump it when audio/opus is updated. It just causes everybody to needlessly rebuild and reinstall ports. Sure, there's the odd case where that needs to be done, but it seems like some maintainers just always do that, even when it's not needed.

- maybe add a system where ports can declare the oldest version of themselves they are compatible with, in the sense that consumers only need to be rebuilt if they were built against a version older than that. For example, if a shared library is updated with a bug fix that does not change the ABI, there's no need to rebuild all consumers.

With great frustration,
Robert Clausecker

-- 
()  ascii ribbon campaign - for an 8-bit clean world
/\  - against html email  - against proprietary attachments