Date:      Wed, 25 Oct 2023 08:31:49 +0200
From:      Fernando Apesteguía <fernando.apesteguia@gmail.com>
To:        Robert Clausecker <fuz@freebsd.org>
Cc:        ports@freebsd.org
Subject:   Re: We need to do something about build times
Message-ID:  <CAGwOe2YrScoZQPAbHHRK+pHH4au_LNpXbc=+c0ALLKRTdMrHEA@mail.gmail.com>
In-Reply-To: <ZTgXDSmpAq6lpT3f@fuz.su>
References:  <ZTgXDSmpAq6lpT3f@fuz.su>

On Tue, Oct 24, 2023 at 9:12 PM Robert Clausecker <fuz@freebsd.org> wrote:

> The build times have gone up to the point where they are unsustainable.
> Frequent updates to key ports (like llvm*, rust, gcc*) make it so that
> basically every time I prepare a new batch of commits, I have to rebuild
> a variety of toolchain ports across 8 jails (amd64/i386/arm64/armv7 each
> for FreeBSD 12.4 and 13.2).  This takes multiple days.  And I'm working
> with hardware that's quite recent (for x86, an 8 thread Skylake box, for
> arm, an 8 thread Windows 2023 dev kit).
>
> By the time the builds are done, some random update has usually caused
> the ports to be out of date again, so if I were to rebase, I would have
> to do all of this again.  And again.  And again.
>
> Particularly bad offenders are gcc and rust.  Ccache is ineffective for
> these as gcc has LTO turned on, which seems to more than triple the
> regular build time to more than 24 hours even on a fast Skylake box.
> This is single threaded as I build multiple ports at once; if I were to
> build multi-threaded, the same amount of total CPU hours would have been
> spent, so that would not fix my problem.  Ccache is also ineffective for
> rust of course.
>

There are two flavours of LTO: thin and fat (full).
ThinLTO provides almost the same runtime benefit as fat LTO at a much smaller build cost.
Some software offers both options, some only one of them.
The LLVM people were already working on this years ago.

See http://blog.llvm.org/2016/06/thinlto-scalable-and-incremental-lto.html

If a port offers both options, I think enabling ThinLTO by default would be a
good solution, while thinking twice before enabling fat LTO.
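
For reference, with clang the difference is just which LTO flavour is passed
to the compiler and the linker. A minimal make.conf-style sketch (these are
standard clang flags; whether a particular port honours them, and which knob
the ports framework exposes for this, varies per port):

    # Fat (full) LTO: whole-program optimization in one big, slow,
    # memory-hungry link step.
    CFLAGS+=    -flto=full
    LDFLAGS+=   -flto=full

    # ThinLTO: per-module summaries, parallel and incremental-friendly,
    # much cheaper to build with results close to full LTO.
    # (Pick one of the two blocks, of course.)
    CFLAGS+=    -flto=thin
    LDFLAGS+=   -flto=thin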


> There's another issue in that ccache doesn't scale to large cache sizes
> (my experiments show that anything larger than 20 GB seems to cause
> problems as ccache repeatedly tries to scan the whole thing for evictions),
> and the sizes that work are just not enough to be effective.  What would
> help is being able to have one cache for each combination of ports tree
> and jail, but Poudriere has no support for that.
>
> Another bad offender is texlive.  For some reason, texlive-texmf needs to
> be rebuilt frequently, despite mostly comprising data that is just
> unpacked and repacked.  This takes forever and pegs the disk at 100% for
> more than an hour as the texlive source tarball is repeatedly extracted
> and then compressed into packages.  I don't get why the texlive stuff is
> not split in such a way that the stuff that is just repacked lives in its
> own port with no dependencies so it only needs to be rebuilt on rare
> texlive updates.
>
> And it seems I'm slowly killing my build SSD like that.  After just about
> 9 months, it is already at 100 TB of writes just from port builds.
> Building with workdirs in memory is no longer an option as that frequently
> kills my build server by filling all its RAM with build files until no
> processes can be started anymore.  Poudriere does not have an effective
> mechanism to prevent this (tmpfs limits don't work as the ports in
> question require very large workdirs, tend to take very long to build and
> tend to be built all at the same time for multiple jails).
>

For this, while not ideal, you can choose to build the heavy dependencies
alone in poudriere, with a single job, so that rust, gcc and llvm are not
built at the same time.
On my machine with 32 GB of RAM, *without LTO*, I can build them in memory
one port at a time.
(Unfortunately, this wasn't true some time ago with mongodb.)
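
In practice that means something like this; the jail and tree names below are
placeholders, -J is the standard poudriere bulk option for the number of
parallel builders, and the tmpfs/memory knobs are the ones from the sample
poudriere.conf (double-check the names on your version):

    # Pre-build the heavy toolchain ports one at a time:
    poudriere bulk -j 132amd64 -p default -J 1 \
        lang/rust lang/gcc13 devel/llvm15

    # poudriere.conf settings that help keep in-memory builds bounded
    # (values are examples, tune them to your RAM):
    USE_TMPFS=yes
    TMPFS_LIMIT=16      # GiB per builder's tmpfs
    MAX_MEMORY=24       # GiB per builder, enforced via rctl/racct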


>
> Using prebuilt packages is not an option as they lag behind by several
> days/weeks and lead to an inconsistent testing environment.  It is also
> not a good solution to choose non-default build options for these ports
> as it is not clear if that would affect the validity of the testbuilds.
>

I don't understand this point.
If you work in batches, chances are you will benefit from package seeding in
poudriere.
There might be some delay, but in general I don't think packages lag behind
for weeks.
Have you tried the PACKAGE_FETCH_* options in poudriere? They work pretty
decently for me.
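
From memory, the relevant poudriere.conf knobs look roughly like this; please
verify the exact names against the sample poudriere.conf shipped with your
version:

    # Fetch official packages for ports you are not testing, build the
    # rest locally (knob names from memory, verify them):
    PACKAGE_FETCH_BRANCH=latest
    # Restrict fetching to the expensive toolchain ports so the ports
    # under test are still built locally:
    PACKAGE_FETCH_WHITELIST="lang/rust lang/gcc* devel/llvm*"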



> How can we fix this problem and make ports development sustainable again?
>
> Some ideas:
>
>  - disable LTO and other options by default that increase build times by
>    such a ridiculous degree.  This would really make a huge impact with
>    very little work.  I don't think LTO on toolchain ports improves build
>    times enough in comparison to the extra time it takes to build these.
>
>  - for gcc, switch to single or no bootstrap by default.  We have known
>    good toolchains we use to build gcc.  There's really no reason to
>    build it multiple times just out of paranoia.  The maintainer is
>    supposed to check that gcc is built correctly without bootstrapping
>    so consumers don't need to build it multiple times.
>
>  - untangle some of the dependencies so that fewer ports may trigger
>    rebuilds of critical ports.  For example, llvm docs could be moved to
>    separate ports so that updates in the documentation toolchain do not
>    trigger an LLVM rebuild.
>
>  - reduce USES to choose lighter dependencies by default.  E.g. USES=llvm
>    could depend on the light flavour by default.  I'm sure only very few
>    ports need all of LLVM and the light flavour is faster to build.
>
>  - rework Poudriere's rebuild detection to not rebuild every port for
>    every random bullshit thing.  For example, I don't see why ports need
>    to be rebuilt for transitive changes in build dependencies.  E.g. if
>    port A has build depends on port B which build depends on port C, and
>    C is updated, then A has to be rebuilt despite its direct dependencies
>    being unchanged.  This does not appear to be reasonable.
>
>  - unbundle libraries more thoroughly.  We currently have dozens of
>    copies of LLVM, skia, webkit, and others in tree as ports just bundle
>    them instead of even making an attempt at unbundling.  This means that
>    every time they need to be patched, it's a whack-a-mole at finding all
>    copies.  Plus build times suffer a lot.  I know it's hard, but perhaps
>    something can be done.  For example, I have given up on trying to make
>    electron work on armv7 as with every major version update, my patches
>    are randomly being dropped and I have to do it all again.  Like all
>    chromium ports, electron takes over two days to build on my arm box
>    and my time is insufficient for that.
>
>  - stop bulk bumping RUN_DEPENDS consumers when dependencies are updated,
>    or at least think carefully before doing so.  RUN_DEPENDS are only
>    installed after the build and should not affect the build.  For
>    example, sysutils/cdrtools uses the command line opus encoder and thus
>    depends on audio/opus.  There is absolutely no reason to bump it when
>    audio/opus is updated.  It just causes everybody to needlessly rebuild
>    and reinstall ports.  Sure there's the odd case where that needs to be
>    done, but it seems like some maintainers just always do that, even
>    when it's not needed.
>
>  - maybe add a system where ports can declare the oldest version of
>    themselves they are compatible with, in the sense that consumers only
>    need to be rebuilt if they were built against a version older than
>    that.  For example, if a shared library is updated with a bug fix
>    that does not change the ABI, there's no need to rebuild all consumers.
>
> With great frustration,
> Robert Clausecker
>
> --
> ()  ascii ribbon campaign - for an 8-bit clean world
> /\  - against html email  - against proprietary attachments
>
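
Regarding the RUN_DEPENDS point: for what it's worth, here is an illustrative
(not real) consumer Makefile fragment showing why a pure run-time dependency
does not affect the build, and hence why a PORTREVISION bump of the consumer
is only needed when the resulting package itself has to change:

    # Illustrative fragment only; port names were chosen just to show
    # the syntax.
    PORTNAME=     example-tool
    DISTVERSION=  1.0
    CATEGORIES=   sysutils

    # A pure run-time dependency: it is only needed once the package is
    # installed, not while it is built, so an update of shells/bash by
    # itself does not force a rebuild of this port.
    RUN_DEPENDS=  bash:shells/bash

    # Bump PORTREVISION only when the package itself must change
    # (different plist, changed linkage, incompatible dependency ABI),
    # not merely because a RUN_DEPENDS target was updated.
    #PORTREVISION= 1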
