FreeBSD Mail Archives

Date:      Thu, 19 Oct 2006 11:24:34 +0200
From:      Benjamin Lutz <mail@maxlor.com>
To:        freebsd-ports@freebsd.org
Subject:   Parallel Builds
Message-ID:  <200610191124.39379.mail@maxlor.com>

Hello,

Since multi-core processors are becoming popular (or, more
egocentrically, since I've acquired one), I've become interested in
parallel compilation. Unfortunately, it seems that parallel builds of
any kind are completely unsupported by the ports framework at the
moment. My experimentation with parallel builds has led to a lot of
build failures: many ports fail when compiled with gmake -j2 instead of
gmake, and running, say, two portupgrade instances in parallel often
has them step on each other.

As using n processor cores instead of just one gives pretty much an
n-fold speed increase, particularly when compiling C++ code, I'm
interested in investigating what it would take to add some degree of
parallelism to the ports.
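
The n-fold figure is best read as an upper bound. As a rough sketch
(not a measurement), Amdahl's law estimates the overall speed-up when
only part of a build can run in parallel. The function name and the
example fractions below are assumptions chosen for illustration only.

```python
def speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: overall speed-up when only part of the work is parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # The fractions are made-up illustrations, not measured values.
    for fraction in (1.0, 0.9, 0.5):
        print(f"{fraction:.0%} parallel on 2 cores: {speedup(fraction, 2):.2f}x")
```

A build that is fully parallel reaches the n-fold increase. A build that
spends half its time in a serial step stays well below it, however many
cores are available.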

I'm sure I'm not the only person that has thought about this. Maybe
there already is an effort to allow for parallelism in port builds. I
therefore would like for people working on this to speak up, or more
generally, to start a discussion on how this could be implemented. I'll
start with giving my own thoughts. I currently see three ways to add
parallelism to the ports:

 o Mark the ports that allow parallel building by adding a new flag
   that can be used in ports makefiles, e.g. PARALLEL_BUILDING=yes.
   With such a port, the build target would call, say
   "gmake -j${PARALLEL_NUM}" instead of just "gmake". PARALLEL_NUM
   would be set in /etc/make.conf. If it is undefined, the build
   target would fall back to the old behavior. I'll call this
   "micro-parallelism" for now.

   Advantages:
   + The modification to the ports framework would be relatively small,
     since the build tool (make/gmake/whatever) takes care of the
     difficult bits like locking.
   + Real build speed increase, particularly with large ports where it
     matters most (these usually have non-linear compilation
     dependencies, so they can be parallelized).

   Disadvantages:
   - Each port would have to be marked with PARALLEL_BUILDING=yes
     individually. This means more work for the maintainers, and will
     mean that introduction of this feature will take time. (On the
     other hand, there are only a few ports that are both large and
     popular, e.g. KDE; adding this feature just for those would
     already be a big win.)
   - For a lot of software it is not obvious whether it supports
     parallel building, and it may have a low (but non-zero) probability
     of compilation failure with parallel building. This would lead to
     ports being marked with PARALLEL_BUILDING=yes in error, and in turn
     to users encountering build failures. (Or maybe that's not that
     much of a problem: after reports of build failures come in, the
     PARALLEL_BUILDING=yes flag could be removed again, and users that
     depend on the build always succeeding could simply not use
     parallel builds. Another idea would be to use
     PARALLEL_BUILDING=maybe if the port maintainer is unsure, which
     would allow conservative users to use parallel building only for
     ports that are guaranteed to compile in parallel.)
   - The build speed advantage for ports whose build can't be
     parallelized well is small (I believe that stage 1 of the gcc build
     would be an example of this). Also, small ports, which spend a
     lot of their time (proportionally) in the configure script, would
     not see much of a speed-up.

 o Have the ports framework support building of several ports in
   parallel. This could mean that either "make -j2 install" works in
   a port directory (so the build of a port's dependencies would happen
   in parallel), or that it's possible to run more than one port build
   at one time. As above, the amount of parallelism would be
   configurable with a variable in /etc/make.conf, and there'd be a
   fallback to the old behavior. I'll call this "macro-parallelism".

   Advantages:
   + No change needed to the individual ports (probably).
   + Assuming a correct implementation, no increased probability for
     build failures.
   + Build speed-up for software consisting of several packages, e.g.
     KDE, or when installing a new system.

   Disadvantages:
   - Probably difficult to implement. Locking, build failures and
     interruptions would have to be taken care of. Maybe it's not
     actually possible to do this with our make(1) (I haven't
     properly investigated this yet).
   - No speed gain when updating single large ports, e.g. gcc. (To be
     fair, it must be said that some of the large ports, e.g.
     OpenOffice.org, don't support micro-parallelism either. Macro-
     parallelism would at least allow the otherwise unused CPUs to
     do something sometimes.)

 o Leave the ports framework as it is, and implement support for
   parallel building in an add-on tool, e.g. portupgrade. The tool would
   support automatic parallelism ("portupgrade -a" would automatically
   build ports in parallel where possible), or having several
   user-created instances running at the same time. I'll call this
   "tool-based macro-parallelism".

   Advantages:
   + No change needed to the ports at all (at least theoretically, in
     practice minor changes might make the development of the build
     tool much easier).
   + Assuming a correct implementation, no increased probability for
     build failures.
   + Build speed-up for software consisting of several packages, e.g.
     KDE, or when installing a new system.

   Disadvantages:
   - Moderately difficult to implement. Locking, build failures and
     interruptions would have to be taken care of. I don't see problems
     that can't be solved though.
   - No speed gain when updating single large ports, e.g. gcc. (To be
     fair, it must be said that some of the large ports, e.g.
     OpenOffice.org, don't support micro-parallelism either. Macro-
     parallelism would at least allow the otherwise unused CPUs to
     do something sometimes.)
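
To make the first option (micro-parallelism) more concrete, here is a
minimal sketch of the flag handling. It is written in Python purely for
illustration; a real implementation would presumably live in the ports
makefiles. PARALLEL_BUILDING and PARALLEL_NUM are the names proposed
above. The dictionary inputs, the function name and the allow_maybe
switch are assumptions made for this sketch.

```python
def build_command(port_vars, make_conf, allow_maybe=False):
    """Return the argument list for the build step of a single port.

    port_vars:   variables set by the port (hypothetical dict form).
    make_conf:   variables from /etc/make.conf (hypothetical dict form).
    allow_maybe: hypothetical user switch; if True, ports marked
                 PARALLEL_BUILDING=maybe are also built in parallel.
    """
    flag = port_vars.get("PARALLEL_BUILDING", "no")
    jobs = make_conf.get("PARALLEL_NUM")

    parallel_ok = flag == "yes" or (flag == "maybe" and allow_maybe)
    if jobs is None or not parallel_ok:
        # Fall back to the old behavior: a plain, serial gmake.
        return ["gmake"]

    jobs = int(jobs)
    if jobs < 2:
        return ["gmake"]
    return ["gmake", f"-j{jobs}"]


if __name__ == "__main__":
    conf = {"PARALLEL_NUM": "2"}
    print(build_command({"PARALLEL_BUILDING": "yes"}, conf))
    print(build_command({"PARALLEL_BUILDING": "maybe"}, conf))
    print(build_command({}, conf))
```

With this shape, an unmarked port or an unset PARALLEL_NUM always yields
the serial command, so existing setups would behave as before.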

A combination of micro- and macro-parallelism seems attractive, since
there are situations where only one of these is supported. However, I
don't see how it could work properly (barring a naive approach where you
end up running n^2 processes), since it would require cooperation
between make(1) or the add-on tool and the build tool used by the
individual port, and the latter is more or less an unknown.
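
To illustrate the n^2 concern, here is a rough sketch of one possible
static split, not a worked-out design. It groups ports into waves whose
dependencies are already built, then divides a fixed budget of cores
among the ports in each wave, so a wave never asks for more than that
many jobs in total. The port names, the dependency table and the
function name are made up for the example, and the sketch assumes every
port honors a -j style job count, which is exactly the unknown mentioned
above.

```python
from graphlib import TopologicalSorter


def plan_waves(deps, cores):
    """Return a list of (ports, jobs_per_port) build steps.

    deps maps each port to the set of ports it depends on.
    Ports in one step have no unbuilt dependencies and may run together.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")

    sorter = TopologicalSorter(deps)
    sorter.prepare()
    plan = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        # Never build more ports at once than there are cores.
        for start in range(0, len(ready), cores):
            chunk = ready[start:start + cores]
            jobs_per_port = max(1, cores // len(chunk))
            plan.append((chunk, jobs_per_port))
        sorter.done(*ready)
    return plan


if __name__ == "__main__":
    # Hypothetical dependency table, for illustration only.
    deps = {
        "qt": set(),
        "kdelibs": {"qt"},
        "kdebase": {"kdelibs"},
        "kdepim": {"kdelibs"},
    }
    for ports, jobs in plan_waves(deps, cores=4):
        print(f"build {ports} with -j{jobs} each")
```

A naive combination would instead run up to 4 ports with -j4 each. The
wave approach is deliberately coarse: it waits for a whole wave to
finish before starting the next, so cores can sit idle when one port in
a wave takes much longer than the others.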

Phew. That turned into a long email. If you're still reading, thanks!

Cheers
Benjamin

