Date: Thu, 19 Oct 2006 11:24:34 +0200
From: Benjamin Lutz <mail@maxlor.com>
To: freebsd-ports@freebsd.org
Subject: Parallel Builds
Message-ID: <200610191124.39379.mail@maxlor.com>

Hello,

Since multi-core processors are becoming popular (or, more egocentrically, since I've acquired one), I've become interested in parallel compilation. Unfortunately, it seems that parallel builds of any kind are completely unsupported by the ports framework at the moment. My experimentation with parallel builds has led to a lot of build failures: many ports fail when compiled with gmake -j2 instead of gmake, and running, say, two portupgrade instances in parallel often has them step on each other.

As using n processor cores instead of just one can give close to an n-fold speed increase for builds that parallelize well, particularly when compiling C++ code, I'm interested in investigating what it would take to add some degree of parallelism to the ports.

I'm sure I'm not the only person who has thought about this. Maybe there already is an effort to allow for parallelism in port builds. I would therefore like people working on this to speak up, or more generally, to start a discussion on how this could be implemented. I'll start by giving my own thoughts. I currently see three ways to add parallelism to the ports:

o Mark the ports that allow parallel building by adding a new flag that can be used in port Makefiles, e.g. PARALLEL_BUILDING=yes. With such a port, the build target would call, say, "gmake -j${PARALLEL_NUM}" instead of just "gmake". PARALLEL_NUM would be set in /etc/make.conf. If it is undefined, the build target would fall back to the old behavior. I'll call this "micro-parallelism" for now.

  Advantages:
  + The modification to the ports framework would be relatively small, since the build tool (make/gmake/whatever) takes care of the difficult bits like locking.
  + Real build speed increase, particularly with large ports where it matters most (these usually have non-linear compilation dependencies, so they can be parallelized).

  Disadvantages:
  - Each port would have to be marked with PARALLEL_BUILDING=yes individually. This means more work for the maintainers, and it means that introducing this feature will take time. (On the other hand, there are only a few ports that are both large and popular, e.g. KDE; adding this feature just for those would already be a big win.) For a lot of software it is not obvious whether it supports parallel building, and it may have a low (but non-zero) probability of compilation failure with parallel building. That would lead to ports being marked with PARALLEL_BUILDING=yes in error, and to users encountering build failures. (Or maybe that's not much of a problem: after reports of build failures come in, the PARALLEL_BUILDING=yes flag could be removed again, and users who depend on the build always succeeding could simply not use parallel builds. Another idea would be to use PARALLEL_BUILDING=maybe if the port maintainer is unsure, which would allow conservative users to use parallel building only for ports that are guaranteed to compile in parallel.)
  - The build speed advantage for ports whose build can't be parallelized well is small (I believe that stage 1 of the gcc build would be an example of this). Also, small ports, which spend proportionally a lot of their time in the configure script, would not see much of a speed-up.

o Have the ports framework support building several ports in parallel. This could mean that either "make -j2 install" works in a port directory (so the build of a port's dependencies would happen in parallel), or that it's possible to run more than one port build at a time. As above, the amount of parallelism would be configurable with a variable in /etc/make.conf, and there'd be a fallback to the old behavior.
  I'll call this "macro-parallelism".

  Advantages:
  + No change needed to the individual ports (probably).
  + Assuming a correct implementation, no increased probability of build failures.
  + Build speed-up for software consisting of several packages, e.g. KDE, or when installing a new system.

  Disadvantages:
  - Probably difficult to implement. Locking, build failures and interruptions would have to be taken care of. Maybe it's not actually possible to do this with our make(1) (I haven't properly investigated this yet).
  - No speed gain when updating single large ports, e.g. gcc. (To be fair, it must be said that some of the large ports, e.g. OpenOffice.org, don't support micro-parallelism either. Macro-parallelism would at least allow the otherwise unused CPUs to do something sometimes.)

o Leave the ports framework as it is, and implement support for parallel building in an add-on tool, e.g. portupgrade. The tool would support automatic parallelism ("portupgrade -a" would automatically build ports in parallel where possible), or having several user-created instances running at the same time. I'll call this "tool-based macro-parallelism".

  Advantages:
  + No change needed to the ports at all (at least theoretically; in practice, minor changes might make the development of the build tool much easier).
  + Assuming a correct implementation, no increased probability of build failures.
  + Build speed-up for software consisting of several packages, e.g. KDE, or when installing a new system.

  Disadvantages:
  - Moderately difficult to implement. Locking, build failures and interruptions would have to be taken care of. I don't see problems that can't be solved, though.
  - No speed gain when updating single large ports, e.g. gcc. (To be fair, it must be said that some of the large ports, e.g. OpenOffice.org, don't support micro-parallelism either. Macro-parallelism would at least allow the otherwise unused CPUs to do something sometimes.)
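
To make the micro-parallelism and tool-based macro-parallelism ideas above a bit more concrete, here is a rough Python sketch. It is only an illustration under stated assumptions, not a proposal for the actual implementation: the function names build_command and build_waves are made up, a port's Makefile variables are modeled as a plain dict, and the dependency graph in the example is simplified. It needs Python 3.9 or later for the graphlib module.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_command(port_flags, parallel_num=None, tool="gmake"):
    """Micro-parallelism: choose the build command for a single port.

    port_flags is a hypothetical dict of the port's Makefile variables;
    parallel_num stands in for PARALLEL_NUM from /etc/make.conf.
    """
    if parallel_num and port_flags.get("PARALLEL_BUILDING") == "yes":
        return [tool, f"-j{parallel_num}"]
    # Port not marked, or PARALLEL_NUM undefined: fall back to the old behavior.
    return [tool]


def build_waves(deps):
    """Macro-parallelism: group ports into waves that could build concurrently.

    deps maps each port to the ports it depends on. Every port in a wave
    has all of its dependencies in earlier waves.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Illustrative, simplified dependency graph (not taken from the ports tree).
example_deps = {
    "kdebase": ["kdelibs"],
    "kdegraphics": ["kdelibs"],
    "kdelibs": ["qt"],
    "qt": [],
}

print(build_command({"PARALLEL_BUILDING": "yes"}, parallel_num=2))
print(build_command({}, parallel_num=2))
for number, wave in enumerate(build_waves(example_deps), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

The waves only show which ports could be built at the same time. A real tool would still need the locking and the handling of build failures and interruptions listed under the disadvantages.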

A combination of micro- and macro-parallelism seems attractive, since there are situations where only one of these is supported, but I don't see how it could work properly (barring a naive approach where you end up running n^2 processes), since it would require cooperation between make(1) or the add-on tool and the build tool used by the individual port, and the latter is more or less an unknown.

Phew. That turned into a long email. If you're still reading, thanks!

Cheers
Benjamin