Date: Fri, 19 Jan 2001 23:38:13 -0700
From: Wes Peters <wes@softweyr.com>
To: "Russell L. Carter" <rcarter@pinyon.org>
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: Clustering FreeBSD
Message-ID: <3A693255.56934523@softweyr.com>
References: <20010120054211.1EBBE70@pinyon.org>
"Russell L. Carter" wrote: > > %> No it would not! Back in '94 I ported dmake to FreeBSD > %> and built just about every numerics package out there > %> on a 4 CPU cluster. Worked fine, but not much in overall > %> speedup, because... tadum! Where do you get the source > %> files, and how do you get the objs back :-) Not low > %> latency, eh? F-Enet then, G-Enet now :) > % > %You need a better file server. My previous employer, where the software > %staff recompiles 3 million lines of code 20 or 30 times a day, employs > %pmake and a farm of Sun Ultra-5 workstations to parallelize their makes. > %It allows them to complete a build in an hour that would take a single > %Ultra-5 almost 20 hours to complete, even with 3 or 4 builds running in > %parallel. The network is 100BaseTX to the workstations and 1000BaseSX > %to the (NFS) fileserver. > > Cool! I'd like to learn more. > > Then... can you elaborate on the build structure a bit? Is it > a single large dir (surely not), or how do the dependencies work? No, there were nearly a hundred directories scattered all over the place. It was actually quite a mess. There were also a couple of hand-enforced relationships that were quite messy. The entire mass was big enough that parallelizing was hugely beneficial even with the ugly mess the build system was. > For instance, with ACE/TAO (many hours to build when including > orbsvcs) there's only a few large directories that can > be parallelized over say 10 cpus by gmake, at least. These are the types of directories that can benefit easily. Ideally, with no overhead for job starting, you would be able to use n processors to compile n files all at the same time. Realistically you're quite limited by the network bandwidth and the speed of the file server, but since compiling is not a completely I/O bound process, you can do perhaps some- what better than just an obvious bandwidth multipler. For instance, if you have 100BaseTX on the build machines and 1000Base?? on the file server, you make actually be able to utilize 12 or 14 or maybe even 20 build machines before saturating the fileserver. > The rest have > ten files or less where each file takes maybe 45s to compile on a > 1GHz processor. There are quite a few of these. > And directories are compiled sequentially. If you replace your recursive Makefiles with a single dependency tree, it doesn't matter how many files are in a directory. You can launch enough compiles to complete the directory, building the executable or library or whatever is made there, because you can be sure that all if it's dependencies have already been built, and that nothing that depends on it will get touched until it has completed. There is a good discussion of this on the Perforce web pages, in their discussion of Jam/MR, a somewhat newer tool similar to Make. Jamfiles are never recursive; tools are provided for building Jamfiles that describe the entire project so the dependency tree is completely expressed. > %> Nowadays, you'd want to "globus ify" things, rather than > %> use use PVM. > %> > %> But critically, speedup would only happen if jobs were > %> allocated at a higher level than they are now. > %> > %> Now for building something like a full version of TAO, > %> why that might work. But even then, a factor of 2x is > %> unlikely until the dependencies are factored out at > %> the directory level. > % > %See the paper "Recursive Make Considered Harmful." Make is an amazing > %tool when used correctly. > > That's not the problem, unfortunately. 
> %> Nowadays, you'd want to "globus ify" things, rather than
> %> use PVM.
> %>
> %> But critically, speedup would only happen if jobs were
> %> allocated at a higher level than they are now.
> %>
> %> Now for building something like a full version of TAO,
> %> why that might work.  But even then, a factor of 2x is
> %> unlikely until the dependencies are factored out at
> %> the directory level.
> %
> %See the paper "Recursive Make Considered Harmful."  Make is an amazing
> %tool when used correctly.
>
> That's not the problem, unfortunately.  I've never had a problem
> rebuilding dependencies unnecessarily, or any of those
> other problems described.  Well, precompiled headers would be
> really really cool.  The problem, again, is that parallelism
> is limited by the directory structure, and the directory structure
> is entirely rational.

The directory structure has nothing to do with the Makefiles.  To obtain
the goal the paper suggests, you replace the recursive Makefiles with a
single top-level Makefile that describes ALL of the targets and ALL of the
dependencies.  Note that this does not require a single monolithic
Makefile; the top-level Makefile can be a shell that includes per-directory
Makefiles, like the sketch above.  The important part is to get a single
dependency tree with no cycles in the graph.

--
"Where am I, and what am I doing in this handbasket?"

Wes Peters                                                 Softweyr LLC
wes@softweyr.com                                    http://softweyr.com/
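P.S.  In case it helps, the parallelism knob for a flat Makefile like the
sketch above is just -j.  This is a hypothetical driver target, not our
actual setup, and how the jobs end up spread across the Ultra-5s is
whatever your pmake/remote-execution layer provides -- that part isn't
shown:

    # Raise JOBS until traffic to the NFS fileserver, rather than the
    # build hosts' CPUs, becomes the bottleneck -- somewhere in the
    # 10-20 range with the link speeds mentioned above.
    JOBS ?= 14

    .PHONY: farm-build
    farm-build:
	$(MAKE) -j$(JOBS) all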