Date: Thu, 2 Nov 2006 14:16:09 +0100
From: Michel Talon
To: Edwin Groothuis
Cc: freebsd-ports@freebsd.org
Subject: Re: Building the INDEX

On Thu, Nov 02, 2006 at 10:08:49PM +1100, Edwin Groothuis wrote:
> On Thu, Nov 02, 2006 at 11:54:23AM +0100, Michel Talon wrote:
> > Today someone lent me a Core 2 Duo machine on which i have loaded FreeBSD-64.
> > Nice result is that both my python program (that i have multithreaded - but
> > there is room for improvement) and make index INDEX_JOBS=3 complete
> > building the INDEX in 8 minutes. Of course this program requires python, but
> > make index requires perl, which is not better. In view of this, one may
>
> Is that with the "make index" called from /usr/ports? The one which
> crawls through about 16000 iterations of bsd.port.mk? And all you
> needed to do was to replace the perl script with a python script?

In fact perl is used in two places in make index. At global scope there is one perl script, as you say, which collects the information from all ports and writes the INDEX file. But perl is also run 16000 times, once for each port, as part of the "make describe" command. The real information is obtained by running make -V, and the resulting values are then processed via perl. I thought that replacing all these forking perls would provide a big improvement, but in fact I discovered that even make -V forks tons of stuff.

To quantify how much make describe costs, I modified my python script to run make describe for each port. The result is that one loses 20% in total time. Nevertheless the command make index INDEX_JOBS=3 manages to be speedy because it has a lot of parallelism. It runs 3 jobs in parallel, all writing to separate files, so no time is lost waiting for another make to finish, or in synchronisation.
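As a rough illustration of the make -V step mentioned above, here is a minimal Python sketch of how a script might ask make for several variables in one invocation and pair the answers with their names. The variable list and the helper names (make_v_command, parse_make_v, query_port) are assumptions made for this example, not the contents of the actual script, and "make describe" uses its own set of variables.

```python
import subprocess

# Hypothetical selection of variables, chosen only for illustration.
VARIABLES = ["PKGNAME", "COMMENT", "MAINTAINER"]


def make_v_command(variables):
    """Build one make invocation that prints each variable on its own line."""
    cmd = ["make"]
    for name in variables:
        cmd += ["-V", name]
    return cmd


def parse_make_v(output, variables):
    """Pair each requested variable with the matching line of make's output."""
    lines = output.split("\n")
    # Drop the empty string left after the final newline, but keep empty
    # lines in between: they stand for variables with an empty value.
    if lines and lines[-1] == "":
        lines.pop()
    if len(lines) != len(variables):
        raise ValueError("expected %d lines, got %d" % (len(variables), len(lines)))
    return dict(zip(variables, lines))


def query_port(port_dir, variables=VARIABLES):
    """Run make -V inside one port directory (needs a real ports tree)."""
    result = subprocess.run(make_v_command(variables), cwd=port_dir,
                            capture_output=True, text=True, check=True)
    return parse_make_v(result.stdout, variables)
```

Asking for all variables in a single make run means bsd.port.mk is parsed only once per port, which matters because that parsing is where the time goes.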
Using python, even when spawning threads, doesn't give parallelism because there is a "Giant lock" (the global interpreter lock) which ensures that only one thread can execute python code at any time. Parallelism can only be obtained when those threads launch external code, or code in modules which release the lock. People use this facility, or asynchronous IO, to speed up python programs, because you can interleave execution of python code with the IO. In our problem it happens that the timing is not IO bound at all, contrary to what I was thinking; it is essentially compute bound, the computation here being the execution of "make -V", which needs a lot of work, for example parsing the 6000 lines of bsd.port.mk! Hence you don't gain much using asynchronous IO techniques. To get an optimal time, you have to keep the optimal number of make commands running without interruption and then parse their output. On a Core 2 Duo, it seems that the optimal number of threads is 4. The rest of the processing, that is collecting the output of make -V and doing all the work leading to the INDEX, takes only a few seconds for the python script, and I am sure that a perl script would be even faster, if not as readable.

If being *much* faster were desirable, perhaps a way would be to concoct a pared down bsd.port.mk, containing only the variables relevant to the problem at hand. One may imagine that if make has to read some tens or perhaps a hundred lines it will be much faster than reading 6000 lines. Of course the challenge is to devise a way to obtain the pared down version automatically from the full version.

Anyway, this little experiment shows that some problems can be solved by throwing hardware at them. A Core 2 Duo manages to run more than 3 times faster than a 3 GHz P4, so I suppose that with a quad core one will be able to build INDEX in 5 minutes. This is really impressive!

--
Michel TALON
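To illustrate the remark about threads that launch external code, here is a minimal sketch of a pool of 4 threads that keeps 4 external commands running at once. Each thread spends its time waiting on a child process, during which the interpreter lock is released, so the children really do run in parallel. The function names are made up for this example, and the demo commands are stand-ins for make -V so that the sketch runs without a ports tree; this is not the actual script.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_command(cmd):
    """Run one external command and return its standard output."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def run_all(commands, workers=4):
    """Keep up to `workers` commands running at once.

    Results come back in the same order as `commands`.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_command, commands))


def demo_commands(count):
    """Stand-in commands: each child interpreter just prints a square."""
    return [[sys.executable, "-c", "print(%d * %d)" % (n, n)]
            for n in range(count)]


if __name__ == "__main__":
    for line in run_all(demo_commands(8)):
        print(line.strip())
```

The default of 4 workers only mirrors the figure observed on the Core 2 Duo; on other hardware the best value would have to be measured.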