From: Michel Talon
To: Kris Kennaway
Cc: freebsd-hackers@freebsd.org, Mike Meyer
Date: Sat, 12 May 2007 23:44:22 +0200
Subject: Re: DPS Initial Ideas
Message-ID: <20070512214422.GA88480@lpthe.jussieu.fr>
In-Reply-To: <20070512193302.GA24673@xor.obsecurity.org>

On Sat, May 12, 2007 at 03:33:02PM -0400, Kris Kennaway wrote:
> On Sat, May 12, 2007 at 11:09:35AM +0200, Michel Talon wrote:
>
> > Seriously, the FreeBSD package
> > system is in great need of a profound overhaul, pretending it works well
> > is complete denial of reality. I hope that young people working on
> > summer code projects will infuse *new* ideas, and not spend their
> > vacations polishing inadequate tools.
>
> I know that this is your belief, but please try to avoid grasping at
> straws: there are elements in your argument that are along the lines
> of "The FreeBSD package system is broken and needs to be fundamentally
> changed.  Rewriting it to use SQLite is a fundamental change.
> Therefore rewriting it to use SQLite will fix the problems."
>

Really, I don't think this way at all. I think that *perhaps* SQLite
may be marginally better than a Berkeley database for solving part of
the problem, not much more. What I reacted to was the conservatism
which pervades the community as soon as someone floats the idea of
using a new tool.

> First figure out what specific problems need to be solved, then figure
> out how to solve them, not the other way around.
> So far I have seen
> little discussion of how SQLite is necessary and sufficient for fixing
> fundamental issues.  The argument in favour of SQL seems to boil down
> to "It's SQL!  You can do more complex queries...if you wanted to".

No, for me the main argument is that SQL is more familiar to many
people than running a Perl script to connect to a Berkeley database.
I have also heard that SQLite performs better, but I would have to see
it to believe it.

>
> Without a clear demonstration of how this would solve a problem
> associated with package management, it is not very compelling and
> basically reduces to change for the sake of change.

I think that a lot of changes are necessary, and it seems they will
happen. So *perhaps* it may be beneficial, in this sea of changes, to
consider a minor change: moving from a more traditional Berkeley
database to SQLite.

>
> As I discussed in my email yesterday, there are serious issues to be
> solved.

I think some of the issues have nothing to do with the database
question. Some of the issues are entirely trivial to solve. One of the
worst causes of misbehaviour in the package system is the constant
change in port origins and the poor standardisation of package names.
Once it is clear that these name changes bring nothing to the table
but introduce a lot of confusion, both for end users and for automated
programs, things will be easier.

Borrowing from Debian the idea of "abstract" dependencies, which can
be fulfilled by several concrete packages, may also simplify the
dependency problem. For example tomcat may depend on "java", and java
may be fulfilled either by diablo-jdk15 or by jdk15. This way, when
you change from diablo-jdk15 to jdk15, you don't need to change
anything in tomcat.
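The abstract dependency idea can be sketched in a few lines of Python.
This is only an illustration, not how Debian or the ports tree
implements it: the PROVIDES and DEPENDS tables and the resolve() helper
are hypothetical names made up for the example.

```python
# Hypothetical table: which concrete packages can fulfil an abstract name.
PROVIDES = {
    "java": ["diablo-jdk15", "jdk15"],
}

# Hypothetical dependency declarations, written against abstract names.
DEPENDS = {
    "tomcat": ["java"],
}


def resolve(package, installed):
    """Map each dependency of `package` to an installed concrete package.

    Returns {dependency: concrete package}; raises LookupError when no
    installed package satisfies a dependency.
    """
    result = {}
    for dep in DEPENDS.get(package, []):
        # A name with no PROVIDES entry is treated as a concrete package.
        candidates = PROVIDES.get(dep, [dep])
        match = next((c for c in candidates if c in installed), None)
        if match is None:
            raise LookupError(f"{package}: nothing installed provides {dep!r}")
        result[dep] = match
    return result
```

With this layout, switching the installed JDK changes the result of
resolve("tomcat", ...) but never the tomcat entry in DEPENDS, which is
the point of the abstract name.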
Another feature that Debian has, and which complements the previous
one nicely, is the specification of dependencies with a version number
in a certain range (this obviously requires a reasonable
standardisation of version numbers, so that comparison of -0.99 to
-1.0-rc doesn't depend on arcane rules). This way you don't need to
change dependencies which are in the correct range, even if a more
recent version exists. This mechanism has been imported into NetBSD
pkgsrc.

Another feature which has proven useful in Debian is keeping track of
which packages have been requested by the end user and which have been
installed as dependencies. This is the difference between apt-get and
aptitude. Apparently people are very happy to be able to remove, at
one stroke, not only a package they have requested but also all its
dependencies (those which are not required by another program). This
also helps when some big package requires dependency A but, after an
upgrade, its authors have changed their mind and require alternative
dependency B. With this mechanism, A disappears after the upgrade,
while without it you end up with both an upgraded version of A and B.
I have observed on my machine that this is an important cause of the
steady growth of the package tree over time.

To answer the slowness problem in registering installed packages, one
may think about making use of the INDEX file. In fact all the
information that is necessary to fill the dependency entries is
contained in INDEX, and accessible here in milliseconds with any tool
such as awk. It so happens that the ports system doesn't make any use
of the INDEX file and systematically recomputes the dependencies
through recursive make invocations, which are very time consuming. Of
course this requires an up to date INDEX, or a mechanism to keep INDEX
continually up to date.

Part of the registration is also filling the +REQUIRED_BY files of the
dependencies of a package when one installs a package.
If this package has a lot of dependencies, this means opening, editing
and closing a large number of files. This is expensive. One may
imagine using a database containing the global dependency information;
then the +REQUIRED_BY files are no longer necessary, since the
information can be recomputed in very little time. In my little Python
experiments, recomputing the complete set of +REQUIRED_BY files for
around 700 ports takes around one second. By the way, topologically
sorting the DAG of the whole ports tree (> 15 000 ports) takes of the
order of 2 seconds, so it is clear that if major performance problems
occur, they cannot be ascribed to such DAG sorting.

> Some of them can be solved by improving the storage backend
> of the package database to use a database; but this is in progress
> using existing tools.

Yes, and I don't buy the idea that using *existing* tools is better
than using the best tool for the job (assuming one can prove what is
the best tool, considering power, familiarity, etc.).

>
> Given that this work is happening (or at least will be happening, I am
> not sure when the SoC officially starts), the best thing is for
> interested people to work with Garrett to help him achieve the goals
> of his project.

Sure. I am convinced this is the reason why several people, including
myself, are presenting some ideas on the mailing list now, before
Garrett begins working on his project. Of course after that he will be
in charge, with his mentor, and I hope they will do something
wonderful. As you are well aware, designing a very good ports system
is unfortunately difficult, particularly in the FreeBSD context where
building from source is considered fashionable, which makes designing
an efficient upgrade system almost impossible.

>
> Kris

-- 
Michel TALON
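
For readers who want to see what recomputing the +REQUIRED_BY
information and sorting the dependency DAG amount to, here is a minimal
Python sketch. It is not the script referred to in the message, and it
makes no claim about timings. It assumes the dependencies have already
been extracted (for instance from INDEX) into a plain dictionary; the
`depends` mapping and both function names are hypothetical. The sort
uses Kahn's algorithm, one standard way to order a DAG.

```python
from collections import defaultdict, deque


def required_by(depends):
    """Invert {package: [dependencies]} into {dependency: [dependents]}."""
    reverse = defaultdict(set)
    for pkg, deps in depends.items():
        for dep in deps:
            reverse[dep].add(pkg)
    return {dep: sorted(pkgs) for dep, pkgs in reverse.items()}


def install_order(depends):
    """Return the packages so that dependencies precede their dependents.

    Kahn's algorithm; raises ValueError if the graph contains a cycle.
    """
    # Every name that appears, either as a package or as a dependency.
    nodes = set(depends)
    for deps in depends.values():
        nodes.update(deps)

    # Number of distinct dependencies each package is still waiting for.
    pending = {n: len(set(depends.get(n, ()))) for n in nodes}
    dependents = required_by(depends)

    ready = deque(sorted(n for n in nodes if pending[n] == 0))
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for dependent in dependents.get(name, ()):
            pending[dependent] -= 1
            if pending[dependent] == 0:
                ready.append(dependent)

    if len(order) != len(nodes):
        raise ValueError("dependency cycle detected")
    return order
```

Both functions touch each dependency edge a constant number of times,
so the work grows linearly with the size of the graph rather than with
the number of files that would otherwise have to be opened and edited.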