From: Simon Shapiro
To: Wolfram Schneider
Cc: Amancio Hasty, Satoshi Asami, scrappy@hub.org, andreas@klemm.gtn.com,
    freebsd-database@FreeBSD.ORG
Subject: Re: [PORTS] Pgaccess doesn't run on -current anymore, Update
Date: Mon, 30 Mar 1998 11:52:11 -0800 (PST)
In-Reply-To: <19980330123130.39177@caramba.cs.tu-berlin.de>
Reply-To: shimon@simon-shapiro.org
Organization: The Simon Shapiro Foundation
Sender: owner-freebsd-database@FreeBSD.ORG

On 30-Mar-98 Wolfram Schneider wrote:
> On 1998-03-29 13:57:30 -0800, Simon Shapiro wrote:
>> We have been playing with the idea of normalizing the archive into an
>> RDBMS. Some of the benefits are:
>>
>> * no need to update the threads database. It will always be updated.
>> * Users can create, easily, their own thread logic with no impact on
>> system performance.
>> * Searching on normalized fields are many times faster, and much less
>> costly in system resources.
>
> Some figures ...
>
> The FreeBSD mailing list archive is 620MB large. There are currently
> 270,000 messages.
> The archive grow with 100,000 messages/year.

Excellent. How many years back do we want to keep?

> If you plan to use a real SQL database, you should consider at least
> 500,000 data sets, better 1 million. You need 2GB for the raw E-Mails
> and 2-4GB for the index. I don't know if there are free available
> databases which can handle this large data.

Large? Assume 1 million messages in the ``current'' database. People
can search the ``ancient'' database separately. Even if your dataset
numbers are correct, this fits in two 4GB partitions in a RAID array.
For 4 million records, an indexed search in PostgreSQL 6.2.1 took about
1-2 seconds on a busy system (make buildworld in the background).

> That was the hardware part. You must hire a database expert, a Web
> designer and a cgi script programmer. All people should be willing to
> work for at least 2-3 years on this project. This is not an easy task.

By your logic, we should close the FreeBSD project, as maintaining an
operating system like this takes 200-300 kernel experts. The database
expert is available and willing to do it for free. If not, there are
other database experts among FreeBSD users. A CGI interface to the
database already exists. The HTML interface can be written by people
like those who did the excellent job on the FreeBSD web pages. In other
words, if the FreeBSD project cannot find the people to do this, then
no one can.

BTW, your time estimate is good if you plan to be paid hourly for it.
I have built much, much more complex RDBMS-based information systems in
a fraction of that time. An email parser is no more than a week. The
text search is about the same.

> A full update of the thread database took 6 min on hub (Pentium Pro),
> thats 100MB/min ;-) An update for the last week took 3-6 seconds.

Something is too good to be true here. How can you read Unix
filesystems at 100 Megabytes per second? Also, if the current engine is
so great, how come all these people are excited about replacing it?
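
The figures traded in this exchange are easy to sanity-check. The
following is only a back-of-the-envelope sketch in Python: every input
is a number quoted above, nothing is measured, and the function and
variable names are made up for illustration. Note that the quoted
update rate is per minute, which works out to well under 2 MB per
second.

```python
# All inputs are figures quoted in the thread; none are measurements.
ARCHIVE_MB = 620            # current archive size
MESSAGES = 270_000          # current message count
GROWTH_PER_YEAR = 100_000   # new messages per year
FULL_UPDATE_MIN = 6         # full thread-database update on hub


def throughput_mb_per_min(size_mb, minutes):
    """Average rate at which the archive is scanned, in MB/min."""
    return size_mb / minutes


def years_until(target, current, growth_per_year):
    """Years of linear growth needed to go from current to target."""
    return (target - current) / growth_per_year


rate_per_min = throughput_mb_per_min(ARCHIVE_MB, FULL_UPDATE_MIN)
rate_per_sec = rate_per_min / 60

# Time for the archive to reach the 1 million messages discussed above.
years_to_million = years_until(1_000_000, MESSAGES, GROWTH_PER_YEAR)

# Storage estimate from the quote: 2GB raw plus 2-4GB of index,
# compared with two 4GB partitions.
needed_gb_low, needed_gb_high = 2 + 2, 2 + 4
available_gb = 2 * 4
fits = needed_gb_high <= available_gb

print(f"scan rate: {rate_per_min:.1f} MB/min = {rate_per_sec:.2f} MB/s")
print(f"years until 1 million messages: {years_to_million:.1f}")
print(f"storage: {needed_gb_low}-{needed_gb_high} GB needed, "
      f"{available_gb} GB available, fits: {fits}")
```
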

I have no opinion, as my usage is too scarce and too superficial to
voice one. My position is that IF there is a desire to build an
RDBMS-based engine, I will be happy to contribute my modest knowledge
in the matter and some of my time.

----------

Sincerely Yours,

Simon Shapiro
Shimon@Simon-Shapiro.ORG                      Voice: 503.799.2313

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-database" in the body of the message
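
As an illustration of the RDBMS-based archive proposed in this thread,
here is a minimal sketch of parsing messages into a normalized table
and finding replies with an indexed lookup. It is an assumption-laden
example, not the design discussed on the list: it uses Python's
standard email and sqlite3 modules, SQLite stands in for PostgreSQL,
and the table, column, and function names and the sample messages are
all hypothetical.

```python
import sqlite3
from datetime import timezone
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical schema: one row per message, threading via in_reply_to.
SCHEMA = """
CREATE TABLE message (
    message_id  TEXT PRIMARY KEY,
    in_reply_to TEXT,
    sender      TEXT,
    subject     TEXT,
    sent_at     TEXT,   -- ISO 8601, normalized to UTC
    body        TEXT
);
CREATE INDEX message_parent ON message (in_reply_to);
"""


def store(conn, raw):
    """Parse one raw (non-multipart) message and insert it."""
    msg = message_from_string(raw)
    sent = parsedate_to_datetime(msg["Date"]).astimezone(timezone.utc)
    conn.execute(
        "INSERT INTO message VALUES (?, ?, ?, ?, ?, ?)",
        (msg["Message-ID"], msg["In-Reply-To"], msg["From"],
         msg["Subject"], sent.isoformat(), msg.get_payload()),
    )


def replies_to(conn, message_id):
    """Return the IDs of direct replies, oldest first."""
    rows = conn.execute(
        "SELECT message_id FROM message "
        "WHERE in_reply_to = ? ORDER BY sent_at",
        (message_id,),
    )
    return [row[0] for row in rows]


# Two made-up messages forming a one-reply thread.
ROOT = """Message-ID: <root@example.org>
From: alice@example.org
Subject: Archive in an RDBMS
Date: Sun, 29 Mar 1998 13:57:30 -0800

First message.
"""

REPLY = """Message-ID: <reply@example.org>
In-Reply-To: <root@example.org>
From: bob@example.org
Subject: Re: Archive in an RDBMS
Date: Mon, 30 Mar 1998 12:31:30 +0200

Second message.
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
store(conn, ROOT)
store(conn, REPLY)
replies = replies_to(conn, "<root@example.org>")
print(replies)
```

Because the thread relationship is just an indexed column, there is no
separate threads database to rebuild; a new message becomes part of its
thread as soon as it is inserted.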