From: Wolfram Schneider
Date: Mon, 30 Mar 1998 12:31:30 +0200
To: shimon@simon-shapiro.org, Wolfram Schneider
Cc: freebsd-database@FreeBSD.ORG, andreas@klemm.gtn.com, scrappy@hub.org, Satoshi Asami, Amancio Hasty
Subject: Re: [PORTS] Pgaccess doesn't run on -current anymore, Update
Message-ID: <19980330123130.39177@caramba.cs.tu-berlin.de>

On 1998-03-29 13:57:30 -0800, Simon Shapiro wrote:
> We have been playing with the idea of normalizing the archive into an
> RDBMS. Some of the benefits are:
>
> * no need to update the threads database. It will always be updated.
> * Users can create, easily, their own thread logic with no impact on
>   system performance.
> * Searching on normalized fields are many times faster, and much less
>   costly in system resources.

Some figures ...

The FreeBSD mailing list archive is 620MB in size and currently holds
270,000 messages. The archive grows by 100,000 messages/year.
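
As a rough sanity check of these figures, here is a minimal Python sketch. It is a back-of-the-envelope calculation, not a measurement: it assumes the average message size and the growth rate stay constant, and the function names are made up for illustration.

```python
# Back-of-the-envelope sizing from the figures quoted in this message.
ARCHIVE_MB = 620            # current archive size
MESSAGES = 270_000          # current number of messages
GROWTH_PER_YEAR = 100_000   # new messages per year


def avg_message_kb(archive_mb=ARCHIVE_MB, messages=MESSAGES):
    """Average raw size of one message in KB (assuming 1 MB = 1024 KB)."""
    return archive_mb * 1024 / messages


def raw_size_mb(target_messages, archive_mb=ARCHIVE_MB, messages=MESSAGES):
    """Raw archive size in MB, assuming the average message size is unchanged."""
    return archive_mb * target_messages / messages


def years_until(target_messages, messages=MESSAGES, growth=GROWTH_PER_YEAR):
    """Years until the archive holds target_messages at constant growth."""
    return max(0.0, (target_messages - messages) / growth)


if __name__ == "__main__":
    print(f"average message: {avg_message_kb():.2f} KB")
    for target in (500_000, 1_000_000):
        print(f"{target} messages: ~{raw_size_mb(target):.0f} MB raw, "
              f"reached in ~{years_until(target):.1f} years")
```

Under these assumptions an average message is a bit over 2 KB, and one million messages come to roughly 2.3GB of raw mail.
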
If you plan to use a real SQL database, you should plan for at least
500,000 records, better 1 million. You need 2GB for the raw e-mails
and 2-4GB for the index. I don't know whether there are freely available
databases that can handle this much data.

That was the hardware part. You must hire a database expert, a Web
designer and a CGI script programmer. All of them should be willing to
work on this project for at least 2-3 years. This is not an easy task.

A full update of the thread database took 6 min on hub (Pentium Pro),
that's 100MB/min ;-) An update for the last week took 3-6 seconds.

-- 
Wolfram Schneider http://www.freebsd.org/~wosch/