From owner-freebsd-database Mon Mar 30 06:07:05 1998
Return-Path:
Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id GAA07039 for freebsd-database-outgoing; Mon, 30 Mar 1998 06:07:05 -0800 (PST) (envelope-from owner-freebsd-database@FreeBSD.ORG)
Received: from fallout.campusview.indiana.edu (fallout.campusview.indiana.edu [149.159.1.1]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id GAA07033; Mon, 30 Mar 1998 06:07:04 -0800 (PST) (envelope-from jfieber@indiana.edu)
Received: from localhost (jfieber@localhost) by fallout.campusview.indiana.edu (8.8.8/8.8.7) with SMTP id JAA06907; Mon, 30 Mar 1998 09:06:45 -0500 (EST)
Date: Mon, 30 Mar 1998 09:06:45 -0500 (EST)
From: John Fieber
To: Wolfram Schneider
cc: shimon@simon-shapiro.org, freebsd-database@FreeBSD.ORG, andreas@klemm.gtn.com, scrappy@hub.org, Satoshi Asami, Amancio Hasty
Subject: Re: [PORTS] Pgaccess doesn't run on -current anymore, Update
In-Reply-To: <19980330123130.39177@caramba.cs.tu-berlin.de>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-database@FreeBSD.ORG
Precedence: bulk

On Mon, 30 Mar 1998, Wolfram Schneider wrote:

> On 1998-03-29 13:57:30 -0800, Simon Shapiro wrote:
> > We have been playing with the idea of normalizing the archive into an
> > RDBMS. Some of the benefits are:
> >
> > * no need to update the threads database. It will always be updated.
> > * Users can create, easily, their own thread logic with no impact on
> > system performance.
> > * Searching on normalized fields are many times faster, and much less
> > costly in system resources.
[snip]
> If you plan to use a real SQL database, you should consider at least
> 500,000 data sets, better 1 million. You need 2GB for the raw E-Mails
> and 2-4GB for the index. I don't know if there are free available
> databases which can handle this large data.

It has been well established for many years by professionals in database R&D that traditional RDBMSs are utterly and completely the wrong tool for free text searching. This turns out to be true even for some relatively structured data types like bibliographic records.

There *are* some tasks in real-world applications that are RDBMS type things--a message-id based thread index is simple to implement, for instance--so I'm all for hybrid systems. The big RDBMS vendors usually have some optional module optimized for free-text searching and some SQL extensions to access it. I've pondered writing such a module for postgres, but don't really know enough about extending postgres to know how well it would work.

-john
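
To make the point about a message-id based thread index concrete, here is a minimal sketch using Python's built-in sqlite3 module. Everything specific in it is an assumption for illustration: the table and column names (messages, message_id, in_reply_to), the function names, and the use of SQLite rather than postgres. It only shows that reply chains map naturally onto a relational table; it says nothing about free-text searching.

```python
import sqlite3


def create_thread_index(conn):
    """Create a hypothetical table keyed on Message-ID, with the parent's ID."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS messages (
            message_id  TEXT PRIMARY KEY,  -- value of the Message-ID header
            in_reply_to TEXT,              -- value of In-Reply-To, NULL for a thread root
            subject     TEXT
        )
        """
    )
    # Index the parent column so finding the replies to a message is a lookup.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_messages_parent ON messages (in_reply_to)"
    )


def add_message(conn, message_id, in_reply_to, subject):
    """Record one message; no separate thread database needs rebuilding."""
    conn.execute(
        "INSERT INTO messages (message_id, in_reply_to, subject) VALUES (?, ?, ?)",
        (message_id, in_reply_to, subject),
    )


def thread(conn, root_id):
    """Return (message_id, depth) for the root and every reply below it."""
    return conn.execute(
        """
        WITH RECURSIVE t(message_id, depth) AS (
            SELECT message_id, 0 FROM messages WHERE message_id = ?
            UNION ALL
            SELECT m.message_id, t.depth + 1
            FROM messages AS m
            JOIN t ON m.in_reply_to = t.message_id
        )
        SELECT message_id, depth FROM t ORDER BY depth, message_id
        """,
        (root_id,),
    ).fetchall()
```

With an in-memory database (sqlite3.connect(":memory:")), calling create_thread_index, adding a root message plus a couple of replies with add_message, and then calling thread on the root's ID yields the whole reply tree with nesting depths. The recursive query assumes a SQLite build with common table expression support.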