From owner-freebsd-database  Mon Mar 30 12:28:05 1998
Return-Path: <owner-freebsd-database@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id MAA12784
          for freebsd-database-outgoing; Mon, 30 Mar 1998 12:28:05 -0800 (PST)
          (envelope-from owner-freebsd-database@FreeBSD.ORG)
Received: from sendero.simon-shapiro.org (sendero-fddi.Simon-Shapiro.ORG [206.190.148.2])
          by hub.freebsd.org (8.8.8/8.8.8) with SMTP id MAA12715
          for <freebsd-database@FreeBSD.ORG>; Mon, 30 Mar 1998 12:27:58 -0800 (PST)
          (envelope-from shimon@simon-shapiro.org)
Received: (qmail 3648 invoked from network); 30 Mar 1998 20:37:11 -0000
Received: from localhost.simon-shapiro.org (HELO sendero-fxp0.simon-shapiro.org) (@127.0.0.1)
  by localhost.simon-shapiro.org with SMTP; 30 Mar 1998 20:37:11 -0000
Message-ID: <XFMail.980330123711.shimon@simon-shapiro.org>
X-Mailer: XFMail 1.3-alpha-032398 [p0] on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <Pine.BSF.3.96.980330134953.7795A-100000@fallout.campusview.indiana.edu>
Date: Mon, 30 Mar 1998 12:37:11 -0800 (PST)
Reply-To: shimon@simon-shapiro.org
Organization: The Simon Shapiro Foundation
From: Simon Shapiro <shimon@simon-shapiro.org>
To: John Fieber <jfieber@indiana.edu>
Subject: RE: Mailing list search interface
Cc: freebsd-database@FreeBSD.ORG
Sender: owner-freebsd-database@FreeBSD.ORG
Precedence: bulk


On 30-Mar-98 John Fieber wrote:
> On Mon, 30 Mar 1998, Simon Shapiro wrote:
> 
>> Truth must be told, currently PostgreSQL uses Unix files to store its
>> indices and tables, so performance is not all that it could be.  I am
> 
> A properly constructed index for a full text database (read: NOT
> glimpse) requires very little disk I/O for most queries.  E.g.,
> prefix trie hashing requires about two reads per search term in
> the query.  I just read a paper describing some optimization that
> reduces that to one read about 50% of the time.
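
To make the "about two reads per search term" figure concrete, here is a minimal Python sketch that counts simulated block reads. It is not trie hashing itself. A plain dictionary keyed by a fixed-length term prefix (PREFIX_LEN, a hypothetical parameter) stands in for the trie, and the class and method names are made up for illustration. Read one fetches the dictionary bucket and read two fetches the term's postings list.

```python
from collections import defaultdict

PREFIX_LEN = 3  # hypothetical: leading characters used to pick a bucket


class TwoReadIndex:
    """Toy model of a disk-resident full-text index in which a lookup costs
    at most two simulated block reads: one for the dictionary bucket chosen
    by the term's prefix, one for the term's postings list."""

    def __init__(self):
        self.buckets = defaultdict(dict)  # prefix -> {term: postings block id}
        self.blocks = []                  # postings block id -> sorted doc ids
        self.reads = 0                    # simulated disk reads so far

    def build(self, docs):
        # docs: {doc_id: text}; collect the documents each term appears in.
        postings = defaultdict(set)
        for doc_id, text in docs.items():
            for term in text.lower().split():
                postings[term].add(doc_id)
        # One postings "block" per term, registered in its prefix bucket.
        for term, ids in sorted(postings.items()):
            self.blocks.append(sorted(ids))
            self.buckets[term[:PREFIX_LEN]][term] = len(self.blocks) - 1

    def lookup(self, term):
        term = term.lower()
        self.reads += 1                   # read 1: the dictionary bucket
        bucket = self.buckets.get(term[:PREFIX_LEN], {})
        if term not in bucket:
            return []                     # unknown term: one read only
        self.reads += 1                   # read 2: the postings block
        return self.blocks[bucket[term]]
```

A term that is absent from the dictionary costs a single read here; a hit costs two, matching the rough figure quoted above.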

A picture is starting to emerge here, folks.  We normalize the normalizable
and then build a datatype that knows how to do dictionary-based searches on
the text.
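
"Normalize the normalizable" could look something like the following Python sketch. It uses the standard library's sqlite3 and email modules purely as stand-ins (this is not PostgreSQL and not a real text datatype). Structured headers go into ordinary columns, while the body is normalized into words and indexed through a separate dictionary table. The schema, table names, and helper functions are hypothetical.

```python
import re
import sqlite3
from email import message_from_string

SCHEMA = """
CREATE TABLE messages (id INTEGER PRIMARY KEY, sender TEXT, date TEXT,
                       subject TEXT, body TEXT);
CREATE TABLE words (word TEXT, msg_id INTEGER REFERENCES messages(id));
CREATE INDEX words_word ON words (word);
"""


def tokenize(text):
    # Normalize free text: lowercase, keep runs of letters and digits only.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def store(conn, raw):
    # Structured header fields become columns; the body feeds the word table.
    msg = message_from_string(raw)
    body = msg.get_payload()
    cur = conn.execute(
        "INSERT INTO messages (sender, date, subject, body) VALUES (?, ?, ?, ?)",
        (msg["From"], msg["Date"], msg["Subject"], body))
    msg_id = cur.lastrowid
    conn.executemany("INSERT INTO words (word, msg_id) VALUES (?, ?)",
                     [(w, msg_id) for w in tokenize(body)])
    return msg_id


def search(conn, query):
    # Return ids of messages that contain every query term (AND semantics).
    terms = sorted(tokenize(query))
    if not terms:
        return []
    placeholders = ",".join("?" * len(terms))
    rows = conn.execute(
        f"SELECT msg_id FROM words WHERE word IN ({placeholders}) "
        "GROUP BY msg_id HAVING COUNT(DISTINCT word) = ? ORDER BY msg_id",
        (*terms, len(terms)))
    return [r[0] for r in rows]
```

Usage: create an in-memory database with `sqlite3.connect(":memory:")`, run `conn.executescript(SCHEMA)`, pass raw message text to `store()`, then call `search(conn, "unix files")`.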

The excellent news here is that disk I/O per record can be reduced.  That
makes it easy to spread one database across more than one Unix instance or
host, which in turn gives us more memory and CPU bandwidth.  This can become
really useful really fast.
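
One way the "more than one host per database" idea might be arranged is to hash-partition the word dictionary so that each host keeps its slice in memory. The Python sketch below only plans which host receives which query terms. The host names and the choice of CRC-32 as the partitioning hash are assumptions for illustration.

```python
import zlib
from collections import defaultdict

HOSTS = ["db0", "db1", "db2"]  # hypothetical host names


def host_for(term):
    # Stable hash (CRC-32) so every client routes a term to the same host;
    # Python's built-in hash() of a str is randomized per process.
    return HOSTS[zlib.crc32(term.encode()) % len(HOSTS)]


def plan_query(terms):
    # Group query terms by the host that owns their slice of the dictionary,
    # so each host is contacted at most once per query.
    plan = defaultdict(list)
    for t in terms:
        plan[host_for(t)].append(t)
    return dict(plan)
```

With plain modulo placement, adding or removing a host remaps most terms; consistent hashing is the usual remedy, but that is outside this sketch.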

BTW, when considering text/scripts/database alternatives, think not only
about generating the search indices but also about querying them.  Decent
RDBMS engines cache these things very well, in userspace.
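
An engine's own caches live inside the server, but the same idea can be illustrated at the application level. Below is a minimal Python sketch of a least-recently-used cache of query results kept in process memory. The backend callable, the capacity, and the key normalization (lowercased, sorted terms) are all assumptions.

```python
from collections import OrderedDict


class QueryCache:
    """Small LRU cache of query results held in process memory."""

    def __init__(self, backend, capacity=128):
        self.backend = backend        # callable: normalized query -> result
        self.capacity = capacity
        self.entries = OrderedDict()  # key -> result, oldest first
        self.misses = 0

    def get(self, query):
        # Normalize case and term order so equivalent queries share an entry.
        key = " ".join(sorted(query.lower().split()))
        if key in self.entries:
            self.entries.move_to_end(key)     # mark as most recently used
            return self.entries[key]
        self.misses += 1
        result = self.backend(key)
        self.entries[key] = result
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return result
```

Repeated or reordered queries ("Unix files", "files unix") are then answered from memory without touching the backend.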


----------


Sincerely Yours, 

Simon Shapiro
Shimon@Simon-Shapiro.ORG                      Voice:   503.799.2313

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-database" in the body of the message