From: Simon Shapiro
Organization: The Simon Shapiro Foundation
To: John Fieber
Cc: freebsd-database@FreeBSD.ORG
Date: Mon, 30 Mar 1998 12:37:11 -0800 (PST)
Subject: RE: Mailing list search interface

On 30-Mar-98 John Fieber wrote:
> On Mon, 30 Mar 1998, Simon Shapiro wrote:
>
>> Truth must be told, currently PostgreSQL uses Unix files to store its
>> indices and tables, so performance is not all that it could be. I am
>
> A properly constructed index for a full text database (read: NOT
> glimpse) requires very little disk i/o for most queries. E.g.,
> prefix trie hashing requires about two reads per search term in
> the query. I just read a paper describing an optimization that
> reduces that to one read about 50% of the time.

A picture is starting to emerge here, folks. We normalize the normalizable and then build a datatype that knows how to do dictionary-based searches on the text. The excellent news here is that disk I/O per record can be reduced.
This allows us to easily use more than one Unix instance/host per database, which gives us the memory and CPU bandwidth. This can become really useful really fast.

BTW, when considering text/scripts/database alternatives, think not only about generating the search indices, but about querying them too. Decent RDBMS engines cache these things very well, in userspace.

----------
Sincerely Yours,
Simon Shapiro
Shimon@Simon-Shapiro.ORG
Voice: 503.799.2313
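A minimal sketch of the dictionary-based lookup discussed above, assuming an in-memory index: each normalized term is stored once in a character trie whose terminal node holds the IDs of the records containing that term, so a query walks the dictionary instead of scanning the text. The class and method names (PrefixIndex, lookup, lookup_prefix, query) are hypothetical. This is a plain trie, not the prefix trie hashing scheme or the optimization John mentions, and not anything PostgreSQL itself provides; an on-disk version would presumably need to pack trie nodes into disk pages to get near the "about two reads per search term" figure quoted above.

```python
class TrieNode:
    """One node of a character trie (hypothetical sketch)."""

    __slots__ = ("children", "postings")

    def __init__(self):
        self.children = {}     # next character -> child TrieNode
        self.postings = set()  # IDs of records containing the term ending here


class PrefixIndex:
    """Hypothetical in-memory dictionary index: term -> set of record IDs."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, record_id, text):
        # Naive normalization: lowercase and split on whitespace only.
        for term in text.lower().split():
            node = self.root
            for ch in term:
                node = node.children.setdefault(ch, TrieNode())
            node.postings.add(record_id)

    def _find(self, prefix):
        # Walk down the trie one character at a time; None if the path is absent.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def lookup(self, term):
        """Exact-term lookup: a single walk down the trie, no scan of records."""
        node = self._find(term.lower())
        return set(node.postings) if node is not None else set()

    def lookup_prefix(self, prefix):
        """Records containing any term that starts with `prefix`."""
        node = self._find(prefix.lower())
        result = set()
        if node is None:
            return result
        stack = [node]  # iterative depth-first walk of the subtree
        while stack:
            current = stack.pop()
            result |= current.postings
            stack.extend(current.children.values())
        return result

    def query(self, text):
        """AND query: records containing every term in `text`."""
        terms = text.lower().split()
        if not terms:
            return set()
        result = self.lookup(terms[0])
        for term in terms[1:]:
            result &= self.lookup(term)
        return result


if __name__ == "__main__":
    idx = PrefixIndex()
    idx.add(1, "PostgreSQL stores tables in Unix files")
    idx.add(2, "prefix trie hashing for full text search")
    idx.add(3, "Unix text tools")
    print(sorted(idx.lookup("unix")))       # [1, 3]
    print(sorted(idx.lookup_prefix("tr")))  # [2]
    print(sorted(idx.query("unix text")))   # [3]
```

Normalization here is deliberately crude (lowercase plus whitespace split); punctuation handling, stemming and stop words are left out of the sketch.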