Date:      02 Jan 1997 16:21:38 +0000
From:      Paul Richards <p.richards@elsevier.co.uk>
To:        John Fieber <jfieber@indiana.edu>
Cc:        "Jordan K. Hubbard" <jkh@time.cdrom.com>, Francisco Reyes <francisco@natserv.com>, FreeBSd Chat list <chat@freebsd.org>
Subject:   Re: mailing list archives
Message-ID:  <57g20k120d.fsf@tees.elsevier.co.uk>
In-Reply-To: John Fieber's message of Fri, 27 Dec 1996 21:06:34 -0500 (EST)
References:  <Pine.BSI.3.95.961227204329.12503M-100000@fallout.campusview.indiana.edu>

John Fieber <jfieber@indiana.edu> writes:

> It isn't so much weight as being inappropriate for the task. 
> Using postgres would be like entering a long haul tractor trailer
> rig in the Indianapolis 500.  Using msql would be like using a
> GMC Suburban in the same context. Relational databases in general
> are not well suited for tasks where record and field structure
> has a lot of variability, e.g. mail. 

Postgres isn't that big, but I agree with the following.

> The tools of choice for text will be based on either inverted
> indexes or vector representations.  The better of these will
> offer stemming, synonym matching and soundex matching.

Databases aren't actually that good for general string searches.

> To get the answers, we need thread retrieval.  For this, I don't
> think we need new indexing software, we just need to figure out
> how to take an existing message and *automatically* formulate
> appropriate queries to build the thread from it. 

Hypermail could be used for a big chunk of this if we accept that each
message will be in a separate file. The problem then becomes rather
trivial. 

I don't think I have time to implement this, but if someone else does,
the following is the basic principle of what would be required.
The idea is that FreeWAIS returns a headline that is the path of
the archived message. I use a scheme like this at work, where
FreeWAIS returns a key rather than the content.

At build time:

1) Put all our mail through hypermail.
2) Configure FreeWAIS to return a headline that is the hypermail key
   for the mail messages that match.

At access time:

1) The search script calls waisq with the query string.
2) waisq returns the hypermail keys of all matching messages in the
   headlines, which can be extracted using a simple perl script.
3) These keys are converted into a page of links that the user can
   jump into.
4) Once the initial link is followed, hypermail handles all the
   issues regarding following threads etc.
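
Access-time steps 2 and 3 could look roughly like the sketch below. It is
written in Python rather than perl, and it makes assumptions that are not
settled above: that each headline line contains a hypermail path such as
"freebsd-chat/0042.html" (a made-up layout), and that the archive is served
under "/archive/" (also made up). The function names are hypothetical too.
The search output is passed in as a string, so nothing here depends on how
waisq is actually invoked.

```python
import html
import re

# Assumption: a hypermail key is a relative path ending in ".html" that
# appears somewhere in each headline line of the search output.
KEY_PATTERN = re.compile(r"[\w./-]+\.html")


def extract_keys(search_output):
    """Return the hypermail keys found in the output, in order, no repeats."""
    keys = []
    seen = set()
    for line in search_output.splitlines():
        match = KEY_PATTERN.search(line)
        if match and match.group(0) not in seen:
            seen.add(match.group(0))
            keys.append(match.group(0))
    return keys


def links_page(keys, base_url="/archive/"):
    """Build a minimal HTML page with one link per hypermail key."""
    items = "\n".join(
        f'<li><a href="{html.escape(base_url + key, quote=True)}">'
        f"{html.escape(key)}</a></li>"
        for key in keys
    )
    return f"<html><body>\n<ul>\n{items}\n</ul>\n</body></html>\n"
```

The search script would feed whatever waisq prints into extract_keys() and
send the result of links_page() back to the browser. From there on,
hypermail's own thread links take over.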

-- 
  Paul Richards. Originative Solutions Ltd.  (Netcraft Ltd. contractor)
  Elsevier Science TIS online journal project.
  Email: p.richards@elsevier.co.uk
  Phone: 0370 462071 (Mobile), +44 (0)1865 843155


