Date: 02 Jan 1997 16:21:38 +0000
From: Paul Richards <p.richards@elsevier.co.uk>
To: John Fieber <jfieber@indiana.edu>
Cc: "Jordan K. Hubbard" <jkh@time.cdrom.com>, Francisco Reyes <francisco@natserv.com>, FreeBSd Chat list <chat@freebsd.org>
Subject: Re: mailing list archives
Message-ID: <57g20k120d.fsf@tees.elsevier.co.uk>
In-Reply-To: John Fieber's message of Fri, 27 Dec 1996 21:06:34 -0500 (EST)
References: <Pine.BSI.3.95.961227204329.12503M-100000@fallout.campusview.indiana.edu>

John Fieber <jfieber@indiana.edu> writes:

> It isn't so much weight as being inappropriate for the task.
> Using postgres would be like entering long haul tractor trailer
> rig in the Indianapolis 500. Using msql would be like using a
> GMC Suburban in the same context. Relational databases in general
> are not well suited for tasks where record and field structure
> has a lot of variability, eg mail.

Postgres isn't that big but I agree with the following.

> The tools of choice for text will be based on either inverted
> indexes or vector representations. The better of these will
> offer stemming, synonym matching and soundex matching.

Databases aren't actually that good for general string searches.

> To get the answers, we need thread retrieval. For this, I don't
> think we need new indexing software, we just need to figure out
> how to take an existing message and *automatically* formulate
> appropriate queries to build the thread from it.

Hypermail could be used for a big chunk of this if we accept that
each message will be in a separate file. The problem then becomes
rather trivial.

I don't think I have time to implement this, but if someone has time
then the following is the basic principle of what would be required.

The basic idea is that freewais returns a headline that is the path
of the archived message. I use a scheme like this at work, where
basically freewais returns me a key rather than the content.

At build time:

1) Put all our mail through hypermail.
2) Configure FreeWAIS to return a headline that is the hypermail key
   for the mail messages that match.

At access time:

1) The search script calls waisq with the query string.
2) waisq will return all the hypermail keys for messages that match
   in the headline, which can be extracted using a simple Perl script.
3) These keys are converted into a page of links that the user can
   jump into.
4) Once the initial link is followed, hypermail will handle all the
   issues regarding following threads etc.

-- 
Paul Richards.
Originative Solutions Ltd. (Netcraft Ltd. contractor)
Elsevier Science TIS online journal project.
Email: p.richards@elsevier.co.uk
Phone: 0370 462071 (Mobile), +44 (0)1865 843155
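
Access-time steps 2 and 3 in the message above (extract the hypermail keys from the search output, then turn them into a page of links) could be sketched roughly as follows. This is only an illustration, written in Python rather than the Perl the message mentions. The message does not specify the waisq output format, so the `Headline: <key>` line layout, the sample keys, and the `http://example.org/archive/` base URL are all assumptions, and `extract_keys` and `links_page` are hypothetical helper names. Actually invoking waisq is left out.

```python
import html
from urllib.parse import quote


def extract_keys(search_output: str) -> list[str]:
    """Collect hypermail keys from search output, in order, without duplicates.

    ASSUMPTION: each hit appears on a line of the form 'Headline: <key>'.
    The real waisq output format is not given in the message.
    """
    keys: list[str] = []
    for line in search_output.splitlines():
        line = line.strip()
        if line.startswith("Headline:"):
            key = line[len("Headline:"):].strip()
            if key and key not in keys:
                keys.append(key)
    return keys


def links_page(keys: list[str], base_url: str) -> str:
    """Render the keys as a simple HTML list of links into the archive.

    ASSUMPTION: a key is a path relative to base_url (hypothetical layout).
    """
    items = "\n".join(
        '<li><a href="{}">{}</a></li>'.format(
            html.escape(base_url + quote(key)), html.escape(key)
        )
        for key in keys
    )
    return "<html><body><ul>\n" + items + "\n</ul></body></html>"


if __name__ == "__main__":
    # Made-up search output, for illustration only.
    sample = (
        "Headline: chat/0042.html\n"
        "Score: 1000\n"
        "Headline: chat/0107.html\n"
    )
    print(links_page(extract_keys(sample), "http://example.org/archive/"))
```

Once the user follows one of the generated links, thread navigation is left to the pages hypermail already produced, as step 4 describes.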