Date: Sat, 28 Dec 96 17:45:07 -0400
From: "Francisco Reyes" <francisco@natserv.com>
To: "FreeBSD doc Mailing list" <doc@freebsd.org>, "John Fieber" <jfieber@indiana.edu>
Subject: Re: mailing list archives
Message-ID: <199612282245.RAA13615@revelstone.jvm.com>
On Fri, 27 Dec 1996 21:06:34 -0500 (EST), John Fieber wrote:

>To get the answers, we need thread retrieval. For this, I don't
>think we need new indexing software, we just need to figure out
>how to take an existing message and *automatically* formulate
>appropriate queries to build the thread from it.

John,

Do you think any of the existing tools can do it? I am about to start working on this project by building a home-grown system. Should I proceed?

Given my free time, it will be a while before I have the system working (anywhere from 1 to 3 months), but I have already started thinking about the basic design. The features I am thinking of having are:

-Index any word.
-Logical operators in searches: "and", "or", "not". Later on, "near" and lexical searches for selected words (doing the lexical matching by means of a table).
-Store an expiration date for articles and reuse their allocated space after they have expired.
-Give answers in threaded form.

In the initial phase I plan to index the existing files, and later develop the system so that it handles the storage of the messages itself. Managing the messages would allow for compression and file expiration.

My initial considerations are:

-- Use a red-black tree to index all words, with a linked list for each word in the tree. The tree is used only to find the start of each word's linked list; this saves space because the word key is not stored in every element of the list.
-- Keep a file with the last physical location processed in each message file. When the program runs, it will only index what has been added to each message file since the previous run.
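For the "and", "or" and "not" operators, one common approach (not necessarily the one the author had in mind) is to keep each word's message numbers in ascending order and combine lists with a linear merge. This sketch assumes an in-memory dict from word to sorted list of message numbers and a flat query evaluated strictly left to right, with no parentheses or precedence; "x not y" is read as "x and not y". The function names and query syntax are hypothetical.

```python
from typing import Callable, Dict, List


def intersect(a: List[int], b: List[int]) -> List[int]:
    """Messages in both sorted lists ("and")."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def union(a: List[int], b: List[int]) -> List[int]:
    """Messages in either sorted list ("or"), still sorted and without duplicates."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out + a[i:] + b[j:]


def difference(a: List[int], b: List[int]) -> List[int]:
    """Messages in a but not in b ("not")."""
    i = j = 0
    out: List[int] = []
    while i < len(a):
        if j < len(b) and b[j] < a[i]:
            j += 1
        elif j < len(b) and b[j] == a[i]:
            i += 1
            j += 1
        else:
            out.append(a[i])
            i += 1
    return out


OPS: Dict[str, Callable[[List[int], List[int]], List[int]]] = {
    "and": intersect, "or": union, "not": difference,
}


def evaluate(query: str, index: Dict[str, List[int]]) -> List[int]:
    """Evaluate 'word (op word)*' from left to right."""
    tokens = query.lower().split()
    if not tokens or len(tokens) % 2 == 0:
        raise ValueError("expected: word (op word)*")
    result = index.get(tokens[0], [])
    for op, word in zip(tokens[1::2], tokens[2::2]):
        if op not in OPS:
            raise ValueError(f"unknown operator {op!r}")
        result = OPS[op](result, index.get(word, []))
    return result


index = {"thread": [1, 4, 7], "archive": [1, 2, 4, 9], "index": [2, 7]}
print(evaluate("thread and archive", index))            # [1, 4]
print(evaluate("archive not thread", index))            # [2, 9]
print(evaluate("thread or index not archive", index))   # [7]
```

Each merge is linear in the lengths of the two lists, so keeping the lists sorted at indexing time is what makes the operators cheap at query time.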
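Storing an expiration date and reusing the space of expired articles can be sketched as a slot table plus a free list. This is a minimal in-memory sketch that assumes each article fits in one fixed-size slot (an on-disk store would have to handle variable-length articles, for example by allocating runs of fixed-size blocks); `SlotStore`, `store` and `expire` are hypothetical names.

```python
import datetime as dt
from typing import List, Optional, Tuple


class SlotStore:
    """Hypothetical slot store: each slot holds (expiry date, article) or None if free."""

    def __init__(self) -> None:
        self.slots: List[Optional[Tuple[dt.date, bytes]]] = []
        self.free: List[int] = []  # slot numbers released by expire()

    def store(self, article: bytes, expires: dt.date) -> int:
        """Reuse a freed slot if there is one, otherwise append a new slot."""
        if self.free:
            slot = self.free.pop()
            self.slots[slot] = (expires, article)
        else:
            slot = len(self.slots)
            self.slots.append((expires, article))
        return slot

    def expire(self, today: dt.date) -> List[int]:
        """Release every slot whose expiry date is before today; return the freed slots."""
        freed = []
        for i, entry in enumerate(self.slots):
            if entry is not None and entry[0] < today:
                self.slots[i] = None
                self.free.append(i)
                freed.append(i)
        return freed


s = SlotStore()
s.store(b"old article", dt.date(1996, 12, 1))
s.store(b"new article", dt.date(1997, 6, 1))
s.expire(dt.date(1997, 1, 1))             # frees slot 0
print(s.store(b"newer", dt.date(1997, 12, 1)))  # 0: the expired slot is reused
```

A real system would also have to drop expired message numbers from the word index, or check the slot table when reading the lists, so that searches never return an article whose space has since been reused.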
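The "last physical location" file in the second consideration can be sketched as a small table of byte offsets, one per message file. This minimal sketch assumes the message files are only ever appended to (as mbox-style archive files usually are), that the state file is JSON, and that `index_new_text` is a hypothetical callback standing in for the real indexer; if a file has shrunk, for example after being rotated, it is re-read from the start.

```python
import json
import os
from typing import Callable, Dict, Iterable


def load_offsets(state_path: str) -> Dict[str, int]:
    """Return the saved {message file: bytes already indexed} table, or {} on the first run."""
    if not os.path.exists(state_path):
        return {}
    with open(state_path, "r", encoding="utf-8") as f:
        return json.load(f)


def save_offsets(state_path: str, offsets: Dict[str, int]) -> None:
    # Write a temporary file and rename it, so a crash never leaves a half-written table.
    tmp = state_path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(offsets, f)
    os.replace(tmp, state_path)


def index_new_data(paths: Iterable[str], state_path: str,
                   index_new_text: Callable[[str, bytes], None]) -> None:
    """Index only the bytes appended to each file since the previous run."""
    offsets = load_offsets(state_path)
    for path in paths:
        start = offsets.get(path, 0)
        if os.path.getsize(path) < start:  # truncated or replaced: start over
            start = 0
        with open(path, "rb") as f:
            f.seek(start)
            data = f.read()
        if data:
            index_new_text(path, data)
        offsets[path] = start + len(data)
    save_offsets(state_path, offsets)
```

If the archiver can be caught in the middle of appending a message, a more careful version would stop at the last complete message (in an mbox file, just before the last line starting with "From ") and save that offset instead.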