Date: Mon, 30 Mar 1998 17:20:33 -0500 (EST)
From: John Fieber
To: Simon Shapiro
cc: freebsd-database@FreeBSD.ORG
Subject: RE: Mail indexing infrastructure

On Mon, 30 Mar 1998, Simon Shapiro wrote:

> > The current indexed archive goes back to 1994.
>
> This is not an answer to my question :-) Currently we are keeping 4 years.
> Do we want to keep 40? 10? 5? Some (theoretical) limit has to be put.

Oh, I would say indefinitely, until there is a compelling reason to dump
some. The more we have, however, the more essential date scoping becomes.
I think it is already becoming a bit of a problem.

> If thread retrieval is based on Subject: line, an RDBMS is a trivially good
> solution. One can even apply regex to the subject, limit dates, etc.

Good thread indexing is based on subjects, message-ids, dates and content.
Quick-and-dirty thread retrieval is an easy RDBMS problem; good thread
retrieval is rather more complex. For a nice summary outline of threading
methods and their performance, see:

  Lewis, David; Knowles, Kimberly (1997). Threading Electronic Mail: A
  Preliminary Study. Information Processing & Management, 33(2):209-217.

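To make the "quick-and-dirty" case concrete, here is a minimal sketch of subject-based thread retrieval with date scoping. It is only an illustration under stated assumptions, not a description of the existing archive: SQLite stands in for the RDBMS, and the table layout and the names normalize_subject, build_archive and thread_for are all hypothetical. It deliberately ignores message-ids and content, which is exactly what separates it from good threading.

```python
import re
import sqlite3

# Strip any run of leading reply prefixes such as "Re:", "RE:" or "Re[2]:".
_REPLY_PREFIX = re.compile(r"^(?:\s*re(?:\[\d+\])?:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Drop reply prefixes, lower-case, and collapse runs of whitespace."""
    stripped = _REPLY_PREFIX.sub("", subject)
    return " ".join(stripped.lower().split())


def build_archive(messages):
    """Load (message_id, subject, iso_date) tuples into an in-memory table.

    The schema is an assumption made for this sketch only.
    """
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE mail (message_id TEXT PRIMARY KEY, subject TEXT,"
        " thread_key TEXT, sent TEXT)"
    )
    db.executemany(
        "INSERT INTO mail VALUES (?, ?, ?, ?)",
        [(mid, subj, normalize_subject(subj), sent)
         for mid, subj, sent in messages],
    )
    # One index serves both the subject lookup and the date scoping.
    db.execute("CREATE INDEX mail_thread ON mail (thread_key, sent)")
    return db


def thread_for(db, subject, since="0000-01-01", until="9999-12-31"):
    """Return ids of messages sharing a normalized subject within a date range.

    Dates are ISO strings (YYYY-MM-DD), so plain text comparison orders them.
    """
    rows = db.execute(
        "SELECT message_id FROM mail"
        " WHERE thread_key = ? AND sent BETWEEN ? AND ?"
        " ORDER BY sent, message_id",
        (normalize_subject(subject), since, until),
    )
    return [row[0] for row in rows]
```

Without the date limits, an unrelated discussion from years earlier that happens to reuse the same subject is lumped into the same "thread", which is one reason date scoping matters more as the archive grows.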
> If the current system is good and should only be augmented,
> rather than replaced, this is fine by me.

Let me re-phrase: most proposals to date do replacement without
preservation of what is good about the current system. A wholesale
replacement WITH preservation of what is good would be most welcome.

I'd be the first to jump up and down with glee to find a viable
alternative to freeWAIS for doing full text searches with stemming,
soundex matching, automatic term weighting, etc. freeWAIS is a
festering heap of bugs, but it is the best the free software world has.
Postgres with a module offering similar functionality would make me one
happy camper.

-john