Date: Mon, 30 Mar 1998 16:03:30 -0500 (EST)
From: John Fieber
To: Simon Shapiro
cc: freebsd-database@FreeBSD.ORG
Subject: Mail indexing infrastructure

On Mon, 30 Mar 1998, Simon Shapiro wrote:

> > The FreeBSD mailing list archive is 620MB. There are currently
> > 270,000 messages. The archive grows by 100,000 messages/year.
>
> Excellent. How many years back do we want to keep?

The current indexed archive goes back to 1994.

> Also, if the current engine is so great, how come all these people are
> excited about replacing it?

Thread retrieval and date scoping. However, most proposed solutions involve a wholesale replacement rather than augmenting what we have, which works pretty well, all told.

Basically, the vector-space ranked retrieval we already have, possibly scoped by date, is the best way to start a search, followed by thread retrieval once a promising message has been found. Wolfram's home-brew solution for threads is more along the lines of what we need.
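A minimal sketch of that search flow: rank by vector-space similarity, optionally restrict to a date range, then expand a chosen hit into its thread. It assumes a plain TF-IDF cosine model as a stand-in for freeWAIS's own ranking, and a single parent link (the In-Reply-To Message-ID) per message for threading. The `Message` and `Archive` names and the sample messages are hypothetical; this is not freeWAIS's code or Wolfram's thread solution.

```python
import math
import re
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Message:
    msg_id: str
    sent: date
    body: str
    in_reply_to: Optional[str] = None  # parent Message-ID, if any (assumed field)


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class Archive:
    """Toy TF-IDF index with date scoping and thread lookup (hypothetical)."""

    def __init__(self, messages: List[Message]) -> None:
        self.messages = {m.msg_id: m for m in messages}
        # Reply tree: parent id -> child ids, in archive order.
        self.children: Dict[str, List[str]] = defaultdict(list)
        for m in messages:
            if m.in_reply_to in self.messages:
                self.children[m.in_reply_to].append(m.msg_id)
        # Term frequencies per message, then idf = log(N / df).
        tfs = {m.msg_id: Counter(tokenize(m.body)) for m in messages}
        df = Counter(t for tf in tfs.values() for t in tf)
        self.idf = {t: math.log(len(messages) / d) for t, d in df.items()}
        self.vectors = {
            mid: {t: c * self.idf[t] for t, c in tf.items()}
            for mid, tf in tfs.items()
        }

    @staticmethod
    def _cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def search(self, query: str, start: Optional[date] = None,
               end: Optional[date] = None, k: int = 10) -> List[str]:
        """Ranked retrieval, optionally scoped to [start, end]."""
        qtf = Counter(tokenize(query))
        qvec = {t: c * self.idf[t] for t, c in qtf.items() if t in self.idf}
        hits = []
        for mid, vec in self.vectors.items():
            sent = self.messages[mid].sent
            if (start and sent < start) or (end and sent > end):
                continue  # outside the requested date scope
            score = self._cosine(qvec, vec)
            if score > 0:
                hits.append((score, mid))
        hits.sort(reverse=True)
        return [mid for _, mid in hits[:k]]

    def thread(self, msg_id: str) -> List[str]:
        """Walk up to the thread root, then list the whole reply tree."""
        root, seen = msg_id, {msg_id}
        parent = self.messages[root].in_reply_to
        while parent in self.messages and parent not in seen:
            root = parent
            seen.add(root)
            parent = self.messages[root].in_reply_to
        out, stack = [], [root]
        while stack:  # depth-first, preserving reply order
            mid = stack.pop()
            out.append(mid)
            stack.extend(reversed(self.children[mid]))
        return out


if __name__ == "__main__":
    arc = Archive([
        Message("<1>", date(1997, 5, 1), "ahc scsi driver panic on boot"),
        Message("<2>", date(1997, 5, 2), "Re: try disabling tagged queueing in the scsi driver", "<1>"),
        Message("<3>", date(1998, 1, 9), "ppp dialup hangs after connect"),
        Message("<4>", date(1998, 2, 3), "scsi tape drive not detected"),
    ])
    print(arc.search("scsi panic", start=date(1997, 1, 1), end=date(1997, 12, 31)))  # → ['<1>', '<2>']
    print(arc.thread("<2>"))  # → ['<1>', '<2>']
```

Scoping is applied as a filter inside the ranking loop here for clarity; on a 270,000-message archive one would more likely keep a date-sorted posting list so that out-of-range messages are never scored at all.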
I have working date scoping in a prototype, but there are performance problems: freeWAIS really doesn't handle that sort of thing very well, and I'm a bit concerned about killing www.freebsd.org with it because I know it will be a popular feature. I also have half a mind to provide relevance feedback (a "find more like this..." link), but my free time is much smaller than the things I have to fill it with. :(

-john