From: "Jason C. Wells"
Date: Mon, 11 Dec 2000 11:19:16 -0800 (PST)
To: freebsd-doc@freebsd.org
Subject: Search Engine / Indexing Mail Archive

I just got my DSL, which makes this possible. :)

Is there any reason why I should not run an indexer against the mailing list archive for the purpose of developing a new search engine? I would index only the portion of the archive found under the URL http://docs.freebsd.org/mail/archive/2000/. That is all the disk space I have.

After I run this index and determine whether my setup is worth a hoot, I intend to make it live on my connection for a few of you to examine. Once this thing becomes more material, I foresee discussing its implementation further.

And for those interested in numbers: 125 MB of local docs were indexed in 2.0 hours over an unloaded 10 Mbps LAN. Those 125 MB of docs resulted in a database of 89.4 MB, so the database consumes about 71.5% of the space consumed by the docs that are searched. This can be reduced if the document body is not indexed (but who wants to do that?).

Thank you,
Jason C. Wells
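
For anyone who wants to check the numbers quoted in the message, here is a minimal Python sketch of the arithmetic. It uses only the three figures given (125 MB of docs, an 89.4 MB database, 2.0 hours). The function and variable names are made up for illustration, and the extrapolation to a larger archive assumes the index-to-docs ratio stays constant, which the message does not claim.

```python
# Figures quoted in the message. All names below are illustrative,
# not part of any indexing tool.
DOCS_MB = 125.0    # size of the local docs that were indexed
INDEX_MB = 89.4    # size of the resulting database
HOURS = 2.0        # wall-clock time for the indexing run


def index_ratio(index_mb: float, docs_mb: float) -> float:
    """Fraction of the document size taken up by the index database."""
    return index_mb / docs_mb


def throughput_mb_per_hour(docs_mb: float, hours: float) -> float:
    """Average amount of document data indexed per hour."""
    return docs_mb / hours


def estimated_index_mb(archive_mb: float, ratio: float) -> float:
    """Rough index size for a larger archive.

    Assumption: the index-to-docs ratio measured on the sample holds
    for the larger archive as well.
    """
    return archive_mb * ratio


ratio = index_ratio(INDEX_MB, DOCS_MB)
rate = throughput_mb_per_hour(DOCS_MB, HOURS)

print(f"index/docs ratio: {ratio:.1%}")
print(f"indexing rate:    {rate:.1f} MB/hour")
```

Calling `estimated_index_mb(archive_mb, ratio)` with the size of a larger archive gives a first guess at the disk space needed before committing to a full run.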