Date: Mon, 11 Dec 2000 11:19:16 -0800 (PST)
From: "Jason C. Wells" <jcwells@nwlink.com>
To: freebsd-doc@freebsd.org
Subject: Search Engine / Indexing Mail Archive
Message-ID: <Pine.SOL.3.96.1001211105843.2738B-100000@utah>

I just got my DSL, which makes this possible. :)

Is there any reason why I should not run an index against the mailing list archive for the purpose of developing a new search engine? I would only index the portions of the archive found under the URL http://docs.freebsd.org/mail/archive/2000/. I have only that much disk space.

After I run this index and determine whether my setup is worth a hoot, I intend to make it live on my connection for a few of you to examine. Once this thing becomes more concrete, I foresee discussing its implementation further.

And for those interested in numbers: 125 MB of local docs were indexed in 2.0 hours over an unloaded 10 Mbps LAN. Those 125 MB of docs resulted in a database that consumed 89.4 MB, so the database will consume about 71% of the space consumed by the docs that are searched. This can be reduced if the document body is not indexed (but who wants to do that?).

Thank you,
Jason C. Wells

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-doc" in the body of the message
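The storage and timing figures in the message can be checked with a little arithmetic. The Python sketch below uses only the numbers quoted in the post (125 MB of docs, an 89.4 MB database, 2.0 hours); the function names are hypothetical helpers for illustration, and the KB/s figure assumes 1 MB = 1024 KB.

```python
# Figures quoted in the message.
DOCS_MB = 125.0   # size of the indexed local documents
DB_MB = 89.4      # size of the resulting index database
HOURS = 2.0       # wall-clock indexing time over the LAN


def index_overhead(docs_mb: float, db_mb: float) -> float:
    """Return the index database size as a fraction of the document size."""
    return db_mb / docs_mb


def throughput_kb_per_s(docs_mb: float, hours: float) -> float:
    """Average indexing rate in KB/s, assuming 1 MB = 1024 KB."""
    return docs_mb * 1024 / (hours * 3600)


ratio = index_overhead(DOCS_MB, DB_MB)
rate = throughput_kb_per_s(DOCS_MB, HOURS)
print(f"index/docs ratio: {ratio:.1%}")   # 89.4 / 125 = 0.7152
print(f"average rate: {rate:.1f} KB/s")   # 128000 KB / 7200 s
```

This gives a ratio of about 71.5%, matching the "71%" in the post, and an average rate of roughly 17.8 KB/s, far below the roughly 1.2 MB/s that a 10 Mbps link can carry.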