Date:      Mon, 11 Dec 2000 11:19:16 -0800 (PST)
From:      "Jason C. Wells" <jcwells@nwlink.com>
To:        freebsd-doc@freebsd.org
Subject:   Search Engine / Indexing Mail Archive
Message-ID:  <Pine.SOL.3.96.1001211105843.2738B-100000@utah>

I just got my DSL, which makes this possible. :)

Is there any reason why I should not run an index against the mailing list
archive for the purpose of developing a new search engine?

I would only index on the portions of the archive found under the URL
http://docs.freebsd.org/mail/archive/2000/.  I have only that much disk
space. 

After I run this index and determine whether my setup is worth a hoot, I
intend to make it live on my connection for a few of you to examine.  Once this
thing becomes more material, I foresee discussing its implementation
further.

And for those interested in numbers:

125 MB of local docs were indexed in 2.0 hours over an unloaded 10 Mbps LAN.
Those 125 MB of docs resulted in a database that consumed 89.4 MB.
The database will consume 71% of the space consumed by the docs that are
searched.  This can be reduced if the document body is not indexed. (but
who wants to do that?)
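
For anyone who wants to check the arithmetic, here is a rough sketch in
Python.  The three input figures are the ones quoted above; the variable
names and the estimate_db_mb helper are made up for illustration, and the
projection assumes the same ratio holds for other archives, which has not
been measured.

```python
# Figures quoted in the message above.
docs_mb = 125.0   # size of the local docs that were indexed
db_mb = 89.4      # size of the resulting database
hours = 2.0       # wall-clock time for the indexing run

# Database size as a fraction of the indexed documents (about 0.715).
ratio = db_mb / docs_mb

# Average indexing rate in MB per hour.
throughput_mb_per_hour = docs_mb / hours

# The same rate in megabits per second, for comparison with the 10 Mbps LAN.
throughput_mbit_per_s = docs_mb * 8 / (hours * 3600)


def estimate_db_mb(archive_mb, ratio=ratio):
    """Hypothetical helper: project database size for a larger archive,
    assuming the observed docs-to-database ratio carries over."""
    return archive_mb * ratio


if __name__ == "__main__":
    print(f"database/docs ratio: {ratio:.1%}")
    print(f"indexing rate: {throughput_mb_per_hour:.1f} MB/hour "
          f"({throughput_mbit_per_s:.2f} Mbit/s average)")
    print(f"projected database for 1000 MB of docs: {estimate_db_mb(1000):.0f} MB")
```

The average rate works out to well under the LAN's capacity, so on these
numbers the network was probably not the limiting factor.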

Thank you,
Jason C. Wells






