From: Philip Hallstrom
To: freebsd-questions@freebsd.org
Date: Wed, 6 Apr 2005 09:40:22 -0700 (PDT)
Message-ID: <20050406093528.I44943@wolf.pjkh.com>
Subject: Recommended search engine for web pages and maybe email?

Hi all -

Recently I've found myself searching the FreeBSD ports web site quite frequently, as well as some other online documentation (PHP, MySQL, PostgreSQL, the FreeBSD FAQ/Handbook), and it always bothers me because I know I could mirror that stuff, search it locally, and in general cut down on their load.

It's been a long time since I've set up any search engines/spiders to do this sort of thing. In the past I've used htdig and mnogosearch. I was hoping someone out there could tell me which one of those (or a third, such as openfts?) I should install and get going, to save me some time trying them all out. I think the only feature I really care about is being able to limit the search to a particular collection (FreeBSD ports, PHP manual, etc.).

As an aside, I've got about 60 MB (~5,000 messages) stored in pine's mbox format that I occasionally grep through, but it would be nice to have something a little more advanced...

I've also thought maybe I should just host it on an external server, allow only my hosts and Google's indexer to crawl it, and just rely on Google.

Suggestions? Recommendations?

Thanks!

-philip