From owner-freebsd-www@FreeBSD.ORG Tue Oct 11 20:28:31 2005
Return-Path:
X-Original-To: www@FreeBSD.org
Delivered-To: freebsd-www@FreeBSD.ORG
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id ECCB116A41F
    for ; Tue, 11 Oct 2005 20:28:31 +0000 (GMT)
    (envelope-from wosch@FreeBSD.org)
Received: from baerenklau.de.freebsd.org (baerenklau.de.freebsd.org [195.185.195.14])
    by mx1.FreeBSD.org (Postfix) with ESMTP id 517F143D45
    for ; Tue, 11 Oct 2005 20:28:31 +0000 (GMT)
    (envelope-from wosch@FreeBSD.org)
Received: from baerenklau.de.freebsd.org (localhost [127.0.0.1])
    by baerenklau.de.freebsd.org (8.12.11/8.12.9) with ESMTP id j9BKSRwZ090184;
    Tue, 11 Oct 2005 22:28:27 +0200 (CEST)
    (envelope-from wosch@FreeBSD.org)
Received: (from uucp@localhost)
    by baerenklau.de.freebsd.org (8.12.11/8.12.9/Submit) with UUCP id j9BKSRqv090183;
    Tue, 11 Oct 2005 22:28:27 +0200 (CEST)
    (envelope-from wosch@FreeBSD.org)
Received: from [192.168.0.104] (plum.panke.de.freebsd.org [192.168.0.104])
    by paula.panke.de.freebsd.org (8.13.4/8.12.9) with ESMTP id j9BKLqmR015068;
    Tue, 11 Oct 2005 22:21:53 +0200 (CEST)
    (envelope-from wosch@FreeBSD.org)
Message-ID: <434C1ECF.4090608@FreeBSD.org>
Date: Tue, 11 Oct 2005 22:21:35 +0200
From: Wolfram Schneider
User-Agent: Mozilla Thunderbird 1.0.7 (Macintosh/20050923)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Tim Wilde
References:
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: www@FreeBSD.org
Subject: Re: Using Yahoo! or Google search bar instead of search.cgi
X-BeenThere: freebsd-www@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD Project Webmasters
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 11 Oct 2005 20:28:32 -0000

Tim Wilde wrote:
> (Apologies for breaking threading, just joined freebsd-www so I don't
> have the appropriate messages for a References: header.)
>
> As I mentioned in my earlier post, I think an even bigger problem than
> the one Murray mentioned can be observed by the fact that a search for
> "kernel" returns no results at all.

Here is my guess at what happens: "kernel" is a very common word
(believe it or not). Google has 18,900 hits for the word "kernel"
on www.freebsd.org.

Common words (e.g. "a", "the", "an", "www", "is") are usually
ignored by search engines to save space or to speed up searches.
These are known as "stop words". Even Google has stop words.

From memory, search.cgi has a dynamic stop word list: words that
reach the limit of 20,000 hits are ignored.

-Wolfram

> At DynDNS, we recently started indexing our site using ht://Dig
> (http://www.htdig.org/), and have been very happy with the flexibility
> it provides for tuning search results to get the most relevant
> matches. It is also a true spider, crawling the website over HTTP
> rather than searching on the filesystem as the current search.cgi
> seems to do.

-- 
Wolfram Schneider http://wolfram.schneider.org
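
To illustrate the dynamic stop word behaviour guessed at in the reply above, here is a minimal Python sketch. It is not the search.cgi code: the function names, the whitespace tokenizer, and counting hits as "number of documents containing the word" are all assumptions made for illustration. Only the idea of a 20,000-hit limit comes from the message.

```python
from collections import defaultdict

# Assumed threshold; the message recalls a limit of 20,000 hits.
STOP_WORD_LIMIT = 20_000


def build_index(documents):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def dynamic_stop_words(index, limit=STOP_WORD_LIMIT):
    """Words whose hit count reaches the limit are treated as stop words."""
    return {word for word, hits in index.items() if len(hits) >= limit}


def search(index, query, limit=STOP_WORD_LIMIT):
    """Return ids of documents matching every non-stop word in the query.

    If every query word is a stop word, nothing is left to search for,
    so the result is empty.
    """
    stop = dynamic_stop_words(index, limit)
    terms = [w for w in query.lower().split() if w not in stop]
    if not terms:
        return set()
    results = None
    for term in terms:
        hits = index.get(term, set())
        results = set(hits) if results is None else results & hits
    return results


# Tiny demonstration with an artificially low limit of 2 hits.
docs = {1: "kernel config", 2: "kernel panic", 3: "ports tree"}
idx = build_index(docs)
only_common = search(idx, "kernel", limit=2)
with_rare = search(idx, "kernel panic", limit=2)
```

With the limit lowered to 2, "kernel" appears in two documents and is dropped as a stop word, so a query for "kernel" alone comes back empty, while "kernel panic" still matches document 2 through the rarer word. That would be consistent with a search for "kernel" returning no results at all.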