From owner-freebsd-current@FreeBSD.ORG  Sun Aug 22 16:30:08 2010
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id F20A61065693;
	Sun, 22 Aug 2010 16:30:07 +0000 (UTC) (envelope-from scf@FreeBSD.org)
Received: from mail.farley.org (mail.farley.org [IPv6:2001:470:1f0f:20:2::11])
	by mx1.freebsd.org (Postfix) with ESMTP id 9E0E58FC0A;
	Sun, 22 Aug 2010 16:30:07 +0000 (UTC)
Received: from [192.168.1.201] ([192.168.1.201])
	by mail.farley.org (8.14.4/8.14.4) with ESMTP id o7MGU1s0063186;
	Sun, 22 Aug 2010 11:30:01 -0500 (CDT) (envelope-from scf@FreeBSD.org)
Date: Sun, 22 Aug 2010 11:30:01 -0500 (CDT)
From: "Sean C. Farley" <scf@FreeBSD.org>
To: =?ISO-8859-15?Q?Dag-Erling_Sm=F8rgrav?= <des@des.no>
In-Reply-To: <86k4nikglg.fsf@ds4.des.no>
Message-ID: <alpine.BSF.2.00.1008221111300.1989@terminus>
References: <201008210231.o7L2VRvI031700@ducky.net> <86k4nikglg.fsf@ds4.des.no>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: MULTIPART/MIXED;
	BOUNDARY="4255886656-1237390696-1282494601=:1989"
X-Spam-Status: No, score=-2.9 required=4.0 tests=ALL_TRUSTED,BAYES_00
	autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.farley.org
Cc: freebsd-current@FreeBSD.org, Mike Haertel <mike@ducky.net>,
	gabor@FreeBSD.org
Subject: Re: why GNU grep is fast
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 22 Aug 2010 16:30:08 -0000

  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.

--4255886656-1237390696-1282494601=:1989
Content-Type: TEXT/PLAIN; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8BIT

On Sun, 22 Aug 2010, Dag-Erling Smørgrav wrote:

> Mike Haertel <mike@ducky.net> writes:
>> GNU grep uses the well-known Boyer-Moore algorithm, which looks first 
>> for the final letter of the target string, and uses a lookup table to 
>> tell it how far ahead it can skip in the input whenever it finds a 
>> non-matching character.
>
> Boyer-Moore is for fixed search strings.  I don't see how that 
> optimization can work with a regexp search unless the regexp is so 
> simple that you break it down into a small number of cases with known 
> length and final character.

When I was working on making FreeGrep faster (years ago), I wrote down a 
few notes about possible algorithms, especially those that could be 
useful for fgrep functionality.  I am just passing these onto the list.

Some algorithms:
1. http://en.wikipedia.org/wiki/Aho-Corasick_string_matching_algorithm
2. http://en.wikipedia.org/wiki/Rabin-Karp_string_search_algorithm
3. GNU fgrep:  Commentz-Walter
4. GLIMPSE:  http://webglimpse.net/pubs/TR94-17.pdf (Boyer-Moore variant)

Also, this may be a useful book:
http://www.dcc.uchile.cl/~gnavarro/FPMbook/

Sean
-- 
scf@FreeBSD.org
--4255886656-1237390696-1282494601=:1989--