From owner-freebsd-current@FreeBSD.ORG  Sun Aug 22 17:31:50 2010
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 559861065698
	for <current@freebsd.org>; Sun, 22 Aug 2010 17:31:50 +0000 (UTC)
	(envelope-from tim@kientzle.com)
Received: from mail-pw0-f54.google.com (mail-pw0-f54.google.com
	[209.85.160.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 3377D8FC15
	for <current@freebsd.org>; Sun, 22 Aug 2010 17:31:49 +0000 (UTC)
Received: by pwi8 with SMTP id 8so450425pwi.13
	for <current@freebsd.org>; Sun, 22 Aug 2010 10:31:49 -0700 (PDT)
Received: by 10.143.9.6 with SMTP id m6mr3184455wfi.310.1282498309659;
	Sun, 22 Aug 2010 10:31:49 -0700 (PDT)
Received: from [10.123.2.181] (99-74-170-25.lightspeed.sntcca.sbcglobal.net
	[99.74.170.25])
	by mx.google.com with ESMTPS id v38sm7179056wfh.0.2010.08.22.10.31.47
	(version=TLSv1/SSLv3 cipher=RC4-MD5);
	Sun, 22 Aug 2010 10:31:48 -0700 (PDT)
Mime-Version: 1.0 (Apple Message framework v1081)
Content-Type: text/plain; charset=us-ascii
From: Tim Kientzle <tim@kientzle.com>
In-Reply-To: <201008221502.o7MF2bfZ086738@hergotha.csail.mit.edu>
Date: Sun, 22 Aug 2010 10:31:46 -0700
Content-Transfer-Encoding: 7bit
Message-Id: <E99FFB92-DB0B-47F0-9918-5DA52B34C10D@kientzle.com>
References: <201008210231.o7L2VRvI031700@ducky.net> <86k4nikglg.fsf@ds4.des.no>
	<201008221502.o7MF2bfZ086738@hergotha.csail.mit.edu>
To: Garrett Wollman <wollman@hergotha.csail.mit.edu>
X-Mailer: Apple Mail (2.1081)
Cc: current@freebsd.org
Subject: Re: why GNU grep is fast
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 22 Aug 2010 17:31:50 -0000

On Aug 22, 2010, at 8:02 AM, Garrett Wollman wrote:
> In article <86k4nikglg.fsf@ds4.des.no> you write:
>> Mike Haertel <mike@ducky.net> writes:
>>> GNU grep uses the well-known Boyer-Moore algorithm, which looks
>>> first for the final letter of the target string, and uses a lookup
>>> table to tell it how far ahead it can skip in the input whenever
>>> it finds a non-matching character.
>> 
>> Boyer-Moore is for fixed search strings.  I don't see how that
>> optimization can work with a regexp search unless the regexp is so
>> simple that you break it down into a small number of cases with known
>> length and final character.
> 
> The common case of regexps used in the "grep" utility (and, for
> obvious reasons, universal in the "fgrep" utility) is fixed-length
> search strings.  Even non-fixed-length regexps typically consist of
> one one or two variable-length parts.

This is an important point:  A good grep implementation will
use different strategies depending on the input regexp.
Fixed-string matching is a very important special case.

>  Matching a completely
> variable-length regexp is just hard, computationally, 

See Russ Cox' articles for why this is not true.  It does require
considerable sophistication to build an efficient DFA but the
actual matcher, once built, can run very fast indeed:

  http://swtch.com/~rsc/regexp/

Tim