From owner-freebsd-current@FreeBSD.ORG  Sun Aug 22 11:54:37 2010
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0F51810656AD;
	Sun, 22 Aug 2010 11:54:37 +0000 (UTC) (envelope-from des@des.no)
Received: from smtp.des.no (smtp.des.no [194.63.250.102])
	by mx1.freebsd.org (Postfix) with ESMTP id C2C848FC16;
	Sun, 22 Aug 2010 11:54:36 +0000 (UTC)
Received: from ds4.des.no (des.no [84.49.246.2])
	by smtp.des.no (Postfix) with ESMTP id B17F61FFC37;
	Sun, 22 Aug 2010 11:54:35 +0000 (UTC)
Received: by ds4.des.no (Postfix, from userid 1001)
	id 8A3E4844B1; Sun, 22 Aug 2010 13:54:35 +0200 (CEST)
From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>
To: Mike Haertel <mike@ducky.net>
References: <201008210231.o7L2VRvI031700@ducky.net>
Date: Sun, 22 Aug 2010 13:54:35 +0200
In-Reply-To: <201008210231.o7L2VRvI031700@ducky.net> (Mike Haertel's message
	of "Fri, 20 Aug 2010 19:31:27 -0700")
Message-ID: <86k4nikglg.fsf@ds4.des.no>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-current@freebsd.org, gabor@freebsd.org
Subject: Re: why GNU grep is fast
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 22 Aug 2010 11:54:40 -0000

Mike Haertel <mike@ducky.net> writes:
> GNU grep uses the well-known Boyer-Moore algorithm, which looks
> first for the final letter of the target string, and uses a lookup
> table to tell it how far ahead it can skip in the input whenever
> it finds a non-matching character.

Boyer-Moore is for fixed search strings.  I don't see how that
optimization can work with a regexp search unless the regexp is so
simple that you break it down into a small number of cases with known
length and final character.

> GNU grep uses raw Unix input system calls and avoids copying data
> after reading it.

Yes, that was the first thing we looked at (and fixed) in BSD grep.

> Moreover, GNU grep AVOIDS BREAKING THE INPUT INTO LINES.  Looking
> for newlines would slow grep down by a factor of several times,
> because to find the newlines it would have to look at every byte!

Amen.  The current bottleneck in BSD grep is the memchr() that looks for
'\n' in the input buffer.

DES
--=20
Dag-Erling Sm=C3=B8rgrav - des@des.no