Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 22 Aug 2010 13:54:35 +0200
From:      =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>
To:        Mike Haertel <mike@ducky.net>
Cc:        freebsd-current@freebsd.org, gabor@freebsd.org
Subject:   Re: why GNU grep is fast
Message-ID:  <86k4nikglg.fsf@ds4.des.no>
In-Reply-To: <201008210231.o7L2VRvI031700@ducky.net> (Mike Haertel's message of "Fri, 20 Aug 2010 19:31:27 -0700")
References:  <201008210231.o7L2VRvI031700@ducky.net>

next in thread | previous in thread | raw e-mail | index | archive | help
Mike Haertel <mike@ducky.net> writes:
> GNU grep uses the well-known Boyer-Moore algorithm, which looks
> first for the final letter of the target string, and uses a lookup
> table to tell it how far ahead it can skip in the input whenever
> it finds a non-matching character.

Boyer-Moore is for fixed search strings.  I don't see how that
optimization can work with a regexp search unless the regexp is so
simple that you break it down into a small number of cases with known
length and final character.

> GNU grep uses raw Unix input system calls and avoids copying data
> after reading it.

Yes, that was the first thing we looked at (and fixed) in BSD grep.

> Moreover, GNU grep AVOIDS BREAKING THE INPUT INTO LINES.  Looking
> for newlines would slow grep down by a factor of several times,
> because to find the newlines it would have to look at every byte!

Amen.  The current bottleneck in BSD grep is the memchr() that looks for
'\n' in the input buffer.

DES
--=20
Dag-Erling Sm=C3=B8rgrav - des@des.no



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?86k4nikglg.fsf>