FreeBSD Mail Archives

Date:      Sun, 22 Aug 2010 22:44:13 +0200
From:      =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>
To:        Mike Haertel <mike@ducky.net>
Cc:        freebsd-current@freebsd.org, gabor@freebsd.org
Subject:   Re: why GNU grep is fast
Message-ID:  <86lj7yl6n6.fsf@ds4.des.no>
In-Reply-To: <201008221822.o7MIMCRN050408@ducky.net> (Mike Haertel's message of "Sun, 22 Aug 2010 11:22:12 -0700")
References:  <201008210231.o7L2VRvI031700@ducky.net> <86k4nikglg.fsf@ds4.des.no> <201008221822.o7MIMCRN050408@ducky.net>

index | next in thread | previous in thread | raw e-mail


Mike Haertel <mike@ducky.net> writes:
> For example if your regex is /foo.*bar/, the initial Boyer-Moore search
> is (probably) searching for foo.
>
> If the initial search succeeds, GNU grep isolates the containing line,
> and then runs the full regex matcher on that line to make sure.

You don't really need to "isolate the containing line" unless you have
an actual match, do you?  There are two cases:

 1) The regexp does not use any character classes, including /./, so the
    FSA will stop if it hits EOL before it reaches an accepting state.

 2) The regexp uses character classes, and you rewrite them to exclude
    \n: /[^bar]/ becomes /[^bar\n]/, /./ becomes /[^\n]/, etc., and the
    FSA will stop if it hits EOL before it reaches an accepting state.

DES
-- 
Dag-Erling Smørgrav - des@des.no

home | help

Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?86lj7yl6n6.fsf>

Header And Logo

Peripheral Links

Site Navigation

Header And Logo

Peripheral Links

Search

Site Navigation