Date: Sun, 22 Aug 2010 10:31:46 -0700 From: Tim Kientzle <tim@kientzle.com> To: Garrett Wollman <wollman@hergotha.csail.mit.edu> Cc: current@freebsd.org Subject: Re: why GNU grep is fast Message-ID: <E99FFB92-DB0B-47F0-9918-5DA52B34C10D@kientzle.com> In-Reply-To: <201008221502.o7MF2bfZ086738@hergotha.csail.mit.edu> References: <201008210231.o7L2VRvI031700@ducky.net> <86k4nikglg.fsf@ds4.des.no> <201008221502.o7MF2bfZ086738@hergotha.csail.mit.edu>
next in thread | previous in thread | raw e-mail | index | archive | help
On Aug 22, 2010, at 8:02 AM, Garrett Wollman wrote: > In article <86k4nikglg.fsf@ds4.des.no> you write: >> Mike Haertel <mike@ducky.net> writes: >>> GNU grep uses the well-known Boyer-Moore algorithm, which looks >>> first for the final letter of the target string, and uses a lookup >>> table to tell it how far ahead it can skip in the input whenever >>> it finds a non-matching character. >> >> Boyer-Moore is for fixed search strings. I don't see how that >> optimization can work with a regexp search unless the regexp is so >> simple that you break it down into a small number of cases with known >> length and final character. > > The common case of regexps used in the "grep" utility (and, for > obvious reasons, universal in the "fgrep" utility) is fixed-length > search strings. Even non-fixed-length regexps typically consist of > one one or two variable-length parts. This is an important point: A good grep implementation will use different strategies depending on the input regexp. Fixed-string matching is a very important special case. > Matching a completely > variable-length regexp is just hard, computationally, See Russ Cox' articles for why this is not true. It does require considerable sophistication to build an efficient DFA but the actual matcher, once built, can run very fast indeed: http://swtch.com/~rsc/regexp/ Tim
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?E99FFB92-DB0B-47F0-9918-5DA52B34C10D>