FreeBSD Mail Archives

Date:      Fri, 30 Jul 1999 18:28:38 -0400
From:      Tim Vanderhoek <vanderh@ecf.utoronto.ca>
To:        "Daniel C. Sobral" <dcs@newsguy.com>
Cc:        freebsd-hackers@FreeBSD.ORG
Subject:   Re: replacing grep(1)
Message-ID:  <19990730182838.B67094@mad>
In-Reply-To: <37A1AF27.C5B9DB7D@newsguy.com>; from Daniel C. Sobral on Fri, Jul 30, 1999 at 10:56:55PM %2B0900
References:  <Pine.GSO.4.10.9907272344440.19477-100000@rac9.wam.umd.edu> <379F3701.F79E10DF@newsguy.com> <19990728191531.A318@mad> <37A04635.59E74FF6@newsguy.com> <19990729182229.E24296@mad> <37A1AF27.C5B9DB7D@newsguy.com>

On Fri, Jul 30, 1999 at 10:56:55PM +0900, Daniel C. Sobral wrote:
> 
> I said that I did not care whether the thing is inside or outside
> the regexp library.

Yes, although I think at this point it's obvious we're coming at this
discussion from fairly different perspectives.  By the time you
brought-up complexity originally, I had more or less decided that I
did not want to see the new grep imported without significant speed
improvements and was concerned with how to improve grep.  Your
interest is in debating that point (fortunately arguing for the
side I agree with :).

> 4) grep -e 123 456 world.build

[I assume "grep -e 123 -e 124 world.build"]

> One can clearly see that GNU grep has a much better complexity in
> the cases of longer patterns or multiple patterns with common
> prefix.

Alright, someone else already mentioned to me in email that I
totally ignored what differences involved multiple patterns.
Combining multiple patterns is a big win if those two patterns have
a common prefix (I hadn't considered the case of similar patterns
before, actually).  Combining multiple patterns when they're
dissimilar doesn't appear to help much (which is the only case I had
considered -- my mistake, and also the reason I ignored what you
said about multiple patterns).

I'm surprised by the way GNU grep is able to handle longer patterns,
and I probably wouldn't have noticed it unless I'd taken some time to
examine the GNU source.

Congratulations, you win.  :)  The rest of your lengthy message mostly
goes on to repeat the fact that GNU grep is able to merge multiple
patterns with a common prefix (and postfix?) to good effect.

> It also shows that the new grep spends a lot of time in an activity
> not related to the search itself, since it does multiple patterns by

Well, duh.  This is really why my reaction to "complexity analysis" is
(still) what it is.  Complexity analysis is almost only useful for
comparing two different algorithms and the fact that the new grep
spends a lot of time doing things other than pattern searching is
quite obvious after a casual perusal of the source.  Complexity
analysis does not (directly) help improving an algorithm.  With the
possible exception of the idea of merging common prefixes, most of
this is not useful (at this stage) to improving grep.

If I was going to propose replacing the existing GNU grep, I would
(and always would have) done considerable more speed trials than the
simple one in my last message.

> It would seem that GNU grep is superior in the case of partial
> matches without a full match too, but the standard deviation for the

That is almost certainly something inside the regex library, which I
have repeatedly said I'm not interested in even looking at.  If our
regex library is too slow, then we need to look into the newer one the
Henry Spencer is rumoured to be sitting on.

-- 
This is my .signature which gets appended to the end of my messages.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?19990730182838.B67094>

Header And Logo

Peripheral Links

Site Navigation

Header And Logo

Peripheral Links

Search

Site Navigation