FreeBSD Mail Archives

Date:      Thu, 29 Jul 1999 18:22:29 -0400
From:      Tim Vanderhoek <vanderh@ecf.utoronto.ca>
To:        "Daniel C. Sobral" <dcs@newsguy.com>
Cc:        James Howard <howardjp@wam.umd.edu>, freebsd-hackers@FreeBSD.ORG
Subject:   Re: replacing grep(1)
Message-ID:  <19990729182229.E24296@mad>
In-Reply-To: <37A04635.59E74FF6@newsguy.com>; from Daniel C. Sobral on Thu, Jul 29, 1999 at 09:16:53PM %2B0900
References:  <Pine.GSO.4.10.9907272344440.19477-100000@rac9.wam.umd.edu> <379F3701.F79E10DF@newsguy.com> <19990728191531.A318@mad> <37A04635.59E74FF6@newsguy.com>

index | next in thread | previous in thread | raw e-mail

On Thu, Jul 29, 1999 at 09:16:53PM +0900, Daniel C. Sobral wrote:
> > >
> > > Sorry, but a simplistic analysis like that just won't cut for grep.
> > > The algorithmic complexity is highly relevant here. Try this:
> > 
> > Algorithmic complexity!?!
> 
> Yup.

I'm sorry.  I've read your message and have decided that you're wrong.
Outside of the regexp library, algorithmic complexity is not a factor
here.  It would take a beanbag to write anything other than an O(N)
algorithm.

The proposed grep is slow, very slow, and I've sent a long message to
James outlining how to make it much faster, but algorithmic complexity
is not an issue.

> Also, fgetln() will copy the line buffer from time to time, though
> that's not a simple computation, and probably of little

fgetln() does a complete copy of the line buffer whenever an
excessively long line is found.  On this point, it's hard to do better
without using mmap(), but mmap() has its own disadvantages.  My last
suggestion to James was to assume a worst case for long lines and mark
the worst worst case with an XXX "this is unfortunate".

> > The test you suggested doesn't show anything about that algorithmic
> > complexity, though.
> 
> Yeah? Try to back that with the results of the tests I suggested.

No, it's not even worth my time.

Now look.  You've gotten me so upset I actually went and did a simple
test.  The test showed I'm right and you're wrong.  Catting X number
of copies of /etc/termcap into longfile causes the time grep uses
to pass longfile searching for all occurrences of "printer" causes
it to use an extra 0.03 seconds for every repetition of /etc/termcap
in longfile.

Gee, linear complexity wrt to file length.  Who could've guessed!?

What'ya bet GNU grep also exhibits linear complexity?  :)

Admit it, you jumped in with some bullshit about complexity when had
you taken the time to look into what James meant when he said "it now
spends 50% of its time in procline()" you would have kept quiet,
realizing that he was talking about a constant factor in the
complexity analysis, an subject where comments such as "it now spends
50% of its time in procline()" are relevent.

:-)

[Never mind that it should be spending near 100% of its time in
 procline...that just means he's still got work to do... :-]

-- 
This is my .signature which gets appended to the end of my messages.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message

help

Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?19990729182229.E24296>

Header And Logo

Peripheral Links

Site Navigation

Header And Logo

Peripheral Links

Search

Site Navigation