From owner-freebsd-hackers Wed Jul 28 11:03:14 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from rothko.bestweb.net (rothko.bestweb.net [209.94.100.160])
	by hub.freebsd.org (Postfix) with ESMTP id E9A4D14C3F
	for ; Wed, 28 Jul 1999 11:03:08 -0700 (PDT)
	(envelope-from mark@suarez.bestweb.net)
Received: from kandinsky (kandinsky.bestweb.net [209.94.100.35])
	by rothko.bestweb.net (8.9.1a/8.9.0) with SMTP id NAA12850
	for ; Wed, 28 Jul 1999 13:39:53 -0400 (EDT)
Message-ID: <012801bed920$99256c20$23645ed1@bestweb.net>
Reply-To: "Mark Dickey"
From: "Mark Dickey"
To:
References: <379F3701.F79E10DF@newsguy.com>
Subject: Re: replacing grep(1)
Date: Wed, 28 Jul 1999 13:42:43 -0400
Organization: BestWeb Corporation
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2314.1300
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

I expect there is a very good reason why this shouldn't be done, but would it be possible to implement two different algorithms, chosen depending on the size of the file being grepped?

Mark Dickey
mark@bestweb.net

Daniel C. Sobral wrote:
> James Howard wrote:
> >
> > Due to the discussion of speed, I have been looking at it, and it is
> > really slow. Even slower than I thought, and I already thought it was
> > pretty slow.
> >
> > So using gprof, I have discovered that it seems to spend a whole mess
> > of time in grep_malloc() and free(). So I pulled all the references to
> > malloc out of the main loop (the copy for ln.dat, and removed the
> > queueing). This still leaves us with a grep that is about 6x slower
> > than GNU grep. Before that, it ran closer to 80x slower. After this
> > change, gprof says it spends around 53% of its time in procline().
>
> Sorry, but a simplistic analysis like that just won't cut it for grep.
> The algorithmic complexity is highly relevant here. Try this:
> generate a 1 MB file, and then generate 10 MB and 50 MB files by
> concatenating that first file. Benchmark your grep and GNU grep a
> number of times to get an average for each file. Now compare the
> *proportions* between the different sized files. Are they the same?
>
> Next, try different sized patterns on the 50 MB file with both your
> grep and GNU grep. Again, compare the proportions.
>
> Next, compare patterns with different numbers of "wildcards",
> patterns with things like [acegikmoqsuvxz] vs.
> [acegikmoqsuvxzACEGIKMOQSUVXZ], etc.
>
> Either that, or do a complexity analysis of the algorithms. :-)
>
> (In case anyone reading this discussion wants to know more about the
> complexity of algorithms, I recommend Computer Algorithms: Introduction
> to Design and Analysis, by Sara Baase, Addison-Wesley.)
>
> --
> Daniel C. Sobral (8-DCS)
> dcs@newsguy.com
> dcs@freebsd.org
>
> "Is it true that you're a millionaire's son who never worked a day
> in your life?"
> "Yeah, I guess so."
> "Lemme tell you, son, you ain't missed a thing."

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message