From owner-freebsd-hackers Wed Jul 28 11:03:14 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from rothko.bestweb.net (rothko.bestweb.net [209.94.100.160])
	by hub.freebsd.org (Postfix) with ESMTP id E9A4D14C3F
	for ; Wed, 28 Jul 1999 11:03:08 -0700 (PDT)
	(envelope-from mark@suarez.bestweb.net)
Received: from kandinsky (kandinsky.bestweb.net [209.94.100.35])
	by rothko.bestweb.net (8.9.1a/8.9.0) with SMTP id NAA12850
	for ; Wed, 28 Jul 1999 13:39:53 -0400 (EDT)
Message-ID: <012801bed920$99256c20$23645ed1@bestweb.net>
Reply-To: "Mark Dickey"
From: "Mark Dickey"
To:
References: <379F3701.F79E10DF@newsguy.com>
Subject: Re: replacing grep(1)
Date: Wed, 28 Jul 1999 13:42:43 -0400
Organization: BestWeb Corporation
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2314.1300
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

I expect there is a very good reason why this shouldn't be done, but would it be possible to implement two different algorithms, chosen depending on the size of the file being grepped?

Mark Dickey
mark@bestweb.net

Daniel C. Sobral wrote:
> James Howard wrote:
> >
> > Due to the discussion of speed, I have been looking at it, and it is
> > really slow. Even slower than I thought, and I already thought it was
> > pretty slow.
> >
> > So using gprof, I have discovered that it seems to spend a whole mess
> > of time in grep_malloc() and free(). So I pulled all the references to
> > malloc out of the main loop (the copy for ln.dat, and removed the
> > queueing). This still leaves us with a grep that is about 6x slower
> > than GNU grep. Before that, it ran closer to 80x slower. After this
> > change, gprof says it spends around 53% of its time in procline().
>
> Sorry, but a simplistic analysis like that just won't cut it for grep.
> The algorithmic complexity is highly relevant here. Try this:
> generate a 1 MB file, and then generate 10 MB and 50 MB files by
> concatenating that first file. Benchmark your grep and GNU grep a
> number of times to get an average for each file. Now compare the
> *proportions* between the different sized files. Are they the same?
>
> Next, try different sized patterns on the 50 MB file with both your
> grep and GNU grep. Again, compare the proportions.
>
> Next, compare patterns with different numbers of "wildcards",
> patterns with things like [acegikmoqsuvxz] vs.
> [acegikmoqsuvxzACEGIKMOQSUVXZ], etc.
>
> Either that, or do a complexity analysis of the algorithms. :-)
>
> (In case anyone reading this discussion wants to know more about the
> complexity of algorithms, I recommend Computer Algorithms: Introduction
> to Design and Analysis, by Sara Baase, Addison-Wesley.)
>
> --
> Daniel C. Sobral (8-DCS)
> dcs@newsguy.com
> dcs@freebsd.org
>
> "Is it true that you're a millionaire's son who never worked a day
> in your life?"
> "Yeah, I guess so."
> "Lemme tell you, son, you ain't missed a thing."

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message