From owner-freebsd-hackers Wed Jul 28 10: 3:29 1999 Delivered-To: freebsd-hackers@freebsd.org Received: from peach.ocn.ne.jp (peach.ocn.ne.jp [210.145.254.87]) by hub.freebsd.org (Postfix) with ESMTP id B608714D70 for ; Wed, 28 Jul 1999 10:03:26 -0700 (PDT) (envelope-from dcs@newsguy.com) Received: from newsguy.com by peach.ocn.ne.jp (8.9.1a/OCN) id CAA26170; Thu, 29 Jul 1999 02:03:01 +0900 (JST) Message-ID: <379F3701.F79E10DF@newsguy.com> Date: Thu, 29 Jul 1999 01:59:45 +0900 From: "Daniel C. Sobral" X-Mailer: Mozilla 4.6 [en] (Win98; I) X-Accept-Language: en,pt-BR,ja MIME-Version: 1.0 To: James Howard Cc: freebsd-hackers@FreeBSD.ORG Subject: Re: replacing grep(1) References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Transfer-Encoding: 7bit Sender: owner-freebsd-hackers@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG James Howard wrote: > > Due to the discussion of speed, I have been looking at it and it is really > slow. Even slower than I thought and I was thinking it was pretty slow. > > So using gprof, I have discovered that it seems to spend a whole mess of > time in grep_malloc() and free(). So I pulled all the references to > malloc inside the main loop (the copy for ln.dat and removed queueing). > This stills leaves us with a grep that is about ~6x slower than GNU. > Before that, it ran closer to 80x. After this, gprof says it spends > around 53% of its time in procline(). Sorry, but a simplistic analysis like that just won't cut for grep. The algorithmic complexity is highly relevant here. Try this: generate a 1 Mb file, and then generate 10 Mb and 50 Mb files by concatenating that first file. Benchmark yours and GNU grep a number of times to get the average for each file. Now compare the *proportions* between the different sized files. Are they the same? Next, try different sized patterns on the 50 Mb file on both yours and GNU grep. Again, compare the proportion. Next, compare patterns with different number of "wildcards", patterns with things like [acegikmoqsuvxz] vs [acegikmoqsuvxzACEGIKMOQSUVXZ], etc. Either that, or do a complexity analysis of the algorithms. :-) (In case anyone reading this discussion wants to know more about complexity of algorithms, I recommend Computer Algorithms, Introduction to Design and Analysis, by Sara Baase, Addison Wesley.) -- Daniel C. Sobral (8-DCS) dcs@newsguy.com dcs@freebsd.org "Is it true that you're a millionaire's son who never worked a day in your life?" "Yeah, I guess so." "Lemme tell you, son, you ain't missed a thing." To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-hackers" in the body of the message