From owner-freebsd-hackers Fri Jul 30 15:35:13 1999 Delivered-To: freebsd-hackers@freebsd.org Received: from smtp11.bellglobal.com (smtp11.bellglobal.com [204.101.251.53]) by hub.freebsd.org (Postfix) with ESMTP id CD5F915733 for ; Fri, 30 Jul 1999 15:35:04 -0700 (PDT) (envelope-from vanderh@ecf.toronto.edu) Received: from localhost.nowhere (ppp18370.on.bellglobal.com [206.172.130.50]) by smtp11.bellglobal.com (8.8.5/8.8.5) with ESMTP id SAA03257; Fri, 30 Jul 1999 18:38:02 -0400 (EDT) Received: (from tim@localhost) by localhost.nowhere (8.9.3/8.9.1) id SAA67239; Fri, 30 Jul 1999 18:28:38 -0400 (EDT) (envelope-from tim) Date: Fri, 30 Jul 1999 18:28:38 -0400 From: Tim Vanderhoek To: "Daniel C. Sobral" Cc: freebsd-hackers@FreeBSD.ORG Subject: Re: replacing grep(1) Message-ID: <19990730182838.B67094@mad> References: <379F3701.F79E10DF@newsguy.com> <19990728191531.A318@mad> <37A04635.59E74FF6@newsguy.com> <19990729182229.E24296@mad> <37A1AF27.C5B9DB7D@newsguy.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.95i In-Reply-To: <37A1AF27.C5B9DB7D@newsguy.com>; from Daniel C. Sobral on Fri, Jul 30, 1999 at 10:56:55PM +0900 Sender: owner-freebsd-hackers@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Fri, Jul 30, 1999 at 10:56:55PM +0900, Daniel C. Sobral wrote: > > I said that I did not care whether the thing is inside or outside > the regexp library. Yes, although I think at this point it's obvious we're coming at this discussion from fairly different perspectives. By the time you brought-up complexity originally, I had more or less decided that I did not want to see the new grep imported without significant speed improvements and was concerned with how to improve grep. Your interest is in debating that point (fortunately arguing for the side I agree with :). > 4) grep -e 123 456 world.build [I assume "grep -e 123 -e 124 world.build"] > One can clearly see that GNU grep has a much better complexity in > the cases of longer patterns or multiple patterns with common > prefix. Alright, someone else already mentioned to me in email that I totally ignored what differences involved multiple patterns. Combining multiple patterns is a big win if those two patterns have a common prefix (I hadn't considered the case of similar patterns before, actually). Combining multiple patterns when they're dissimilar doesn't appear to help much (which is the only case I had considered -- my mistake, and also the reason I ignored what you said about multiple patterns). I'm surprised by the way GNU grep is able to handle longer patterns, and I probably wouldn't have noticed it unless I'd taken some time to examine the GNU source. Congratulations, you win. :) The rest of your lengthy message mostly goes on to repeat the fact that GNU grep is able to merge multiple patterns with a common prefix (and postfix?) to good effect. > It also shows that the new grep spends a lot of time in an activity > not related to the search itself, since it does multiple patterns by Well, duh. This is really why my reaction to "complexity analysis" is (still) what it is. Complexity analysis is almost only useful for comparing two different algorithms and the fact that the new grep spends a lot of time doing things other than pattern searching is quite obvious after a casual perusal of the source. Complexity analysis does not (directly) help improving an algorithm. With the possible exception of the idea of merging common prefixes, most of this is not useful (at this stage) to improving grep. If I was going to propose replacing the existing GNU grep, I would (and always would have) done considerable more speed trials than the simple one in my last message. > It would seem that GNU grep is superior in the case of partial > matches without a full match too, but the standard deviation for the That is almost certainly something inside the regex library, which I have repeatedly said I'm not interested in even looking at. If our regex library is too slow, then we need to look into the newer one the Henry Spencer is rumoured to be sitting on. -- This is my .signature which gets appended to the end of my messages. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-hackers" in the body of the message