From owner-freebsd-hackers Thu Jul 29 15:25: 4 1999 Delivered-To: freebsd-hackers@freebsd.org Received: from smtp13.bellglobal.com (smtp13.bellglobal.com [204.101.251.52]) by hub.freebsd.org (Postfix) with ESMTP id A7C13150F3 for ; Thu, 29 Jul 1999 15:24:56 -0700 (PDT) (envelope-from vanderh@ecf.toronto.edu) Received: from localhost.nowhere (Hamilton-ppp44858.sympatico.ca [206.172.76.51]) by smtp13.bellglobal.com (8.8.5/8.8.5) with ESMTP id SAA07041; Thu, 29 Jul 1999 18:26:00 -0400 (EDT) Received: (from tim@localhost) by localhost.nowhere (8.9.3/8.9.1) id SAA24675; Thu, 29 Jul 1999 18:22:29 -0400 (EDT) (envelope-from tim) Date: Thu, 29 Jul 1999 18:22:29 -0400 From: Tim Vanderhoek To: "Daniel C. Sobral" Cc: James Howard , freebsd-hackers@FreeBSD.ORG Subject: Re: replacing grep(1) Message-ID: <19990729182229.E24296@mad> References: <379F3701.F79E10DF@newsguy.com> <19990728191531.A318@mad> <37A04635.59E74FF6@newsguy.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.95i In-Reply-To: <37A04635.59E74FF6@newsguy.com>; from Daniel C. Sobral on Thu, Jul 29, 1999 at 09:16:53PM +0900 Sender: owner-freebsd-hackers@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Thu, Jul 29, 1999 at 09:16:53PM +0900, Daniel C. Sobral wrote: > > > > > > Sorry, but a simplistic analysis like that just won't cut for grep. > > > The algorithmic complexity is highly relevant here. Try this: > > > > Algorithmic complexity!?! > > Yup. I'm sorry. I've read your message and have decided that you're wrong. Outside of the regexp library, algorithmic complexity is not a factor here. It would take a beanbag to write anything other than an O(N) algorithm. The proposed grep is slow, very slow, and I've sent a long message to James outlining how to make it much faster, but algorithmic complexity is not an issue. > Also, fgetln() will copy the line buffer from time to time, though > that's not a simple computation, and probably of little fgetln() does a complete copy of the line buffer whenever an excessively long line is found. On this point, it's hard to do better without using mmap(), but mmap() has its own disadvantages. My last suggestion to James was to assume a worst case for long lines and mark the worst worst case with an XXX "this is unfortunate". > > The test you suggested doesn't show anything about that algorithmic > > complexity, though. > > Yeah? Try to back that with the results of the tests I suggested. No, it's not even worth my time. Now look. You've gotten me so upset I actually went and did a simple test. The test showed I'm right and you're wrong. Catting X number of copies of /etc/termcap into longfile causes the time grep uses to pass longfile searching for all occurrences of "printer" causes it to use an extra 0.03 seconds for every repetition of /etc/termcap in longfile. Gee, linear complexity wrt to file length. Who could've guessed!? What'ya bet GNU grep also exhibits linear complexity? :) Admit it, you jumped in with some bullshit about complexity when had you taken the time to look into what James meant when he said "it now spends 50% of its time in procline()" you would have kept quiet, realizing that he was talking about a constant factor in the complexity analysis, an subject where comments such as "it now spends 50% of its time in procline()" are relevent. :-) [Never mind that it should be spending near 100% of its time in procline...that just means he's still got work to do... :-] -- This is my .signature which gets appended to the end of my messages. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-hackers" in the body of the message