From owner-freebsd-chat Wed Jun 27 12: 7: 6 2001 Delivered-To: freebsd-chat@freebsd.org Received: from guru.mired.org (okc-27-141-144.mmcable.com [24.27.141.144]) by hub.freebsd.org (Postfix) with SMTP id EED1537B406 for ; Wed, 27 Jun 2001 12:07:03 -0700 (PDT) (envelope-from mwm@mired.org) Received: (qmail 114 invoked by uid 100); 27 Jun 2001 19:06:57 -0000 From: Mike Meyer MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15162.11985.348196.939734@guru.mired.org> Date: Wed, 27 Jun 2001 14:06:57 -0500 To: Anton Berezin Cc: Dag-Erling Smorgrav , Ryan Thompson , freebsd-chat@FreeBSD.ORG Subject: Re: most complex code in BSD? In-Reply-To: <20010627205835.A72506@heechee.tobez.org> References: <20010627205835.A72506@heechee.tobez.org> X-Mailer: VM 6.90 under 21.1 (patch 14) "Cuyahoga Valley" XEmacs Lucid X-face: "5Mnwy%?j>IIV\)A=):rjWL~NB2aH[}Yq8Z=u~vJ`"(,&SiLvbbz2W`;h9L,Yg`+vb1>RG% *h+%X^n0EZd>TM8_IB;a8F?(Fb"lw'IgCoyM.[Lg#r\ Sender: owner-freebsd-chat@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Anton Berezin types: > On Tue, Jun 26, 2001 at 09:47:45AM +0200, Dag-Erling Smorgrav wrote: > > Ryan Thompson writes: > > > Classic problem with this is, even if you can understand it (or at > > > least figure it out in less than a minute or so) who the hell could > > > hazard a guess at the efficiency of that "algorithm"? (Before going > > > to the trouble of testing it on a few million lines). I bet it isn't > > > O(n) ;-) > > I strongly suspect it's O(n*log(k)), where k is the number of distinct > > lines in the input. In most cases k will probably be a significant > > fraction of n, so call it O(n*log(n)). > > I do think that in fact it is closer to O(n), if one replaces s,.,,sg > with $_="", which is the obvious optimization, and which you already > mentioned. The hash accesses are essentially O(1) for random enough > keys and in case there are enough buckets - and there is always enough > buckets in Perl since it rebalances its hashes when necessary. So it > should be close to O(n). If I understand the problem correctly, the deviation from O(n) will be the cost of rebalancing the hashes. Unless that's O(n) or less, Ryan's correct. http://www.mired.org/home/mwm/ Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-chat" in the body of the message