Date: Wed, 27 Jun 2001 20:58:35 +0200 From: Anton Berezin <tobez@tobez.org> To: Dag-Erling Smorgrav <des@ofug.org> Cc: Ryan Thompson <ryan@sasknow.com>, freebsd-chat@FreeBSD.ORG Subject: Re: most complex code in BSD? Message-ID: <20010627205835.A72506@heechee.tobez.org> In-Reply-To: <xzpzoavadi6.fsf@flood.ping.uio.no>; from des@ofug.org on Tue, Jun 26, 2001 at 09:47:45AM %2B0200 References: <Pine.BSF.4.21.0106260042370.82644-100000@ren.sasknow.com> <xzpzoavadi6.fsf@flood.ping.uio.no>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, Jun 26, 2001 at 09:47:45AM +0200, Dag-Erling Smorgrav wrote: > Ryan Thompson <ryan@sasknow.com> writes: > > Classic problem with this is, even if you can understand it (or at > > least figure it out in less than a minute or so) who the hell could > > hazard a guess at the efficiency of that "algorithm"? (Before going > > to the trouble of testing it on a few million lines). I bet it isn't > > O(n) ;-) > > I strongly suspect it's O(n*log(k)), where k is the number of distinct > lines in the input. In most cases k will probably be a significant > fraction of n, so call it O(n*log(n)). I do think that in fact it is closer to O(n), if one replaces s,.,,sg with $_="", which is the obvious optimization, and which you already mentioned. The hash accesses are essentially O(1) for random enough keys and in case there are enough buckets - and there is always enough buckets in Perl since it rebalances its hashes when necessary. So it should be close to O(n). > The other (not possible in Perl) is to keep only the hash value of > each previously seen line, and discard the actual line to conserve > memory. With a good hash function, the likelihood of finding two > distinct input lines that hash to the same value should be pretty low > (though non-zero). This does not improve the time characteristics of the algorithm. It saves memory, though. \Anton. -- May the tuna salad be with you. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-chat" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20010627205835.A72506>