Date: 26 Jun 2001 09:47:45 +0200 From: Dag-Erling Smorgrav <des@ofug.org> To: Ryan Thompson <ryan@sasknow.com> Cc: Anton Berezin <tobez@tobez.org>, freebsd-chat@FreeBSD.ORG Subject: Re: most complex code in BSD? Message-ID: <xzpzoavadi6.fsf@flood.ping.uio.no> In-Reply-To: <Pine.BSF.4.21.0106260042370.82644-100000@ren.sasknow.com> References: <Pine.BSF.4.21.0106260042370.82644-100000@ren.sasknow.com>
next in thread | previous in thread | raw e-mail | index | archive | help
Ryan Thompson <ryan@sasknow.com> writes: > Classic problem with this is, even if you can understand it (or at least > figure it out in less than a minute or so) who the hell could hazard a > guess at the efficiency of that "algorithm"? (Before going to the trouble > of testing it on a few million lines). I bet it isn't O(n) ;-) I strongly suspect it's O(n*log(k)), where k is the number of distinct lines in the input. In most cases k will probably be a significant fraction of n, so call it O(n*log(n)). > Maybe you could guess, but you'd have to have a conventional algorithm in > head (or, one step further, in code), to recant the efficiency of a > well-written duplicate removal function. Then, you have to ask the > question... In terms of final algorithmic results, is 's,.,,sg if > $_{+lc}++' "well-written"? Or does it turn into some ghastly memory-greedy > exponential function? There are two ways you could improve it. The first is to replace s,.,,sg (which was picked for obscurity, not efficiency) with $_="", which is both quicker and shorter. The other (not possible in Perl) is to keep only the hash value of each previously seen line, and discard the actual line to conserve memory. With a good hash function, the likelihood of finding two distinct input lines that hash to the same value should be pretty low (though non-zero). DES -- Dag-Erling Smorgrav - des@ofug.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-chat" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?xzpzoavadi6.fsf>