Skip site navigation (1)Skip section navigation (2)
Date:      26 Jun 2001 09:47:45 +0200
From:      Dag-Erling Smorgrav <des@ofug.org>
To:        Ryan Thompson <ryan@sasknow.com>
Cc:        Anton Berezin <tobez@tobez.org>, freebsd-chat@FreeBSD.ORG
Subject:   Re: most complex code in BSD?
Message-ID:  <xzpzoavadi6.fsf@flood.ping.uio.no>
In-Reply-To: <Pine.BSF.4.21.0106260042370.82644-100000@ren.sasknow.com>
References:  <Pine.BSF.4.21.0106260042370.82644-100000@ren.sasknow.com>

next in thread | previous in thread | raw e-mail | index | archive | help
Ryan Thompson <ryan@sasknow.com> writes:
> Classic problem with this is, even if you can understand it (or at least
> figure it out in less than a minute or so) who the hell could hazard a
> guess at the efficiency of that "algorithm"? (Before going to the trouble
> of testing it on a few million lines). I bet it isn't O(n) ;-)

I strongly suspect it's O(n*log(k)), where k is the number of distinct
lines in the input.  In most cases k will probably be a significant
fraction of n, so call it O(n*log(n)).

> Maybe you could guess, but you'd have to have a conventional algorithm in
> head (or, one step further, in code), to recant the efficiency of a
> well-written duplicate removal function. Then, you have to ask the
> question... In terms of final algorithmic results, is 's,.,,sg if
> $_{+lc}++' "well-written"? Or does it turn into some ghastly memory-greedy
> exponential function?

There are two ways you could improve it.  The first is to replace
s,.,,sg (which was picked for obscurity, not efficiency) with $_="",
which is both quicker and shorter.  The other (not possible in Perl)
is to keep only the hash value of each previously seen line, and
discard the actual line to conserve memory.  With a good hash
function, the likelihood of finding two distinct input lines that hash
to the same value should be pretty low (though non-zero).

DES
-- 
Dag-Erling Smorgrav - des@ofug.org

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-chat" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?xzpzoavadi6.fsf>