Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 27 Jun 2001 20:58:35 +0200
From:      Anton Berezin <tobez@tobez.org>
To:        Dag-Erling Smorgrav <des@ofug.org>
Cc:        Ryan Thompson <ryan@sasknow.com>, freebsd-chat@FreeBSD.ORG
Subject:   Re: most complex code in BSD?
Message-ID:  <20010627205835.A72506@heechee.tobez.org>
In-Reply-To: <xzpzoavadi6.fsf@flood.ping.uio.no>; from des@ofug.org on Tue, Jun 26, 2001 at 09:47:45AM %2B0200
References:  <Pine.BSF.4.21.0106260042370.82644-100000@ren.sasknow.com> <xzpzoavadi6.fsf@flood.ping.uio.no>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, Jun 26, 2001 at 09:47:45AM +0200, Dag-Erling Smorgrav wrote:
> Ryan Thompson <ryan@sasknow.com> writes:
> > Classic problem with this is, even if you can understand it (or at
> > least figure it out in less than a minute or so) who the hell could
> > hazard a guess at the efficiency of that "algorithm"? (Before going
> > to the trouble of testing it on a few million lines). I bet it isn't
> > O(n) ;-)
> 
> I strongly suspect it's O(n*log(k)), where k is the number of distinct
> lines in the input.  In most cases k will probably be a significant
> fraction of n, so call it O(n*log(n)).

I do think that in fact it is closer to O(n), if one replaces s,.,,sg
with $_="", which is the obvious optimization, and which you already
mentioned.  The hash accesses are essentially O(1) for random enough
keys and in case there are enough buckets - and there is always enough
buckets in Perl since it rebalances its hashes when necessary.  So it
should be close to O(n).

> The other (not possible in Perl) is to keep only the hash value of
> each previously seen line, and discard the actual line to conserve
> memory.  With a good hash function, the likelihood of finding two
> distinct input lines that hash to the same value should be pretty low
> (though non-zero).

This does not improve the time characteristics of the algorithm.  It
saves memory, though.

\Anton.
-- 
May the tuna salad be with you.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-chat" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20010627205835.A72506>