Date:      Tue, 9 Mar 2010 22:33:40 +0300
From:      Andrey Chernov <ache@nagual.pp.ru>
To:        Bruce Evans <brde@optusnet.com.au>, Jaakko Heinonen <jh@FreeBSD.ORG>, src-committers@FreeBSD.ORG, svn-src-all@FreeBSD.ORG, svn-src-head@FreeBSD.ORG
Subject:   Re: svn commit: r204803 - head/usr.bin/uniq
Message-ID:  <20100309193339.GA14612@nagual.pp.ru>
In-Reply-To: <20100309175544.GA17698@zim.MIT.EDU>
References:  <201003061921.o26JLv36014114@svn.freebsd.org> <20100307104626.GA9015@a91-153-117-195.elisa-laajakaista.fi> <20100308015926.O11669@delplex.bde.org> <20100307183139.GA50243@nagual.pp.ru> <20100307201027.GA51623@nagual.pp.ru> <20100308195123.GA10624@zim.MIT.EDU> <20100308202919.GA67990@nagual.pp.ru> <20100309175544.GA17698@zim.MIT.EDU>

On Tue, Mar 09, 2010 at 12:55:44PM -0500, David Schultz wrote:
> Actually, a question...why doesn't it suffice to simply call
> strcoll() instead of mbstowcs() followed by wcscoll()?
> I would expect that in the absence of the -i flag, none of
> this would be necessary.  

strcoll() works only for single-byte character locales, which rules out 
UTF-8, for example. To do what you suggest (without converting to wide 
chars), we would need a fast mbscoll() function (see our join.c for its 
slow emulation using wide chars).
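
For illustration, the slow emulation could look roughly like the sketch 
below. This is not the actual join.c code; the name mbscoll_slow() and the 
strcmp() fallback on conversion or allocation failure are assumptions made 
for the example.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not a libc function): collate two multibyte
 * strings by converting both to wide characters and calling wcscoll().
 * Returns <0, 0 or >0 like strcoll().  If a string cannot be converted
 * in the current locale, or memory runs out, fall back to strcmp().
 */
static int
mbscoll_slow(const char *s1, const char *s2)
{
	wchar_t *w1, *w2;
	size_t n1, n2;
	int ret;

	/* With a NULL destination, mbstowcs() only reports the length. */
	n1 = mbstowcs(NULL, s1, 0);
	n2 = mbstowcs(NULL, s2, 0);
	if (n1 == (size_t)-1 || n2 == (size_t)-1)
		return (strcmp(s1, s2));

	w1 = malloc((n1 + 1) * sizeof(*w1));
	w2 = malloc((n2 + 1) * sizeof(*w2));
	if (w1 == NULL || w2 == NULL) {
		free(w1);
		free(w2);
		return (strcmp(s1, s2));
	}

	/* Two conversions and two allocations per comparison: the slow part. */
	mbstowcs(w1, s1, n1 + 1);
	mbstowcs(w2, s2, n2 + 1);
	ret = wcscoll(w1, w2);

	free(w1);
	free(w2);
	return (ret);
}
```

A caller would run setlocale(LC_ALL, "") once at startup so that the 
conversion and collation follow the user's locale.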

> At the very least, it would make
> sense to start with a strcmp(), and only fall back on the
> expensive conversion and collation if the strings don't
> compare equal.

As far as I have seen, files fed to uniq commonly have only a few equal 
lines and many more unequal ones, so the strcmp() call would be extra 
overhead most of the time.
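
To make the trade-off concrete, here is a rough sketch of the 
strcmp()-first variant. The function name lines_equal(), the MAXW line 
length cap and the slow_calls counter are assumptions made for the 
example, not code from uniq. Every pair of byte-different lines pays for 
strcmp() and then still goes through the wide-char conversion.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

#define	MAXW	256	/* assumed line length cap, for this sketch only */

static unsigned long slow_calls;	/* times the wide-char path ran */

/*
 * Hypothetical equality test: try strcmp() first and fall back to
 * mbstowcs() + wcscoll() only when the bytes differ.
 */
static int
lines_equal(const char *s1, const char *s2)
{
	wchar_t w1[MAXW], w2[MAXW];

	if (strcmp(s1, s2) == 0)
		return (1);	/* cheap path: byte-identical lines */

	slow_calls++;		/* unequal bytes: strcmp() was extra work */

	/* (size_t)-1 on error, MAXW on truncation: treat both as unequal. */
	if (mbstowcs(w1, s1, MAXW) >= MAXW || mbstowcs(w2, s2, MAXW) >= MAXW)
		return (0);
	return (wcscoll(w1, w2) == 0);
}
```

With input where most adjacent lines differ, slow_calls ends up close to 
the number of comparisons, so the shortcut rarely triggers.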

-- 
http://ache.pp.ru/


