Date: Tue, 9 Mar 2010 22:33:40 +0300
From: Andrey Chernov <ache@nagual.pp.ru>
To: Bruce Evans <brde@optusnet.com.au>, Jaakko Heinonen <jh@FreeBSD.ORG>, src-committers@FreeBSD.ORG, svn-src-all@FreeBSD.ORG, svn-src-head@FreeBSD.ORG
Subject: Re: svn commit: r204803 - head/usr.bin/uniq
Message-ID: <20100309193339.GA14612@nagual.pp.ru>
In-Reply-To: <20100309175544.GA17698@zim.MIT.EDU>
References: <201003061921.o26JLv36014114@svn.freebsd.org> <20100307104626.GA9015@a91-153-117-195.elisa-laajakaista.fi> <20100308015926.O11669@delplex.bde.org> <20100307183139.GA50243@nagual.pp.ru> <20100307201027.GA51623@nagual.pp.ru> <20100308195123.GA10624@zim.MIT.EDU> <20100308202919.GA67990@nagual.pp.ru> <20100309175544.GA17698@zim.MIT.EDU>
On Tue, Mar 09, 2010 at 12:55:44PM -0500, David Schultz wrote:
> Actually, a question...why doesn't it suffice to simply call
> strcoll() instead of mbstowcs() followed by wcscoll()?
> I would expect that in the absence of the -i flag, none of
> this would be necessary.

strcoll() works only in single-byte character locales, which means no
UTF-8, for example. To do what you suggest (without converting to wide
chars), we would need a fast mbscoll() function (see our join.c for its
slow emulation using wide chars).

> At the very least, it would make
> sense to start with a strcmp(), and only fall back on the
> expensive conversion and collation if the strings don't
> compare equal.

As far as I can tell, files fed to uniq commonly have only a few equal
lines and many more unequal ones, so the strcmp() would be extra
overhead most of the time.

-- 
http://ache.pp.ru/
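For readers following the thread, the "mbstowcs() followed by wcscoll()" approach under discussion can be sketched roughly as below. This is only an illustration, not the code from uniq or join.c: the names to_wide() and lines_collate_equal() are hypothetical, and the fallback to strcmp() on conversion failure is an assumption made here to keep the sketch total.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert a multibyte string to a newly
 * allocated wide string. Returns NULL if the input is not valid
 * in the current locale or if allocation fails.
 */
static wchar_t *
to_wide(const char *s)
{
	size_t n;
	wchar_t *w;

	/* With a NULL destination, mbstowcs() only counts wide chars. */
	n = mbstowcs(NULL, s, 0);
	if (n == (size_t)-1)
		return (NULL);
	w = malloc((n + 1) * sizeof(*w));
	if (w == NULL)
		return (NULL);
	if (mbstowcs(w, s, n + 1) == (size_t)-1) {
		free(w);
		return (NULL);
	}
	return (w);
}

/*
 * Hypothetical comparison: returns 1 if the two lines collate as
 * equal in the current locale, 0 otherwise. Falls back to a plain
 * byte comparison when conversion fails (an assumption of this
 * sketch, not necessarily what uniq does).
 */
int
lines_collate_equal(const char *a, const char *b)
{
	wchar_t *wa = to_wide(a);
	wchar_t *wb = to_wide(b);
	int eq;

	if (wa == NULL || wb == NULL)
		eq = (strcmp(a, b) == 0);
	else
		eq = (wcscoll(wa, wb) == 0);
	free(wa);
	free(wb);
	return (eq);
}
```

A caller would normally run setlocale(LC_ALL, "") once at startup so that the conversion and collation follow the user's locale. The two allocations and conversions per comparison are the cost the thread refers to as "expensive".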