FreeBSD Mail Archives

Date:      Wed, 18 Jun 2008 12:37:39 +0400
From:      Andrey Chernov <ache@nagual.pp.ru>
To:        Dag-Erling Smørgrav <des@des.no>
Cc:        Doug Barton <dougb@FreeBSD.org>, current@FreeBSD.org, Konrad Jankowski <konrad.jankowski@bluemedia.pl>, Diomidis Spinellis <dds@aueb.gr>, hackers@FreeBSD.org, Gabor Kovesdan <gabor@FreeBSD.org>, Max Khon <fjoe@samodelkin.net>, "Sean C. Farley" <scf@FreeBSD.org>, Kövesdán Gábor <gabor@t-hosting.hu>
Subject:   Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Message-ID:  <20080618083739.GA87100@nagual.pp.ru>
In-Reply-To: <86zlpjduew.fsf@ds4.des.no>
References:  <20080617002224.GA16122@nagual.pp.ru> <20080617002808.GB16122@nagual.pp.ru> <20080617004647.GA16546@nagual.pp.ru> <48576610.9080808@FreeBSD.org> <48577510.4020007@aueb.gr> <48577BD2.4070205@bluemedia.pl> <20080617102900.GA46479@nagual.pp.ru> <485798C4.2050605@FreeBSD.org> <20080618055851.GA85018@nagual.pp.ru> <86zlpjduew.fsf@ds4.des.no>

On Wed, Jun 18, 2008 at 10:22:31AM +0200, Dag-Erling Smørgrav wrote:
> I think part of the problem is that there aren't enough people who truly
> understand localization.  I think I understand most of it, but I'm
> pretty sure I *don't* understand how collation works, or is supposed to
> work.  Amongst other things, I don't understand how (or whether) it
> handles cases like "aa" and "å", which are considered the same letter in
> Norwegian.

Single-byte locale collation works through strcoll() via chains, i.e. it
seeks all chains starting with the given letter. Multibyte locale collation
is currently not implemented and can't be properly implemented under the
existing single-byte framework (it would consume resources badly in that
case). I know of semi-hackish attempts to implement multibyte collation via
the single-byte one, but they all cover only a small ASCII + national
alphabet subset; the rest of Unicode is left unsorted.
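
To make the chain lookup concrete, here is a minimal sketch in Python. It is
not the libc implementation: the weight table, the names WEIGHTS, CHAINS,
sort_key and toy_strcoll, and the choice to give "aa" the same weight as "å"
are all assumptions made for illustration only.

```python
import string

# Toy collation table: each collating element (possibly a multi-character
# "chain") maps to a weight. All entries are illustrative assumptions.
WEIGHTS = {ch: i for i, ch in enumerate(string.ascii_lowercase)}
WEIGHTS["æ"] = 26
WEIGHTS["ø"] = 27
WEIGHTS["å"] = 28
WEIGHTS["aa"] = 28  # chain: "aa" collates as the single letter "å"

# Index chains by their first character, so a lookup only scans the
# chains that start with the current letter.
CHAINS = {}
for element in WEIGHTS:
    if len(element) > 1:
        CHAINS.setdefault(element[0], []).append(element)
for chains in CHAINS.values():
    chains.sort(key=len, reverse=True)  # prefer the longest match


def sort_key(word):
    """Translate a word into a list of weights, trying chains first."""
    key = []
    i = 0
    while i < len(word):
        for chain in CHAINS.get(word[i], []):
            if word.startswith(chain, i):
                key.append(WEIGHTS[chain])
                i += len(chain)
                break
        else:
            # No chain matched: use the single character's weight, and
            # place characters missing from the table after known ones.
            key.append(WEIGHTS.get(word[i], len(WEIGHTS) + ord(word[i])))
            i += 1
    return key


def toy_strcoll(a, b):
    """Return <0, 0 or >0, in the style of strcoll(), for the toy table."""
    ka, kb = sort_key(a), sort_key(b)
    return (ka > kb) - (ka < kb)
```

With this table, sorted(words, key=sort_key) places "aalesund" after
"zebra", because the leading "aa" is matched as one collating element.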

> Perhaps you could create a Localization page on wiki.freebsd.org which
> addresses these issues, or at least points to relevant resources?

IMHO single-byte collation will be obsolete soon, once Unicode collation
is implemented as a SoC project. We need something like the ICU library,
which performs as described below, i.e. unified sorting for all possible
chars:
http://unicode.org/reports/tr10/
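
The report above describes comparing strings on several levels (base
letters first, then accents, then case). A rough Python sketch of that idea
follows. It is only an approximation built on Unicode decomposition, not the
actual algorithm or its weight tables, and it is not what ICU does
internally; the function name multilevel_key is hypothetical.

```python
import unicodedata


def multilevel_key(text):
    """Build a (primary, secondary, tertiary) key for sorting.

    Assumption: base letters, combining marks and letter case stand in
    for the primary, secondary and tertiary levels. Real Unicode
    collation uses per-character weight tables instead.
    """
    decomposed = unicodedata.normalize("NFD", text)
    primary = []    # base letters, case-insensitive
    secondary = []  # (position, mark) for each combining accent
    tertiary = []   # True where the base letter is upper case
    for ch in decomposed:
        if unicodedata.combining(ch):
            secondary.append((len(primary), ord(ch)))
        else:
            primary.append(ch.casefold())
            tertiary.append(ch.isupper())
    return ("".join(primary), tuple(secondary), tuple(tertiary))


words = ["résumé", "resume", "Resume", "rest"]
ordered = sorted(words, key=multilevel_key)
```

Here accents and case only break ties between words whose base letters are
equal, so "Resume" no longer sorts ahead of "rest" as it would in a plain
code point comparison.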

-- 
http://ache.pp.ru/

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20080618083739.GA87100>
