Date:      Mon, 28 May 2007 20:40:28 +0200
From:      Wolfgang Zenker <wolfgang@lyxys.ka.sub.org>
To:        Andrey Chernov <ache@freebsd.org>
Cc:        freebsd-i18n@freebsd.org
Subject:   Re: Why no non-latin TODIGIT mappings in UTF-8.src ?
Message-ID:  <20070528184028.GA19098@lyxys.ka.sub.org>
In-Reply-To: <20070528181829.GA18332@lyxys.ka.sub.org>
References:  <200705272241.l4RMfg07051300@juno.lyxys.ka.sub.org> <20070528072847.GA18850@nagual.pp.ru> <20070528084659.GA77240@lyxys.ka.sub.org> <20070528115250.GA24812@nagual.pp.ru> <20070528123456.GA12679@lyxys.ka.sub.org> <20070528124944.GA26009@nagual.pp.ru> <20070528181829.GA18332@lyxys.ka.sub.org>

* Wolfgang Zenker <wolfgang@lyxys.ka.sub.org> [070528 20:18]:
> * Andrey Chernov <ache@freebsd.org> [070528 14:49]:
>> On Mon, May 28, 2007 at 02:34:56PM +0200, Wolfgang Zenker wrote:

>>> What would be a good place to read
>>> up about how much can be localised with locales and how much of it we
>>> currently (and maybe in the near future) support?

>> The Open Group Base Specs Issue 6
>> http://www.opengroup.org/onlinepubs/009695399/toc.htm

> So, as 7.3.1 says, in the "POSIX locale", which appears to be otherwise
> known as the "C" locale, only '0' to '9' can be defined as being in the
> digit class. Because we use UTF-8.src as the source for the "C" locale, we
> cannot add definitions for digits in other scripts, right?

> In "a locale", which appears to be the generic case now, we are only
> allowed to define the digits <zero> to <nine> in the digit class. The
> digits '0' to '9' from the "portable character set" (= ASCII?) would be
> automatically included in the class.

> So if we have a locale using a non-Latin script that happens to have its
> own "digit" characters, we cannot use UTF-8.src for the LC_CTYPE
> definitions but would be best off working with a copy and adding DIGIT
> mappings for the digit characters in the script used? Or are <zero> to
> <nine> again fixed to be the ASCII codes '0' to '9'?

Found the answer in chapter 6. So, <zero> to <nine> are defined as the
respective digits in the portable character set. This leaves no possibility
to define digits for other scripts, AFAICS.

So, can anyone clue me in why this has been handled this way? It appears
to me that the possibilities of localization are quite limited as soon
as languages in non-Latin scripts come into play. Are these problems
usually handled in individual applications then?

Wolfgang


