Date:      Mon, 28 May 2007 10:46:59 +0200
From:      Wolfgang Zenker <wolfgang@lyxys.ka.sub.org>
To:        Andrey Chernov <ache@freebsd.org>, freebsd-i18n@freebsd.org
Subject:   Re: Why no non-latin TODIGIT mappings in UTF-8.src ?
Message-ID:  <20070528084659.GA77240@lyxys.ka.sub.org>
In-Reply-To: <20070528072847.GA18850@nagual.pp.ru>
References:  <200705272241.l4RMfg07051300@juno.lyxys.ka.sub.org> <20070528072847.GA18850@nagual.pp.ru>

* Andrey Chernov <ache@freebsd.org> [070528 09:28]:
> On Mon, May 28, 2007 at 12:41:42AM +0200, Wolfgang Zenker wrote:

>> I'm a bit surprised there are no TODIGIT mappings for non-latin scripts
>> in src/share/mklocale/UTF-8. Is there a technical reason why this would
>> be a bad idea, or is it simply that no one has gotten around to defining
>> the mappings yet?

> Because of POSIX isdigit():

> digit 
> Define the characters to be classified as numeric digits. 
> In the POSIX locale, only:

> 0 1 2 3 4 5 6 7 8 9


> shall be included.

> In a locale definition file, only the digits <zero>, <one>, <two>, 
> <three>, <four>, <five>, <six>, <seven>, <eight>, and <nine> shall be 
> specified, and in contiguous ascending sequence by numerical value. The 
> digits <zero> to <nine> of the portable character set are automatically 
> included in this class.

Looking at our UTF-8.src, I see

$ grep DIGIT UTF-8.src
DIGIT     '0' - '9'
XDIGIT    '0' - '9'  'A' - 'F'  'a' - 'f'
TODIGIT   < '0' - '9' : 0x0000 >
TODIGIT   < 'A' - 'F' : 10 > < 'a' - 'f' : 10 >

It appears to me that isdigit() behaviour is controlled by the DIGIT
keyword, not TODIGIT. However, I admit that I don't completely understand
how locale files are supposed to work. So where does e.g. iswdigit() get
its character class information from? Shouldn't that be somewhere in the
locale information as well?
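
To make the distinction concrete, here is a minimal C sketch of how I read
the two keywords: DIGIT answers "is this character a digit?", while TODIGIT
answers "what numeric value does it carry?". Both helper names are
hypothetical and not part of libc. The second helper only mimics the two
ASCII TODIGIT lines shown above, on the assumption that a range mapping
counts upwards from its start value; it does not read any locale data, and
its -1 return for unmapped characters is my own choice.

```c
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: classification only. iswdigit() reports whether
 * the current locale places wc in the "digit" class (the DIGIT keyword
 * in UTF-8.src, as far as I understand it).
 */
int
is_decimal_digit(wint_t wc)
{
	return (iswdigit(wc) != 0);
}

/*
 * Hypothetical helper: value mapping only. Hard-codes what the two
 * TODIGIT lines in UTF-8.src appear to describe for ASCII:
 *   '0'-'9' -> 0..9,  'A'-'F' -> 10..15,  'a'-'f' -> 10..15.
 * Returns -1 for anything else (an assumption made for this sketch).
 */
int
ascii_digit_value(int c)
{
	if (c >= '0' && c <= '9')
		return (c - '0');
	if (c >= 'A' && c <= 'F')
		return (c - 'A' + 10);
	if (c >= 'a' && c <= 'f')
		return (c - 'a' + 10);
	return (-1);
}
```

With these, is_decimal_digit(L'a') is false although ascii_digit_value('a')
is 10, so a character can have a TODIGIT value without being in the DIGIT
class. That is why I would expect additional TODIGIT mappings to leave
isdigit() untouched.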

Wolfgang


