Date: Mon, 28 May 2007 10:46:59 +0200
From: Wolfgang Zenker <wolfgang@lyxys.ka.sub.org>
To: Andrey Chernov <ache@freebsd.org>, freebsd-i18n@freebsd.org
Subject: Re: Why no non-latin TODIGIT mappings in UTF-8.src ?
Message-ID: <20070528084659.GA77240@lyxys.ka.sub.org>
In-Reply-To: <20070528072847.GA18850@nagual.pp.ru>
References: <200705272241.l4RMfg07051300@juno.lyxys.ka.sub.org> <20070528072847.GA18850@nagual.pp.ru>

* Andrey Chernov <ache@freebsd.org> [070528 09:28]:
> On Mon, May 28, 2007 at 12:41:42AM +0200, Wolfgang Zenker wrote:
>> I'm a bit surprised there are no TODIGIT mappings for non-latin scripts
>> in src/share/mklocale/UTF-8. Is there a technical reason why this would
>> be a bad idea or is it simply because noone did get around to define the
>> mappings yet?

> Because of POSIX isdigit():

> digit
> Define the characters to be classified as numeric digits.

> In the POSIX locale, only:

> 0 1 2 3 4 5 6 7 8 9

> shall be included.

> In a locale definition file, only the digits <zero>, <one>, <two>,
> <three>, <four>, <five>, <six>, <seven>, <eight>, and <nine> shall be
> specified, and in contiguous ascending sequence by numerical value. The
> digits <zero> to <nine> of the portable character set are automatically
> included in this class.

Looking at our UTF-8.src, I see

$ grep DIGIT UTF-8.src
DIGIT     '0' - '9'
XDIGIT    '0' - '9' 'A' - 'F' 'a' - 'f'
TODIGIT   < '0' - '9' : 0x0000 >
TODIGIT   < 'A' - 'F' : 10 > < 'a' - 'f' : 10 >

It appears to me that isdigit() behaviour is controlled by the DIGIT
keyword, not TODIGIT. However, I do admit that I don't understand
completely how locale files are supposed to work. So where does e.g.
iswdigit() get its character class information from, should that not
be in the locale information as well somewhere?

Wolfgang