Date: Mon, 28 May 2007 10:46:59 +0200
From: Wolfgang Zenker
To: Andrey Chernov, freebsd-i18n@freebsd.org
Subject: Re: Why no non-latin TODIGIT mappings in UTF-8.src ?
Message-ID: <20070528084659.GA77240@lyxys.ka.sub.org>
In-Reply-To: <20070528072847.GA18850@nagual.pp.ru>
References: <200705272241.l4RMfg07051300@juno.lyxys.ka.sub.org> <20070528072847.GA18850@nagual.pp.ru>

* Andrey Chernov [070528 09:28]:
> On Mon, May 28, 2007 at 12:41:42AM +0200, Wolfgang Zenker wrote:
>> I'm a bit surprised there are no TODIGIT mappings for non-latin scripts
>> in src/share/mklocale/UTF-8. Is there a technical reason why this would
>> be a bad idea or is it simply because noone did get around to define the
>> mappings yet?

> Because of POSIX isdigit():

> digit
>     Define the characters to be classified as numeric digits.
>
>     In the POSIX locale, only:
>
>         0 1 2 3 4 5 6 7 8 9
>
>     shall be included.
>
>     In a locale definition file, only the digits <zero>, <one>, <two>,
>     <three>, <four>, <five>, <six>, <seven>, <eight>, and <nine> shall be
>     specified, and in contiguous ascending sequence by numerical value. The
>     digits <zero> to <nine> of the portable character set are automatically
>     included in this class.

Looking at our UTF-8.src, I see

    $ grep DIGIT UTF-8.src
    DIGIT     '0' - '9'
    XDIGIT    '0' - '9' 'A' - 'F' 'a' - 'f'
    TODIGIT   < '0' - '9' : 0x0000 >
    TODIGIT   < 'A' - 'F' : 10 > < 'a' - 'f' : 10 >

It appears to me that isdigit() behaviour is controlled by the DIGIT
keyword, not TODIGIT. However, I do admit that I don't understand
completely how locale files are supposed to work. So where does e.g.
iswdigit() get its character class information from, should that not
be in the locale information as well somewhere?

Wolfgang