Date: Sun, 21 May 2006 01:25:16 +0800
From: "Li-Lun Wang (Leland Wang)"
To: freebsd-i18n@FreeBSD.org, freebsd-hackers@FreeBSD.org, tjr@FreeBSD.org
Message-ID: <20060520172516.GA54779@Athena.infor.org>
Subject: Inconsistency in LC_CTYPE source files

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

It came to my attention that two of the LC_CTYPE source files for UTF-8, UTF-8.src and zh_TW.UTF-8.src, are inconsistent with all other LC_CTYPE source files. The literals in all other LC_CTYPE source files, including am_ET.UTF-8.src, are written in the native byte sequence of that specific locale, whereas those in UTF-8.src and zh_TW.UTF-8.src are written as Unicode code points (note that UTF-8 is NOT the same as Unicode). This creates headaches for locale-aware applications supporting UTF-8.
For example, the usage and behavior of the is*() and isw*() functions, such as iswspace(), differ between the locales that follow the usual convention (all other locales, including am_ET.UTF-8) and the other UTF-8 locales. Under the former, the argument to the isw*() functions is the wide character literal in that locale, whereas under the latter the application must first convert the wide character from UTF-8 to Unicode before passing it to the isw*() functions.

Is there any good reason for this inconsistency? Shall we change UTF-8.src and zh_TW.UTF-8.src so that their behavior is consistent with the other locales?

Sincerely,
Li-Lun Wang
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (FreeBSD)

iD8DBQFEb1D7CQM7t5B2mhARAgMEAJ9FMpNx1IaUGIn0NNBaaHLj3DFQqACbBSJg
tWnXCT2N15U+SntjmuTrGjI=
=JNXG
-----END PGP SIGNATURE-----
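
The difference between a UTF-8 byte sequence and a Unicode code point, which the message above turns on, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names utf8_to_code_point and bytes_as_integer are hypothetical, the big-endian packing is just one conceivable way to write a byte-sequence literal as a number, and U+3000 is an arbitrary sample character. Nothing here is taken from the locale source files themselves.

```python
def utf8_to_code_point(seq: bytes) -> int:
    """Decode exactly one UTF-8 encoded character to its Unicode code point."""
    text = seq.decode("utf-8")  # raises UnicodeDecodeError on malformed input
    if len(text) != 1:
        raise ValueError("expected exactly one character")
    return ord(text)


def bytes_as_integer(seq: bytes) -> int:
    """Pack the raw bytes big-endian into one integer, with no decoding.

    Hypothetical stand-in for a literal written as a byte sequence.
    """
    return int.from_bytes(seq, "big")


# U+3000 IDEOGRAPHIC SPACE encoded as UTF-8; chosen only as an example.
ideographic_space = b"\xe3\x80\x80"

print(hex(bytes_as_integer(ideographic_space)))    # 0xe38080
print(hex(utf8_to_code_point(ideographic_space)))  # 0x3000
```

The two printed values name the same character but differ numerically, so a classification table keyed on one form will not match a lookup made with the other unless the caller converts first.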