From owner-freebsd-i18n@FreeBSD.ORG  Sat May 20 17:25:20 2006
Return-Path: <owner-freebsd-i18n@FreeBSD.ORG>
X-Original-To: freebsd-i18n@FreeBSD.org
Delivered-To: freebsd-i18n@FreeBSD.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6D44916A459;
	Sat, 20 May 2006 17:25:20 +0000 (UTC)
	(envelope-from llwang@infor.ck.tp.edu.tw)
Received: from infor.ck.tp.edu.tw (infor.ck.tp.edu.tw [203.64.26.200])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 1A49143D46;
	Sat, 20 May 2006 17:25:20 +0000 (GMT)
	(envelope-from llwang@infor.ck.tp.edu.tw)
Received: by infor.ck.tp.edu.tw (Postfix, from userid 1001)
	id EDEAC1702F; Sun, 21 May 2006 01:25:16 +0800 (CST)
Date: Sun, 21 May 2006 01:25:16 +0800
From: "Li-Lun Wang (Leland Wang)" <llwang@infor.org>
To: freebsd-i18n@FreeBSD.org, freebsd-hackers@FreeBSD.org, tjr@FreeBSD.org
Message-ID: <20060520172516.GA54779@Athena.infor.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; x-action=pgp-signed
Content-Disposition: inline
User-Agent: Mutt/1.5.11
Cc: 
Subject: Inconsistency in LC_CTYPE source files
X-BeenThere: freebsd-i18n@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD Internationalization Effort <freebsd-i18n.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-i18n>,
	<mailto:freebsd-i18n-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-i18n>
List-Post: <mailto:freebsd-i18n@freebsd.org>
List-Help: <mailto:freebsd-i18n-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-i18n>,
	<mailto:freebsd-i18n-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 May 2006 17:25:22 -0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

It has come to my attention that two of the LC_CTYPE source files for
UTF-8 locales, UTF-8.src and zh_TW.UTF-8.src, are inconsistent with
all the other LC_CTYPE source files. The literals in every other
LC_CTYPE source file, including am_ET.UTF-8.src, are written in the
native byte sequence of that locale, whereas those in UTF-8.src and
zh_TW.UTF-8.src are written as Unicode code points. (Note that UTF-8
is NOT the same as Unicode.) This creates headaches for locale-aware
applications that support UTF-8. For example, the is*() and isw*()
functions, such as iswspace(), have to be used differently, and
behave differently, under the other UTF-8 locales than under every
other locale, am_ET.UTF-8 included. Under every other locale,
including am_ET.UTF-8, the argument to the isw*() functions is the
wide character literal in that locale, whereas under the other UTF-8
locales the application must first convert the wide character from
UTF-8 to Unicode before passing it to the isw*() functions. Is there
a good reason for this inconsistency? Should we change UTF-8.src and
zh_TW.UTF-8.src so that their behavior is consistent with the other
locales?
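
To make the distinction between the two representations concrete,
here is a minimal sketch in C. The helper names pack_bytes() and
utf8_decode() are made up for illustration and are not part of libc
or of the locale sources; the decoder is deliberately simplified and
only shows how the byte-sequence value and the Unicode code point of
the same character differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: pack the raw bytes of one multibyte character
 * into an integer, most significant byte first. This is the value a
 * "native byte sequence" literal would have.
 */
uint32_t pack_bytes(const unsigned char *s, size_t n)
{
    uint32_t v = 0;

    for (size_t i = 0; i < n && i < 4; i++)
        v = (v << 8) | s[i];
    return v;
}

/*
 * Hypothetical helper: decode one UTF-8 sequence into a Unicode code
 * point. Returns the number of bytes consumed, or 0 if the input is
 * truncated or malformed. Simplified: it does not reject overlong
 * forms or surrogate code points.
 */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len;
    uint32_t v;

    assert(cp != NULL);
    if (n == 0)
        return 0;

    /* The lead byte determines the sequence length and payload bits. */
    if (s[0] < 0x80) {
        len = 1; v = s[0];
    } else if ((s[0] & 0xE0) == 0xC0) {
        len = 2; v = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {
        len = 3; v = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {
        len = 4; v = s[0] & 0x07;
    } else {
        return 0;
    }
    if (n < len)
        return 0;

    /* Each continuation byte has the form 10xxxxxx and adds 6 bits. */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        v = (v << 6) | (s[i] & 0x3F);
    }
    *cp = v;
    return len;
}
```

For the ideographic space, whose UTF-8 encoding is the three bytes
E3 80 80, pack_bytes() yields 0xE38080 while utf8_decode() stores the
code point 0x3000. Which of the two values an isw*() function expects
is exactly what depends on how the LC_CTYPE source file was written.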

Sincerely,
Li-Lun Wang
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (FreeBSD)

iD8DBQFEb1D7CQM7t5B2mhARAgMEAJ9FMpNx1IaUGIn0NNBaaHLj3DFQqACbBSJg
tWnXCT2N15U+SntjmuTrGjI=
=JNXG
-----END PGP SIGNATURE-----