From owner-freebsd-bugs@FreeBSD.ORG  Sun Sep 16 16:40:07 2007
Return-Path: <owner-freebsd-bugs@FreeBSD.ORG>
Delivered-To: freebsd-bugs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8503F16A417
	for <freebsd-bugs@hub.freebsd.org>;
	Sun, 16 Sep 2007 16:40:07 +0000 (UTC)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 70EE413C45D
	for <freebsd-bugs@hub.freebsd.org>;
	Sun, 16 Sep 2007 16:40:07 +0000 (UTC)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.1/8.14.1) with ESMTP id l8GGe7Zx077746
	for <freebsd-bugs@freefall.freebsd.org>; Sun, 16 Sep 2007 16:40:07 GMT
	(envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.1/8.14.1/Submit) id l8GGe7iQ077745;
	Sun, 16 Sep 2007 16:40:07 GMT (envelope-from gnats)
Date: Sun, 16 Sep 2007 16:40:07 GMT
Message-Id: <200709161640.l8GGe7iQ077745@freefall.freebsd.org>
To: freebsd-bugs@FreeBSD.org
From: Andrey Chernov <ache@nagual.pp.ru>
Cc: 
Subject: Re: gnu/116363: isspace broken for UTF-8 locales
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: Andrey Chernov <ache@nagual.pp.ru>
List-Id: Bug reports <freebsd-bugs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-bugs>,
	<mailto:freebsd-bugs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-bugs>
List-Post: <mailto:freebsd-bugs@freebsd.org>
List-Help: <mailto:freebsd-bugs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-bugs>,
	<mailto:freebsd-bugs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 16 Sep 2007 16:40:07 -0000

The following reply was made to PR gnu/116363; it has been noted by GNATS.

From: Andrey Chernov <ache@nagual.pp.ru>
To: Hye-Shik Chang <perky@FreeBSD.org>
Cc: Petr Hroudny <petr.hroudny@gmail.com>, freebsd-gnats-submit@FreeBSD.org,
        jkoshy@FreeBSD.org, i18n@FreeBSD.org
Subject: Re: gnu/116363: isspace broken for UTF-8 locales
Date: Sun, 16 Sep 2007 20:34:07 +0400

 On Mon, Sep 17, 2007 at 01:22:14AM +0900, Hye-Shik Chang wrote:
 > In fact, UTF-8.src defines values for not UTF-8 but Unicode codepoints.
 > Using the Unicode codepoint as wchar_t's internal representation gives
 > much benefit.  I think we would be better to make isspace() and
 > other ctypes functions aware of "encoding".  IIRC, tjr@ provided the
 > workaround as in the URL mentioned above and said that it would get
 > a chance to be fixed in 6 or 7 on 2004.
 
 Currently wchar_t represents the given encoding everywhere, including
 wc<->mbr conversions. Making it UCS-4-only instead would require rewriting
 the whole locale system from scratch, and I see no benefit in that.
 No simple workaround exists.
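
 To illustrate the distinction at issue (a Unicode code point, i.e. a UCS-4
 value, versus the bytes of its UTF-8 encoding), here is a minimal sketch.
 It is only an illustration: utf8_encode() and pack_bytes() are hypothetical
 helper names, not part of the FreeBSD locale code, and the sketch makes no
 claim about which value libc actually stores in wchar_t.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not FreeBSD libc code): encode one Unicode code
 * point as UTF-8.  Returns the number of bytes written to out (1 to 4),
 * or 0 if cp is a surrogate or lies above U+10FFFF.
 */
size_t
utf8_encode(uint32_t cp, unsigned char out[4])
{
	if (cp < 0x80) {
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return 0;	/* surrogates are not encodable */
	if (cp < 0x10000) {
		out[0] = (unsigned char)(0xE0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	if (cp <= 0x10FFFF) {
		out[0] = (unsigned char)(0xF0 | (cp >> 18));
		out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
		out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[3] = (unsigned char)(0x80 | (cp & 0x3F));
		return 4;
	}
	return 0;
}

/*
 * Hypothetical helper: pack an encoded byte sequence (at most 4 bytes)
 * big-endian into one integer, i.e. the "encoded value" as opposed to
 * the code point.
 */
uint32_t
pack_bytes(const unsigned char *buf, size_t len)
{
	uint32_t v = 0;
	size_t i;

	for (i = 0; i < len; i++)
		v = (v << 8) | buf[i];
	return v;
}
```

 For U+00A0 (NO-BREAK SPACE) the code point is 0xA0, while its UTF-8 bytes
 are C2 A0, so a ctype table keyed by one kind of value will not match
 lookups made with the other.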
 
 In any case, there is no excuse for having what is really a UCS-4.src pose
 as UTF-8.src. Providing a proper UTF-8.src is much less painful than
 rewriting the whole locale system, and I am almost halfway through
 converting the UCS-4 source to it.
 
 -- 
 http://ache.pp.ru/