From: Andrey Chernov
Date: Sun, 16 Sep 2007 21:01:43 +0400
To: freebsd-bugs@FreeBSD.ORG, jkoshy@FreeBSD.ORG, perky@FreeBSD.ORG,
    i18n@FreeBSD.ORG, petr.hroudny@gmail.com
Subject: Re: gnu/116363: isspace broken for UTF-8 locales
Message-ID: <20070916170142.GA11047@nagual.pp.ru>

On Sun, Sep 16, 2007 at 04:40:07PM +0000, Andrey Chernov wrote:
> The following reply was made to PR gnu/116363; it has been noted by GNATS.
>
> From: Andrey Chernov
> To: Hye-Shik Chang
> Cc: Petr Hroudny, freebsd-gnats-submit@FreeBSD.org,
>     jkoshy@FreeBSD.org, i18n@FreeBSD.org
> Subject: Re: gnu/116363: isspace broken for UTF-8 locales
> Date: Sun, 16 Sep 2007 20:34:07 +0400
>
> On Mon, Sep 17, 2007 at 01:22:14AM +0900, Hye-Shik Chang wrote:
> > In fact, UTF-8.src defines values for not UTF-8 but Unicode codepoints.
> > Using the Unicode codepoint as wchar_t's internal representation gives
> > much benefit. I think we would be better to make isspace() and
> > other ctypes functions aware of "encoding". IIRC, tjr@ provided the
> > workaround as in the URL mentioned above and said that it would get
> > a chance to be fixed in 6 or 7 on 2004.
>
> Currently wchar_t represents given encoding in all places including
> wc<->mbr conversions. To make it UCS-4-only instead we need to rewrite the

Oops, sorry for my oversight: we really do have UCS-4 as wchar_t, so no
UTF-8.src replacement is needed. In that case iswspace(0xA0) should
return 1 but isspace(0xA0) should not, so this looks like a bug in
isspace() (and the other plain ctype functions).

It seems that even isspace(' ') may be illegal in a UTF-8 locale,
because all characters are wide, but I am not sure.

-- 
http://ache.pp.ru/
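To make the isspace() versus iswspace() distinction from the message above concrete, here is a minimal C sketch. The helper names (byte_is_space, wchar_is_space, count_space_chars, report_nbsp) are hypothetical and are not part of libc or of the PR. The sketch assumes only the standard <ctype.h>, <wctype.h> and <wchar.h> interfaces. Any locale name passed to report_nbsp() is also an assumption, and what the two functions report for 0xA0 depends on the platform's locale data, so no output is claimed here.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Narrow classification of a single byte. isspace() is only defined for
 * EOF and values representable as unsigned char, hence the cast. */
int byte_is_space(char c)
{
    return isspace((unsigned char)c) != 0;
}

/* Wide classification of one already-decoded character. */
int wchar_is_space(wchar_t wc)
{
    return iswspace((wint_t)wc) != 0;
}

/* Count whitespace characters in a multibyte string: decode each
 * character with mbrtowc() and classify it with iswspace(), rather than
 * calling isspace() on the individual bytes.
 * Returns -1 if the string is not valid in the current LC_CTYPE encoding. */
int count_space_chars(const char *s)
{
    mbstate_t st;
    size_t left;
    int count = 0;

    assert(s != NULL);
    left = strlen(s);
    memset(&st, 0, sizeof st);

    while (left > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, left, &st);

        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;      /* invalid or incomplete sequence */
        if (n == 0)
            break;          /* decoded a NUL; not reachable via strlen() */
        if (wchar_is_space(wc))
            count++;
        s += n;
        left -= n;
    }
    return count;
}

/* Print what both APIs say about 0xA0 under the named locale.
 * The result is platform-dependent, so nothing is asserted here. */
void report_nbsp(const char *locale_name)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL) {
        printf("%s: locale not available\n", locale_name);
        return;
    }
    printf("%s: isspace(0xA0)=%d iswspace(0xA0)=%d\n", locale_name,
           isspace(0xA0) != 0, iswspace(0xA0) != 0);
}
```

Calling report_nbsp("C") and then report_nbsp() with a UTF-8 locale name installed on the system (for example "en_US.UTF-8", if present) from main() shows whether the narrow function is wrongly answering for a value that only makes sense as a wide character.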