Date:      Sun, 16 Sep 2007 09:10:06 GMT
From:      Andrey Chernov <ache@nagual.pp.ru>
To:        freebsd-bugs@FreeBSD.org
Subject:   Re: gnu/116363: isspace broken for UTF-8 locales
Message-ID:  <200709160910.l8G9A6ts050905@freefall.freebsd.org>

The following reply was made to PR gnu/116363; it has been noted by GNATS.

From: Andrey Chernov <ache@nagual.pp.ru>
To: Petr Hroudny <petr.hroudny@gmail.com>
Cc: freebsd-gnats-submit@FreeBSD.ORG, jkoshy@FreeBSD.ORG, perky@FreeBSD.ORG,
        i18n@FreeBSD.ORG
Subject: Re: gnu/116363: isspace broken for UTF-8 locales
Date: Sun, 16 Sep 2007 12:54:33 +0400

 On Sat, Sep 15, 2007 at 09:08:01AM +0000, Petr Hroudny wrote:
 > 
 > >Number:         116363
 > >Category:       gnu
 > >Synopsis:       isspace broken for UTF-8 locales
 > >Confidential:   no
 > >Severity:       non-critical
 > >Priority:       medium
 > >Responsible:    freebsd-bugs
 > >State:          open
 > >Quarter:        
 > >Keywords:       
 > >Date-Required:
 > >Class:          sw-bug
 > >Submitter-Id:   current-users
 > >Arrival-Date:   Sat Sep 15 09:10:02 GMT 2007
 > >Closed-Date:
 > >Last-Modified:
 > >Originator:     Petr Hroudny
 > >Release:        6-stable, 7-current
 > >Organization:
 > >Environment:
 > >Description:
 > In UTF-8 locales, isspace(0xA0) returns 1, which is wrong.
 > 
 > In UTF-8, 0xA0 can only be a continuation (second or later) byte of a multibyte character, never a space by itself.
 > 
 > As a consequence, operations like str.upper() and str.split() break when
 > a UTF-8 character containing a 0xA0 byte is encountered.
 
 It seems that our UTF-8.src is completely wrong: it describes plain Unicode
 rather than UTF-8, in which multibyte sequences may start only with the bytes
 C2-DF
 E0-EF
 F0-F4
 (as stated in http://en.wikipedia.org/wiki/UTF-8, for example).
 Can anybody write a replacement for it?
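
 The byte ranges above can be illustrated with a minimal sketch. Python is
 used here only for illustration, and utf8_byte_role is a hypothetical helper
 name, not anything from the FreeBSD locale sources. It assumes the
 well-formed UTF-8 ranges listed above, plus 80-BF for continuation bytes,
 and shows why a lone 0xA0 byte is never a character by itself.

```python
def utf8_byte_role(b: int) -> str:
    """Classify one byte value (0-255) by its role in well-formed UTF-8.

    Hypothetical helper for illustration only.
    """
    if not 0 <= b <= 0xFF:
        raise ValueError("expected a byte value in range 0-255")
    if b <= 0x7F:
        return "ascii"          # complete single-byte character
    if b <= 0xBF:
        return "continuation"   # 80-BF: valid only after a lead byte
    if 0xC2 <= b <= 0xDF:
        return "lead-2"         # starts a 2-byte sequence
    if 0xE0 <= b <= 0xEF:
        return "lead-3"         # starts a 3-byte sequence
    if 0xF0 <= b <= 0xF4:
        return "lead-4"         # starts a 4-byte sequence
    return "invalid"            # C0, C1 and F5-FF never occur in UTF-8


# U+00A0 (NO-BREAK SPACE) takes two bytes in UTF-8; 0xA0 is only the second.
nbsp = "\u00a0".encode("utf-8")
print(nbsp.hex(), [utf8_byte_role(b) for b in nbsp])

# The same 0xA0 byte also ends unrelated characters such as U+00E0.
a_grave = "\u00e0".encode("utf-8")
print(a_grave.hex(), [utf8_byte_role(b) for b in a_grave])
```

 In both cases 0xA0 is classified as a continuation byte, so a byte-wise
 isspace(0xA0) returning true in a UTF-8 locale would split such characters
 in the middle.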
 
 -- 
 http://ache.pp.ru/


