Date: Sat, 15 Sep 2007 09:08:01 GMT
From: Petr Hroudny <petr.hroudny@gmail.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: gnu/116363: isspace broken for UTF-8 locales
Message-ID: <200709150908.l8F981jj075109@www.freebsd.org>
Resent-Message-ID: <200709150910.l8F9A2b4063466@freefall.freebsd.org>

>Number:         116363
>Category:       gnu
>Synopsis:       isspace broken for UTF-8 locales
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Sep 15 09:10:02 GMT 2007
>Closed-Date:
>Last-Modified:
>Originator:     Petr Hroudny
>Release:        6-stable, 7-current
>Organization:
>Environment:
>Description:
In UTF-8 locales, isspace(0xA0) returns 1, which is wrong. In UTF-8,
0xA0 can only occur as a continuation byte (the second or a later byte)
of a multibyte character, never as a space on its own.

As a consequence, operations such as str.upper() and str.split() break
when they encounter a UTF-8 character containing the byte 0xA0. An
example of such a character is Scaron (UTF-8 encoding 0xC5 0xA0).
>How-To-Repeat:
>Fix:
For UTF-8 locales, 0xA0 should never be considered a space.
>Release-Note:
>Audit-Trail:
>Unformatted:
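
The report leaves How-To-Repeat empty. The following is a minimal C sketch of one way to check the behaviour, not the submitter's own test case. Both helper names are hypothetical, and any UTF-8 locale name passed in (for example en_US.UTF-8) is an assumption about what is installed on the system under test.

```c
#include <ctype.h>
#include <locale.h>

/* Hypothetical helper: returns 1 if isspace() accepts `byte` under the
 * locale `locale_name`, 0 if it does not, and -1 if that locale cannot
 * be selected. Note that it changes the process-wide LC_CTYPE setting. */
int byte_is_space_in_locale(const char *locale_name, unsigned char byte)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    return isspace(byte) != 0;
}

/* Hypothetical helper: returns 1 if `byte` has the bit pattern 10xxxxxx,
 * i.e. it is a UTF-8 continuation byte and cannot stand alone as a
 * character. */
int is_utf8_continuation_byte(unsigned char byte)
{
    return (byte & 0xC0) == 0x80;
}
```

On a system showing the reported behaviour, byte_is_space_in_locale() called with a UTF-8 locale name and 0xA0 would be expected to return 1, even though is_utf8_continuation_byte(0xA0) also returns 1. That combination is the inconsistency described above.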