From owner-freebsd-bugs@FreeBSD.ORG Sat Sep 15 09:10:02 2007
Delivered-To: freebsd-bugs@hub.freebsd.org
Resent-Date: Sat, 15 Sep 2007 09:10:02 GMT
Resent-Message-Id: <200709150910.l8F9A2b4063466@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-bugs@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Petr Hroudny
Message-Id: <200709150908.l8F981jj075109@www.freebsd.org>
Date: Sat, 15 Sep 2007 09:08:01 GMT
From: Petr Hroudny
To: freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-3.1
Cc:
Subject: gnu/116363: isspace broken for UTF-8 locales
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Bug reports
X-List-Received-Date: Sat, 15 Sep 2007 09:10:02 -0000

>Number:         116363
>Category:       gnu
>Synopsis:       isspace broken for UTF-8 locales
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Sep 15 09:10:02 GMT 2007
>Closed-Date:
>Last-Modified:
>Originator:     Petr Hroudny
>Release:        6-stable, 7-current
>Organization:
>Environment:
>Description:
In UTF-8 locales, isspace(0xA0) returns 1, which is wrong. In UTF-8, the
byte 0xA0 can only occur as a continuation byte (the second or a later
byte) of a multibyte character; on its own it is never a space.

As a consequence, operations such as str.upper() and str.split() break
when they encounter a UTF-8 character that contains the byte 0xA0. An
example of such a character is Scaron (UTF-8 encoding 0xC5 0xA0).
>How-To-Repeat:
>Fix:
For UTF-8 locales, 0xA0 should never be considered a space.
>Release-Note:
>Audit-Trail:
>Unformatted:
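
The effect described under >Description can be illustrated with a short Python sketch. It does not call the C library's isspace(); it only models a byte-wise whitespace table that includes 0xA0, as the report says happens in UTF-8 locales. The names is_utf8_continuation, BUGGY_SPACES, FIXED_SPACES and split_bytes, and the sample string, are hypothetical and not taken from the report.

```python
# Illustration only: models the reported behaviour, does not call libc isspace().

def is_utf8_continuation(byte: int) -> bool:
    """True for bytes of the form 10xxxxxx, which never start a character."""
    return (byte & 0xC0) == 0x80

# Hypothetical byte-wise whitespace table that wrongly includes 0xA0,
# mirroring what the report says isspace(0xA0) returns in UTF-8 locales.
BUGGY_SPACES = frozenset(b" \t\n\r\x0b\x0c\xa0")
# The same table with 0xA0 removed, as the >Fix field proposes.
FIXED_SPACES = BUGGY_SPACES - {0xA0}

def split_bytes(data: bytes, spaces: frozenset) -> list:
    """Split data on any byte in `spaces`, dropping empty fields."""
    fields, current = [], bytearray()
    for b in data:
        if b in spaces:
            if current:
                fields.append(bytes(current))
                current = bytearray()
        else:
            current.append(b)
    if current:
        fields.append(bytes(current))
    return fields

scaron = "\u0160".encode("utf-8")          # Scaron, encoded as 0xC5 0xA0
text = "\u0160koda auto".encode("utf-8")   # sample input (assumption)

print(scaron.hex())                        # c5a0
print(is_utf8_continuation(scaron[1]))     # True
print(split_bytes(text, BUGGY_SPACES))     # [b'\xc5', b'koda', b'auto']
print(split_bytes(text, FIXED_SPACES))     # [b'\xc5\xa0koda', b'auto']
```

With the buggy table the two-byte character is cut in half, leaving a lone 0xC5 that is not valid UTF-8; with 0xA0 removed from the table the word stays intact.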