From: Andrey Chernov
To: Petr Hroudny
Cc: jkoshy@FreeBSD.ORG, freebsd-gnats-submit@FreeBSD.ORG, perky@FreeBSD.ORG, i18n@FreeBSD.ORG
Date: Sun, 16 Sep 2007 12:54:33 +0400
Subject: Re: gnu/116363: isspace broken for UTF-8 locales

On Sat, Sep 15, 2007 at 09:08:01AM +0000, Petr Hroudny wrote:
>
> >Number:         116363
> >Category:       gnu
> >Synopsis:       isspace broken for UTF-8 locales
> >Confidential:   no
> >Severity:       non-critical
> >Priority:       medium
> >Responsible:    freebsd-bugs
> >State:          open
> >Quarter:
> >Keywords:
> >Date-Required:
> >Class:          sw-bug
> >Submitter-Id:   current-users
> >Arrival-Date:   Sat Sep 15 09:10:02 GMT 2007
> >Closed-Date:
> >Last-Modified:
> >Originator:     Petr Hroudny
> >Release:        6-stable, 7-current
> >Organization:
> >Environment:
> >Description:
> In UTF-8 locales, isspace(0xA0) returns 1, which is wrong.
>
> In UTF-8, 0xA0 could only be the second or third byte of a multibyte
> character, but never a space.
>
> As a consequence, operations like str.upper() and/or str.split() are broken
> when a UTF-8 character containing an 0xA0 byte is encountered.

It seems that our UTF-8.src is completely wrong: it is just plain Unicode,
not UTF-8, whose multibyte sequences may only start with the bytes C2-DF,
E0-EF or F0-F4 (as stated at http://en.wikipedia.org/wiki/UTF-8, for
example). Can anybody write a replacement for it?

-- 
http://ache.pp.ru/
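The byte ranges mentioned in the message can be illustrated with a small C sketch, assuming well-formed UTF-8 as defined in RFC 3629. It classifies a single byte by the role it can play in a UTF-8 sequence and shows how a byte-wise whitespace test that also accepts 0xA0 cuts a character such as "à" (U+00E0, encoded C3 A0) in half. The names utf8_byte_role, ascii_space, buggy_space and count_tokens are hypothetical illustrations, not FreeBSD libc or locale APIs, and this is not a proposed replacement for UTF-8.src.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a libc API): the role one byte can play in
   well-formed UTF-8, following the ranges in RFC 3629. */
enum utf8_role {
    UTF8_ASCII,        /* 00-7F: a complete one-byte character      */
    UTF8_CONTINUATION, /* 80-BF: 2nd, 3rd or 4th byte of a sequence */
    UTF8_LEAD2,        /* C2-DF: starts a 2-byte sequence           */
    UTF8_LEAD3,        /* E0-EF: starts a 3-byte sequence           */
    UTF8_LEAD4,        /* F0-F4: starts a 4-byte sequence           */
    UTF8_INVALID       /* C0, C1, F5-FF: never occur in valid UTF-8 */
};

enum utf8_role utf8_byte_role(unsigned char b)
{
    if (b <= 0x7F)
        return UTF8_ASCII;
    if (b <= 0xBF)
        return UTF8_CONTINUATION;
    if (b >= 0xC2 && b <= 0xDF)
        return UTF8_LEAD2;
    if (b >= 0xE0 && b <= 0xEF)
        return UTF8_LEAD3;
    if (b >= 0xF0 && b <= 0xF4)
        return UTF8_LEAD4;
    return UTF8_INVALID;
}

/* Hypothetical separator test accepting only ASCII whitespace: in UTF-8,
   only bytes 00-7F are complete characters on their own. */
bool ascii_space(unsigned char b)
{
    return b == ' ' || b == '\t' || b == '\n' ||
           b == '\v' || b == '\f' || b == '\r';
}

/* Hypothetical model of the reported bug: 0xA0 (NO-BREAK SPACE as a
   Unicode code point) is treated as a space, although in UTF-8 it is
   a continuation byte. */
bool buggy_space(unsigned char b)
{
    return ascii_space(b) || b == 0xA0;
}

/* Counts maximal runs of non-separator bytes, the way a naive
   byte-wise split() would. */
size_t count_tokens(const char *s, bool (*is_sep)(unsigned char))
{
    size_t n = 0;
    bool in_token = false;

    for (; *s != '\0'; s++) {
        if (is_sep((unsigned char)*s)) {
            in_token = false;
        } else if (!in_token) {
            in_token = true;
            n++;
        }
    }
    return n;
}
```

For example, count_tokens("Tom\xC3\xA0s", ascii_space) sees one word, while count_tokens("Tom\xC3\xA0s", buggy_space) sees two ("Tom\xC3" and "s"), leaving a dangling lead byte, which is the kind of breakage the PR describes for str.split().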