From: Andrey Chernov
To: Hye-Shik Chang
Cc: jkoshy@FreeBSD.org, freebsd-gnats-submit@FreeBSD.org, Petr Hroudny, i18n@FreeBSD.org
Date: Sun, 16 Sep 2007 20:34:07 +0400
Subject: Re: gnu/116363: isspace broken for UTF-8 locales
Message-ID: <20070916163407.GA10297@nagual.pp.ru>
In-Reply-To: <20070916162214.GA49139@FreeBSD.org>
List-Id: FreeBSD Internationalization Effort

On Mon, Sep 17, 2007 at 01:22:14AM +0900, Hye-Shik Chang wrote:
> In fact, UTF-8.src defines values for not UTF-8 but Unicode codepoints.
> Using the Unicode codepoint as wchar_t's internal representation gives
> much benefit. I think we would be better to make isspace() and
> other ctypes functions aware of "encoding". IIRC, tjr@ provided the
> workaround as in the URL mentioned above and said that it would get
> a chance to be fixed in 6 or 7 on 2004.

Currently, wchar_t represents the given encoding everywhere, including
the wc<->mbr conversions. Making it UCS-4-only instead would require
rewriting the whole locale system from scratch, and I see no benefit in
that approach. No simple workaround exists.

In any case, there is no excuse for letting what is really a UCS-4
source mimic UTF-8.src. Providing a proper UTF-8.src is much less
painful than rewriting the whole locale system, and I am already about
halfway through converting the UCS-4 source into it.

-- 
http://ache.pp.ru/