Date: Sun, 16 Sep 2007 20:34:07 +0400
From: Andrey Chernov <ache@nagual.pp.ru>
To: Hye-Shik Chang <perky@FreeBSD.org>
Cc: jkoshy@FreeBSD.org, freebsd-gnats-submit@FreeBSD.org, Petr Hroudny <petr.hroudny@gmail.com>, i18n@FreeBSD.org
Subject: Re: gnu/116363: isspace broken for UTF-8 locales
Message-ID: <20070916163407.GA10297@nagual.pp.ru>
In-Reply-To: <20070916162214.GA49139@FreeBSD.org>
References: <200709150908.l8F981jj075109@www.freebsd.org> <20070916085432.GA8884@nagual.pp.ru> <20070916162214.GA49139@FreeBSD.org>

On Mon, Sep 17, 2007 at 01:22:14AM +0900, Hye-Shik Chang wrote:
> In fact, UTF-8.src defines values for not UTF-8 but Unicode codepoints.
> Using the Unicode codepoint as wchar_t's internal representation gives
> much benefit. I think we would be better to make isspace() and
> other ctypes functions aware of "encoding". IIRC, tjr@ provided the
> workaround as in the URL mentioned above and said that it would get
> a chance to be fixed in 6 or 7 on 2004.

Currently wchar_t represents the given encoding everywhere, including
the wc<->mbr conversions. To make it UCS-4-only instead, we would need
to rewrite the whole locale system from scratch, and I see no benefit
in going that way. No simple workaround exists.

In any case, there is no excuse for letting a file that is really
UCS-4.src pose as UTF-8.src. Providing a proper UTF-8.src is much less
painful than rewriting the whole locale system, and I am almost halfway
through converting the UCS-4 source to it.

-- 
http://ache.pp.ru/
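
For readers following the thread, the distinction between a Unicode code point and its UTF-8 byte encoding can be sketched in a few lines of C. This is only an illustration under stated assumptions: the helper names utf8_is_continuation and utf8_encode_small are hypothetical and are not part of libc or the FreeBSD locale sources, and U+00A0 (NO-BREAK SPACE) is picked here as a convenient example rather than taken from the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if b has the form 10xxxxxx, i.e. it is a
 * UTF-8 continuation byte and can never stand for a character by itself. */
static int utf8_is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: encodes a code point below 0x800 as UTF-8.
 * Returns the number of bytes written to out (1 or 2), or 0 if the
 * code point is outside the range this sketch handles. */
static size_t utf8_encode_small(uint32_t cp, unsigned char out[2])
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;                   /* 0xxxxxxx */
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));   /* 110xxxxx */
        out[1] = (unsigned char)(0x80 | (cp & 0x3F)); /* 10xxxxxx */
        return 2;
    }
    return 0;
}
```

With these helpers, code point U+00A0 encodes to the two bytes 0xC2 0xA0, while the single byte 0xA0 is only a continuation byte. A classification table indexed by code point therefore answers a different question than one indexed by the byte values of the encoding.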