Date:      Mon, 08 Oct 2018 12:34:13 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 225692] iswprint() wrong for some FULL WIDTH characters in UTF-8 locale
Message-ID:  <bug-225692-227-oyhiUWVNKw@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-225692-227@https.bugs.freebsd.org/bugzilla/>
References:  <bug-225692-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225692

--- Comment #15 from Yuri Pankov <yuripv@yuripv.net> ---
Properly fixing U+FF08 and other full width characters is more involved.

A bit of background: we have a common ctype definitions file for all UTF-8
locales, src/share/ctypedef/en_US.UTF-8.src (all other locales symlink to the
resulting /usr/share/locale/en_US.UTF-8/LC_CTYPE).

src/share/ctypedef/en_US.UTF-8.src is in turn assembled from
src/tools/tools/locale/etc/common.UTF-8.src and
src/tools/tools/locale/etc/manual-input.UTF-8.

src/tools/tools/locale/etc/common.UTF-8.src is built by
src/tools/tools/locale/tools/utf8-rollup.pl, which contains the character
ranges belonging to different locales and, among other things, checks the
LC_CTYPE section of the corresponding .UTF-8.src file for the character class
to use.  If a character is not defined there, it doesn't make it into
common.UTF-8.src, which is exactly the case here: U+FF08 is not found in
ja_JP.UTF-8.src or ko_KR.UTF-8.src.
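
To illustrate why such characters vanish, here is a minimal sketch of the
roll-up behaviour described above.  It is a simplified model, not the actual
utf8-rollup.pl logic: the function name, the data layout, and the sample
ranges and classes are all hypothetical and chosen only for illustration.

```python
# Simplified, hypothetical model of the roll-up step. The real script is
# Perl and reads the CLDR-derived *.UTF-8.src files; none of the names or
# data below are taken from it.

def rollup(ranges, locale_ctype):
    """Build a common {code point: class} map.

    ranges       -- list of (locale, first, last) tuples: which locale
                    "owns" which code point range
    locale_ctype -- {locale: {code point: class}}: what each locale's
                    LC_CTYPE section actually defines
    """
    common = {}
    for locale, first, last in ranges:
        defined = locale_ctype.get(locale, {})
        for cp in range(first, last + 1):
            cls = defined.get(cp)
            if cls is not None:
                common[cp] = cls
            # A code point with no class in its owning locale is silently
            # skipped, so it never reaches the common map.
    return common


# Illustrative input only: the range is assigned to ja_JP, but the
# (made-up) ja_JP definitions cover U+FF10 and not U+FF08.
ranges = [("ja_JP", 0xFF08, 0xFF10)]
locale_ctype = {"ja_JP": {0xFF10: "digit"}}

common = rollup(ranges, locale_ctype)
for cp in (0xFF08, 0xFF10):
    print("U+%04X: %s" % (cp, common.get(cp, "missing from common map")))
```

In this model U+FF08 ends up with no class at all, which is the situation
iswprint() runs into.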

TL;DR: all such characters that are not defined anywhere in the *.UTF-8.src
files from CLDR need to be added to manual-input.UTF-8.

CLDR v34 is really close to being released, but I strongly doubt we will have
the full width characters we are missing defined in the new *.UTF-8.src files.
To mitigate the issue for the release, I propose looking up these characters
in the UTF-8.src we had in src/share/mklocale in pre-11.x times, and adding
them to manual-input.UTF-8.
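
As a rough aid for finding candidates, the sketch below lists full width code
points that are absent from a given set of already-defined code points.  It
relies on Python's bundled Unicode database (East Asian Width "F") rather than
on CLDR or the old mklocale UTF-8.src, so it only says which characters to
look up; the ctype classes would still have to come from the old UTF-8.src.
The `defined` set is a hypothetical stand-in for whatever is parsed out of
common.UTF-8.src, and that parsing is not shown.

```python
import unicodedata


def fullwidth_missing(defined, first=0xFF00, last=0xFFEF):
    """Return (code point, name) pairs for full width characters in
    [first, last] that are not present in `defined`.

    The default range is the Halfwidth and Fullwidth Forms block.
    """
    missing = []
    for cp in range(first, last + 1):
        ch = chr(cp)
        # "F" is the East Asian Width value for full width characters.
        if unicodedata.east_asian_width(ch) == "F" and cp not in defined:
            missing.append((cp, unicodedata.name(ch, "<unnamed>")))
    return missing


# Hypothetical input: pretend only U+FF10 is currently defined.
defined = {0xFF10}

for cp, name in fullwidth_missing(defined):
    print("U+%04X %s" % (cp, name))
```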

The longer term solution would be to ask the CLDR guys about a way to build a
complete ctype map while building the POSIX locale data files and, if that's
not possible, to go over the entire UTF-8.src contents and add the missing
bits to manual-input.UTF-8.

-- 
You are receiving this mail because:
You are the assignee for the bug.

