Date: Mon, 05 Feb 2018 20:21:08 +0000
From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 225692] iswprint() wrong for some FULL WIDTH characters in UTF-8 locale
Message-ID: <bug-225692-8-ZpOKGMTwdK@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-225692-8@https.bugs.freebsd.org/bugzilla/>
References: <bug-225692-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225692

--- Comment #1 from Conrad Meyer <cem@freebsd.org> ---
iswprint(wc) is a thin shim around __istype(wc, _CTYPE_R);

__istype(wc, type) is a thin shim in include/_ctype.h:
    return (!!__maskrune(wc, _CTYPE_R));

__maskrune() is defined earlier in the same file:
    return ((wc < 0 || wc >= _CACHED_RUNES) ? ___runetype(wc) :
        _CurrentRuneLocale->__runetype[wc]) & _CTYPE_R;

(_CACHED_RUNES is probably 1<<8.)

This tells me the type information is being looked up in ___runetype() and
that the _CTYPE_R bit must be unset for 0x2002/0xff08.

At some level, I thought we got this metadata from the Unicode standard
tables, but maybe ours are out of date or this particular data is sourced
independently.

___runetype(wc) is a thin shim around ___runetype_l(wc, __get_locale());

___runetype_l() does a binary search in the _RuneRange table for the current
locale object. If nothing is found, it returns 0. This suggests the current
locale object either lacks type metadata, or has incorrect type metadata, for
at least these two characters.
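
To make the lookup chain described above concrete, here is a minimal, self-contained sketch of it. This is not the FreeBSD source: every toy_* name, the TOY_CTYPE_R bit value, and the range table contents are invented for illustration. It only models the shape of the logic (direct array lookup for small code points, binary search over sorted ranges otherwise, 0 when no range matches), and shows how a code point missing from the table ends up reported as non-printable.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the FreeBSD definitions. */
#define TOY_CTYPE_R 0x1UL /* "printable" type bit (value invented) */
#define TOY_CACHED  256   /* size of the directly indexed table, 1 << 8 */

struct toy_range {
	int min;            /* first code point in the range */
	int max;            /* last code point in the range, inclusive */
	unsigned long type; /* type bits shared by the whole range */
};

/* Sorted, non-overlapping ranges above the cached area. Invented data:
 * note that nothing here covers 0x2002 or 0xFF08. */
static const struct toy_range toy_ranges[] = {
	{ 0x0391, 0x03A9, TOY_CTYPE_R },
	{ 0x4E00, 0x9FFF, TOY_CTYPE_R },
};

/* Directly indexed type table for code points below TOY_CACHED. */
static unsigned long toy_cache[TOY_CACHED];

/* Mark printable ASCII in the cached table. */
void
toy_init_cache(void)
{
	for (int c = 0x20; c <= 0x7E; c++)
		toy_cache[c] = TOY_CTYPE_R;
}

/* Binary search over the range table; returns 0 if no range contains wc. */
unsigned long
toy_runetype(int wc)
{
	size_t lo = 0;
	size_t hi = sizeof(toy_ranges) / sizeof(toy_ranges[0]);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (wc < toy_ranges[mid].min)
			hi = mid;
		else if (wc > toy_ranges[mid].max)
			lo = mid + 1;
		else
			return (toy_ranges[mid].type);
	}
	return (0);
}

/* Cached table for small code points, range search for everything else. */
unsigned long
toy_maskrune(int wc, unsigned long mask)
{
	unsigned long type;

	type = (wc < 0 || wc >= TOY_CACHED) ? toy_runetype(wc) : toy_cache[wc];
	return (type & mask);
}

/* Printable iff the "R" bit is set in the looked-up type. */
int
toy_iswprint(int wc)
{
	return (!!toy_maskrune(wc, TOY_CTYPE_R));
}
```

With this invented table, after toy_init_cache(), toy_iswprint(0x4E2D) returns 1 because a range covers it, while toy_iswprint(0x2002) and toy_iswprint(0xFF08) return 0 because the search falls through. That is the same symptom as in the bug report, and it cannot by itself distinguish "range missing" from "range present but without the printable bit".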