Date: Mon, 05 Feb 2018 18:34:02 +0000
From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 225692] iswprint() wrong for some FULL WIDTH characters in UTF-8 locale
Message-ID: <bug-225692-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225692

Bug ID: 225692
Summary: iswprint() wrong for some FULL WIDTH characters in UTF-8 locale
Product: Base System
Version: 11.1-RELEASE
Hardware: Any
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: bin
Assignee: freebsd-bugs@FreeBSD.org
Reporter: jkerian+freebsdbugs@gmail.com

Created attachment 190345
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=190345&action=edit
Simple iswprint test

When I run ls -B on one of my files, the UTF-8 byte sequence 0xef 0xbc 0x88
appears to be treated as unprintable and replaced.

According to
http://www.utf8-chartable.de/unicode-utf8-table.pl?start=65280&utf8=0x
this should be U+FF08, a fullwidth left parenthesis. According to
http://demo.icu-project.org/icu-bin/ubrowse?ch=FF08
U+FF08 should be a perfectly printable character in a UTF-8 locale.

Looking at the ls.c source code eventually led me to iswprint(). I wrote a
simple program that prints the character class masks and then the iswprint()
results in a few locales for a series of characters. (Attached in case of
link rot; the code and Linux results can be seen at
https://wandbox.org/permlink/ZDc36tQhh7BLRpBx)

Linux and OS X have some odd behavior around the classes, but U+2002 and
U+FF08 are both perfectly printable on both systems in the UTF-8 locales.
FreeBSD, on the other hand, returns 1 only for iswprint(0x64), when it should
also report U+2002 and U+FF08 as printable.

On my box, running FreeBSD 11.1-RELEASE-p4 GENERIC amd64, I get the following
results:

[dev ~/test/iswprint]$ ./a.out
alnum:0x400100, cntrl:0x200, ideogram:0x80000, print:0x40000, space:0x4000, xdigit:0x10000,
alpha:0x100, digit:0x400, lower:0x1000, punct:0x2000, special:0x100000,
blank:0x20000, graph:0x800, phonogram:0x200000, rune:0xffffff00, upper:0x8000,
Default Locale is: C

Character 0x64 is in classes: alnum print xdigit alpha lower graph rune
in C locale, iswprint(0x64) = 1
in en_US.UTF-8 locale, iswprint(0x64) = 1
in ja_JP.UTF-8 locale, iswprint(0x64) = 1

Character 0x2002 is in classes: space rune
in C locale, iswprint(0x2002) = 0
in en_US.UTF-8 locale, iswprint(0x2002) = 0
in ja_JP.UTF-8 locale, iswprint(0x2002) = 0

Character 0xff08 is in classes: rune
in C locale, iswprint(0xff08) = 0
in en_US.UTF-8 locale, iswprint(0xff08) = 0
in ja_JP.UTF-8 locale, iswprint(0xff08) = 0

Character 0x2002 is in classes: space rune
in C locale, iswprint(0x2002) = 0
in en_US.UTF-8 locale, iswprint(0x2002) = 0
in ja_JP.UTF-8 locale, iswprint(0x2002) = 0

Character 0x82 is in classes: cntrl rune
in C locale, iswprint(0x82) = 0
in en_US.UTF-8 locale, iswprint(0x82) = 0
in ja_JP.UTF-8 locale, iswprint(0x82) = 0

I confirmed with a few other FreeBSD users that they get the same results.

-- 
You are receiving this mail because:
You are the assignee for the bug.