Date:      Mon, 05 Feb 2018 18:34:02 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 225692] iswprint() wrong for some FULL WIDTH characters in UTF-8 locale
Message-ID:  <bug-225692-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225692

            Bug ID: 225692
           Summary: iswprint() wrong for some FULL WIDTH characters in
                    UTF-8 locale
           Product: Base System
           Version: 11.1-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: bin
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: jkerian+freebsdbugs@gmail.com

Created attachment 190345
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=190345&action=edit
Simple iswprint test

When I run ls -B on one of my files, the UTF-8 byte sequence 0xef 0xbc 0x88
is replaced as if it were unprintable. According to
http://www.utf8-chartable.de/unicode-utf8-table.pl?start=65280&utf8=0x,
this sequence encodes U+FF08, a fullwidth left parenthesis.

According to http://demo.icu-project.org/icu-bin/ubrowse?ch=FF08, U+FF08
should be a perfectly printable character in a UTF-8 locale. Looking at the
ls.c source code eventually led me to iswprint().

I wrote a simple program that prints the character class enum values and then
the iswprint() results for a series of characters in a few locales. (Attached
in case of link rot; the code and Linux results can be seen at
https://wandbox.org/permlink/ZDc36tQhh7BLRpBx)

Linux and OSX have some odd behavior around the classes, but U+2002 and U+FF08
are both perfectly printable on both systems in the UTF-8 locales. FreeBSD, on
the other hand, returns 1 only for iswprint(0x64), while it should also report
U+2002 and U+FF08 as printable.

On my box, running FreeBSD 11.1-RELEASE-p4 GENERIC amd64, I get the following
results:

[dev ~/test/iswprint]$ ./a.out
alnum:0x400100, cntrl:0x200, ideogram:0x80000, print:0x40000, space:0x4000,
xdigit:0x10000, alpha:0x100, digit:0x400, lower:0x1000, punct:0x2000,
special:0x100000, blank:0x20000, graph:0x800, phonogram:0x200000,
rune:0xffffff00, upper:0x8000,
Default Locale is: C
Character 0x64 is in classes: alnum print xdigit alpha lower graph rune
in C locale, iswprint(0x64) = 1
in en_US.UTF-8 locale, iswprint(0x64) = 1
in ja_JP.UTF-8 locale, iswprint(0x64) = 1

Character 0x2002 is in classes: space rune
in C locale, iswprint(0x2002) = 0
in en_US.UTF-8 locale, iswprint(0x2002) = 0
in ja_JP.UTF-8 locale, iswprint(0x2002) = 0

Character 0xff08 is in classes: rune
in C locale, iswprint(0xff08) = 0
in en_US.UTF-8 locale, iswprint(0xff08) = 0
in ja_JP.UTF-8 locale, iswprint(0xff08) = 0

Character 0x2002 is in classes: space rune
in C locale, iswprint(0x2002) = 0
in en_US.UTF-8 locale, iswprint(0x2002) = 0
in ja_JP.UTF-8 locale, iswprint(0x2002) = 0

Character 0x82 is in classes: cntrl rune
in C locale, iswprint(0x82) = 0
in en_US.UTF-8 locale, iswprint(0x82) = 0
in ja_JP.UTF-8 locale, iswprint(0x82) = 0

I confirmed with a few other FreeBSD users that they get the same results.

-- 
You are receiving this mail because:
You are the assignee for the bug.


