From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 225692] iswprint() wrong for some FULL WIDTH characters in UTF-8 locale
Date: Mon, 05 Feb 2018 18:34:02 +0000
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225692

Bug ID: 225692
Summary: iswprint() wrong for some FULL WIDTH characters in UTF-8 locale
Product: Base System
Version: 11.1-RELEASE
Hardware: Any
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: bin
Assignee: freebsd-bugs@FreeBSD.org
Reporter: jkerian+freebsdbugs@gmail.com

Created attachment 190345
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=190345&action=edit
Simple iswprint test

When I run ls -B on one of my files, the UTF-8 byte sequence 0xef 0xbc 0x88 appears to be treated as unprintable and escaped.

According to http://www.utf8-chartable.de/unicode-utf8-table.pl?start=65280&utf8=0x, this sequence should be U+FF08, a fullwidth left parenthesis. According to http://demo.icu-project.org/icu-bin/ubrowse?ch=FF08, U+FF08 should be a perfectly printable character in a UTF-8 locale.

Looking at the ls.c source code eventually led me to iswprint(). I wrote a simple program that prints the character-class masks and then prints the iswprint() results for a series of characters in a few locales.
The test program is attached in case of link rot; the code and Linux results can also be seen at https://wandbox.org/permlink/ZDc36tQhh7BLRpBx

Linux and OS X have some odd behavior around the classes, but U+2002 and U+FF08 are both printable on both systems in the UTF-8 locales. FreeBSD, on the other hand, returns 1 only for iswprint(0x64), even though U+2002 and U+FF08 should also be reported as printable.

On my box, running FreeBSD 11.1-RELEASE-p4 GENERIC amd64, I get the following results:

[dev ~/test/iswprint]$ ./a.out
alnum:0x400100, cntrl:0x200, ideogram:0x80000, print:0x40000, space:0x4000, xdigit:0x10000, alpha:0x100, digit:0x400, lower:0x1000, punct:0x2000, special:0x100000, blank:0x20000, graph:0x800, phonogram:0x200000, rune:0xffffff00, upper:0x8000,
Default Locale is: C
Character 0x64 is in classes: alnum print xdigit alpha lower graph rune
in C locale, iswprint(0x64) = 1
in en_US.UTF-8 locale, iswprint(0x64) = 1
in ja_JP.UTF-8 locale, iswprint(0x64) = 1
Character 0x2002 is in classes: space rune
in C locale, iswprint(0x2002) = 0
in en_US.UTF-8 locale, iswprint(0x2002) = 0
in ja_JP.UTF-8 locale, iswprint(0x2002) = 0
Character 0xff08 is in classes: rune
in C locale, iswprint(0xff08) = 0
in en_US.UTF-8 locale, iswprint(0xff08) = 0
in ja_JP.UTF-8 locale, iswprint(0xff08) = 0
Character 0x2002 is in classes: space rune
in C locale, iswprint(0x2002) = 0
in en_US.UTF-8 locale, iswprint(0x2002) = 0
in ja_JP.UTF-8 locale, iswprint(0x2002) = 0
Character 0x82 is in classes: cntrl rune
in C locale, iswprint(0x82) = 0
in en_US.UTF-8 locale, iswprint(0x82) = 0
in ja_JP.UTF-8 locale, iswprint(0x82) = 0

I confirmed with a few other FreeBSD users that they get the same results.