Date: Tue, 30 Jan 2018 18:49:43 -0600
From: Joseph Kerian <jkerian@gmail.com>
To: freebsd-questions@freebsd.org
Subject: FreeBSD iswprint() returns false for U+FF08 in UTF-8 locale
Message-ID: <CALM2SrHKgXPr80FETekfWDuNLM3%2BW4jS6ppSeimi=iuKmBnG7Q@mail.gmail.com>

I recently searched one of my drives for files containing "unprintable characters" because of some issues I was seeing with file-listing programs.

When I run ls -B on one of the files, the UTF-8 byte sequence 0xef 0xbc 0x88 appears to be the culprit. According to http://www.utf8-chartable.de/unicode-utf8-table.pl?start=65280&utf8=0x, this is U+FF08, a fullwidth left parenthesis. That makes some sense, given the file. According to http://demo.icu-project.org/icu-bin/ubrowse?ch=FF08, U+FF08 should be a perfectly printable character in a UTF-8 locale.

Looking at the ls.c source code eventually led me to iswprint(). I wrote a simple test program that prints the character class enums and then the iswprint() results for a series of characters in a few locales:
https://wandbox.org/permlink/ZDc36tQhh7BLRpBx

Linux and OSX show some odd behavior around the classes, but U+2002 and U+FF08 are both perfectly printable on both systems. FreeBSD, on the other hand, returns 1 only for iswprint(0x64).

Results from my box are here: https://gist.github.com/anonymous/0f21e139ae10c8c7996e7c056d686a7b
The results on the wandbox link are pretty typical for Linux systems.

(My box is running FreeBSD 11.1-RELEASE-p4 GENERIC amd64.)

Is this a bug? Am I missing a pkg/port needed to properly support UTF-8?

--
Joe Kerian
Email: jkerian@gmail.com