Date:      Tue, 30 Jan 2018 18:49:43 -0600
From:      Joseph Kerian <jkerian@gmail.com>
To:        freebsd-questions@freebsd.org
Subject:   FreeBSD iswprint() returns false for U+FF08 in UTF-8 locale
Message-ID:  <CALM2SrHKgXPr80FETekfWDuNLM3%2BW4jS6ppSeimi=iuKmBnG7Q@mail.gmail.com>

I recently searched one of my drives for files containing "unprintable
characters" due to some issues I was seeing with file-listing programs.

When I run ls -B on one of the files, the UTF-8 byte sequence 0xef 0xbc 0x88
appears to be the culprit. According to
http://www.utf8-chartable.de/unicode-utf8-table.pl?start=65280&utf8=0x,
this should be U+FF08, a fullwidth left parenthesis. This makes some sense,
given the file.

According to http://demo.icu-project.org/icu-bin/ubrowse?ch=FF08, U+FF08
should be a perfectly printable character in a UTF-8 locale. Looking at the
ls.c source code eventually led me to iswprint().

I wrote a simple program to print the character enums and then the
iswprint() results in a few locales for a series of characters:
https://wandbox.org/permlink/ZDc36tQhh7BLRpBx

Linux and OSX have some odd behavior around the classes, but U+2002 and
U+FF08 are both perfectly printable on both systems. FreeBSD, on the other
hand, only returns 1 for iswprint(0x64). Results from my box are here:
https://gist.github.com/anonymous/0f21e139ae10c8c7996e7c056d686a7b
The results on the wandbox link above are pretty typical for Linux systems.

(My box is running: FreeBSD 11.1-RELEASE-p4 GENERIC amd64)

Is this a bug?  Am I missing a pkg/port to properly support UTF-8?

-- 
Joe Kerian
Email: jkerian@gmail.com


