Date: Thu, 17 Jan 2008 16:13:29 +0100
From: Rafaël Carré <funman@videolan.org>
To: questions@freebsd.org
Subject: Some UTF-8 characters are not representable on FreeBSD7
Message-ID: <20080117161329.69fe4135@zod.zod>

Hello,

I noticed that I can't use some characters with libncursesw, namely ⚑ (U+2691), ⚐ (U+2690) and ⏏ (U+23CF).

I ran some tests and found that these characters are reported as unprintable on FreeBSD, while on Linux everything is fine. I find this extremely strange, since the characters do show up in my terminal (gnome-terminal) when I paste them.

Here are the results of the test I ran on Linux and FreeBSD:

    [fun@zod ~]% uname -a ; ./test
    FreeBSD zod 7.0-BETA3 FreeBSD 7.0-BETA3 #0: Sun Dec 2 02:30:18 CET 2007 root@zod:/media/externe/usr/src/sys/ZOD i386
    Locale: fr_FR.UTF-8
    OK a : 1
    OK ⚑ : 0
    OK ö : 1
    OK ↑ : 1
    OK © : 1
    OK ⚐ : 0
    OK é : 1
    OK ⏏ : 0

    [fun@zod ~]% uname -a ; LANG=fr_FR.ISO8859-15 ./test
    FreeBSD zod 7.0-BETA3 FreeBSD 7.0-BETA3 #0: Sun Dec 2 02:30:18 CET 2007 root@zod:/media/externe/usr/src/sys/ZOD i386
    Locale: fr_FR.ISO8859-15
    OK a : 1
    OK ⚑ : 1
    OK ö : 1
    OK ↑ : 1
    OK © : 1
    OK ⚐ : 1
    OK é : 1
    OK ⏏ : 1

    16:03 funman@altair ~% uname -a ; ./test
    Linux altair 2.6.22-2-amd64 #1 SMP Thu Aug 30 23:43:59 UTC 2007 x86_64 GNU/Linux
    Locale: fr_FR.UTF-8
    OK a : 32768
    OK ⚑ : 1
    OK ö : 1
    OK ↑ : 1
    OK © : 1
    OK ⚐ : 1
    OK é : 1
    OK ⏏ : 1

A value of 0 means unprintable; any nonzero value means printable (the character has a graphical representation).

And here is the test I used:

    #include <stdio.h>
    #include <locale.h>
    #include <stdlib.h>
    #include <wchar.h>
    #include <wctype.h>   /* iswgraph() */

    #define MAX 8

    int main(void)
    {
        /* Select the locale named by $LANG and report what was set. */
        const char *loc = setlocale( LC_ALL, getenv( "LANG" ) );
        printf( "Locale: %s\n", loc ? loc : "(setlocale failed)" );

        const char tab[MAX][6] = { "a", "⚑", "ö", "↑", "©", "⚐", "é", "⏏" };
        int i;

        for( i = 0; i < MAX; i++ )
        {
            wchar_t wc = L'\0';
            /* mbtowc() returns -1 on an invalid sequence, so test for > 0. */
            printf( "%s ", mbtowc( &wc, tab[i], 6 ) > 0 ? "OK" : "KO" );
            /* iswgraph() is nonzero if wc has a graphical representation. */
            printf( "%s : %d\n", tab[i], iswgraph( wc ) );
        }
        return 0;
    }

I suppose this is a bug in the UTF-8 locale: I tested with several $LANG values ending in "UTF-8" and the result was the same.

Am I right that a Unicode character should always have a graphical representation in a UTF-8 locale?

Thanks

-- 
Rafaël Carré
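
For reference, the code points behind the byte sequences in the test can be checked without relying on any locale tables. The following is a minimal sketch in C of a hand-written UTF-8 decoder. `utf8_decode` is a hypothetical helper introduced here for illustration; it is not part of libc or of the original test, and it does not reject overlong encodings or surrogates.

```c
#include <stddef.h>

/*
 * Hypothetical helper (an assumption for illustration, not a libc function):
 * decode one UTF-8 sequence starting at s and store the code point in *cp.
 * Returns the number of bytes consumed (1 to 4), or -1 if the lead byte or
 * a continuation byte is malformed. Overlong forms are not rejected.
 */
int utf8_decode(const char *s, unsigned long *cp)
{
    const unsigned char *p = (const unsigned char *)s;
    unsigned long c;
    int len, i;

    if (p[0] < 0x80) {                  /* 0xxxxxxx: plain ASCII */
        c = p[0];
        len = 1;
    } else if ((p[0] & 0xE0) == 0xC0) { /* 110xxxxx: 2-byte sequence */
        c = p[0] & 0x1F;
        len = 2;
    } else if ((p[0] & 0xF0) == 0xE0) { /* 1110xxxx: 3-byte sequence */
        c = p[0] & 0x0F;
        len = 3;
    } else if ((p[0] & 0xF8) == 0xF0) { /* 11110xxx: 4-byte sequence */
        c = p[0] & 0x07;
        len = 4;
    } else {
        return -1;                      /* stray continuation or invalid lead byte */
    }

    for (i = 1; i < len; i++) {
        /* Each continuation byte must look like 10xxxxxx; a terminating
         * NUL fails this check, so we never read past the end of s. */
        if ((p[i] & 0xC0) != 0x80)
            return -1;
        c = (c << 6) | (p[i] & 0x3F);
    }

    *cp = c;
    return len;
}
```

With this helper, "\xE2\x9A\x91" decodes to U+2691, "\xE2\x9A\x90" to U+2690 and "\xE2\x8F\x8F" to U+23CF, which are the three characters reported with a value of 0 in the FreeBSD UTF-8 run.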