Date:      Thu, 17 Jan 2008 16:13:29 +0100
From:      Rafaël Carré <funman@videolan.org>
To:        questions@freebsd.org
Subject:   Some UTF-8 characters are not representable on FreeBSD7
Message-ID:  <20080117161329.69fe4135@zod.zod>

Hello,

I noticed I couldn't use some characters with libncursesw: namely ⚑ ⚐
and ⏏.

I ran some tests and found that these characters were reported as
unprintable, while on Linux everything was fine.

I found it extremely strange since those characters would show up in my
terminal (gnome-terminal) when I pasted them.

Here are the results of the test I ran on Linux and FreeBSD:

[fun@zod ~]% uname -a ;./test
FreeBSD zod 7.0-BETA3 FreeBSD 7.0-BETA3 #0: Sun Dec  2 02:30:18 CET 2007     root@zod:/media/externe/usr/src/sys/ZOD  i386
Locale: fr_FR.UTF-8
OK a : 1
OK ⚑ : 0
OK ö : 1
OK ↑ : 1
OK © : 1
OK ⚐ : 0
OK é : 1
OK ⏏ : 0

[fun@zod ~]% uname -a ; LANG=fr_FR.ISO8859-15 ./test
FreeBSD zod 7.0-BETA3 FreeBSD 7.0-BETA3 #0: Sun Dec  2 02:30:18 CET 2007     root@zod:/media/externe/usr/src/sys/ZOD  i386
Locale: fr_FR.ISO8859-15
OK a : 1
OK ⚑ : 1
OK ö : 1
OK ↑ : 1
OK © : 1
OK ⚐ : 1
OK é : 1
OK ⏏ : 1


16:03 funman@altair  ~% uname -a ; ./test
Linux altair 2.6.22-2-amd64 #1 SMP Thu Aug 30 23:43:59 UTC 2007 x86_64 GNU/Linux
Locale: fr_FR.UTF-8
OK a : 32768
OK ⚑ : 1
OK ö : 1
OK ↑ : 1
OK © : 1
OK ⚐ : 1
OK é : 1
OK ⏏ : 1


A value of 0 means unprintable; any nonzero value means printable (the
character has a graphical representation).
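Since the nonzero value differs between systems (1 on FreeBSD, 32768 for
"a" on Linux in the runs above), comparing raw results across platforms
can be misleading. A minimal sketch of a wrapper that normalizes the
result to 0 or 1 follows; the name is_graphic is hypothetical and only
used for illustration:

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: collapse iswgraph()'s "any nonzero value" result
 * into exactly 0 or 1 so outputs from different libcs compare equal. */
int is_graphic(wchar_t wc)
{
    return iswgraph((wint_t)wc) != 0;
}
```

With this, a call such as printf("%d\n", is_graphic(wc)) would print the
same digit on both systems whenever they agree on the classification.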

And here is the test I used:

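#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>   /* iswgraph() */

#define MAX 8

int main(void)
{
    /* Select the locale named by $LANG; setlocale() returns NULL on failure. */
    const char *lang = getenv( "LANG" );
    const char *loc = setlocale( LC_ALL, lang ? lang : "" );
    printf( "Locale: %s\n", loc ? loc : "(setlocale failed)" );

    /* UTF-8 encoded test strings; each fits in 6 bytes including the NUL. */
    const char tab[MAX][6] = {
        "a", "⚑", "ö", "↑", "©", "⚐", "é", "⏏"
    };

    int i;
    for( i = 0; i < MAX; i++ )
    {
        wchar_t wc = L'\0';
        /* mbtowc() returns -1 on an invalid sequence, so test for > 0. */
        printf("%s ", mbtowc( &wc, tab[i], 6 ) > 0 ? "OK" : "KO" );
        /* iswgraph() returns nonzero if wc has a graphical representation. */
        printf("%s : %d\n", tab[i], iswgraph( (wint_t)wc ) );
    }

    return 0;
}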
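
To report exactly which code points are affected, it may help to decode
the test strings without going through the locale at all. The following
is only a sketch: the function name utf8_first_codepoint is hypothetical,
and the decoder checks continuation bytes but does not reject overlong
forms or surrogates.

```c
/* Hypothetical helper: decode the first UTF-8 sequence in s by hand,
 * independently of the current locale.
 * Returns the code point, or -1 if the sequence is malformed. */
long utf8_first_codepoint(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    long cp;
    int extra;   /* number of continuation bytes expected */
    int i;

    if (p[0] < 0x80)                { cp = p[0];        extra = 0; }
    else if ((p[0] & 0xE0) == 0xC0) { cp = p[0] & 0x1F; extra = 1; }
    else if ((p[0] & 0xF0) == 0xE0) { cp = p[0] & 0x0F; extra = 2; }
    else if ((p[0] & 0xF8) == 0xF0) { cp = p[0] & 0x07; extra = 3; }
    else return -1;   /* stray continuation byte or invalid lead byte */

    for (i = 1; i <= extra; i++) {
        /* Each continuation byte must look like 10xxxxxx;
         * this also stops at an early NUL terminator. */
        if ((p[i] & 0xC0) != 0x80)
            return -1;
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    return cp;
}
```

Printing the result with printf("U+%04lX\n", utf8_first_codepoint(tab[i]))
gives U+2691 for ⚑, U+2690 for ⚐ and U+23CF for ⏏, which are the three
characters reported as unprintable above.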


I suppose this is a bug in the UTF-8 locale: I tested with several
$LANG values ending in "UTF-8" and the result was the same.

Am I right that a Unicode character should always have a graphical
representation in a UTF-8 locale?

Thanks

-- 
Rafaël Carré

