Date:      Wed, 05 Nov 2008 13:37:10 -0800
From:      Tim Kientzle <kientzle@freebsd.org>
To:        Maksim Yevmenkin <maksim.yevmenkin@gmail.com>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: converting strings from utf8
Message-ID:  <49121206.9090804@freebsd.org>
In-Reply-To: <bb4a86c70811041554k6b55854cw711fab508278e398@mail.gmail.com>
References:  <bb4a86c70811041554k6b55854cw711fab508278e398@mail.gmail.com>

Maksim Yevmenkin wrote:
> 
> can i use wcstombs(3) to convert a string presented in utf8 into
> current locale? basically i'm looking for something like iconv from
> ports but included into base system.

This isn't as easy as it should be, unfortunately.
First, UTF-8 is itself a multibyte encoding, so you have
to first convert to wide characters before you can use
wcstombs().  You could in theory use the following:
   * Set locale to UTF-8
   * use mbstowcs() to convert UTF-8 into wide characters
   * Set locale to your preferred locale
   * use wcstombs() to convert wide characters to your locale

Besides being ugly, the locale names themselves are not
standardized, so it's hard to do this portably.  For a
lot of applications, the error handling in wcstombs() is
also troublesome; it rejects the entire string if any one
character can't be converted.
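
For illustration, the four steps listed above could be sketched
roughly as follows. This is only a sketch: convert_via_locales()
is a hypothetical helper name, and the caller has to supply the
locale names (for example "en_US.UTF-8" on many systems), because
those are exactly the part that is not standardized.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert src (encoded per from_locale) into
 * dst (encoded per to_locale) by switching LC_CTYPE twice.
 * Returns the number of bytes written, excluding the terminating
 * NUL, or (size_t)-1 on any failure.
 */
size_t
convert_via_locales(const char *from_locale, const char *to_locale,
    const char *src, char *dst, size_t dstsize)
{
	char *saved = NULL;
	wchar_t *wide = NULL;
	const char *cur;
	size_t n, result = (size_t)-1;

	/* setlocale() may reuse its return buffer, so copy the name. */
	cur = setlocale(LC_CTYPE, NULL);
	if (cur == NULL || (saved = malloc(strlen(cur) + 1)) == NULL)
		return ((size_t)-1);
	strcpy(saved, cur);

	/* Steps 1 and 2: source locale, then multibyte -> wide. */
	if (setlocale(LC_CTYPE, from_locale) == NULL)
		goto done;
	/* N input bytes decode to at most N wide characters. */
	n = strlen(src) + 1;
	wide = malloc(n * sizeof(*wide));
	if (wide == NULL || mbstowcs(wide, src, n) == (size_t)-1)
		goto done;

	/* Steps 3 and 4: target locale, then wide -> multibyte. */
	if (setlocale(LC_CTYPE, to_locale) == NULL)
		goto done;
	/* Fails for the whole string if one character is unconvertible. */
	n = wcstombs(dst, wide, dstsize);
	if (n == (size_t)-1 || n >= dstsize)
		goto done;	/* error, or no room left for the NUL */
	result = n;
done:
	setlocale(LC_CTYPE, saved);	/* restore the caller's locale */
	free(saved);
	free(wide);
	return (result);
}
```

Note that setlocale() changes process-wide state, so a helper like
this is not safe to call while other threads depend on the locale.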

When I had to do this for libarchive, where the code had
to be very portable (which precluded using iconv), I ended
up doing the following:
  * Wrote my own converter from UTF-8 to wide characters
    (fortunately, UTF-8 is pretty simple to decode; this
    is about 20-30 lines of C)
  * Used wctomb() to convert one character at a time from
    wide characters to the current locale.
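
To give an idea of the size of the first step, here is a minimal
sketch of a UTF-8 decoder. It is not the libarchive code;
utf8_decode_one() is a hypothetical name, and the function only
produces a Unicode code point. Storing that value in a wchar_t
assumes the platform's wide characters are Unicode code points,
which is common but not guaranteed by the C standard.

```c
#include <stddef.h>

/*
 * Decode one UTF-8 sequence from s[0..len) into *cp.
 * Returns the number of bytes consumed (1 to 4), 0 if len is 0,
 * or -1 if the input is malformed or truncated.
 */
int
utf8_decode_one(const unsigned char *s, size_t len, unsigned long *cp)
{
	/* Smallest code point that legitimately needs n bytes. */
	static const unsigned long min_value[] =
	    { 0, 0, 0x80, 0x800, 0x10000 };
	unsigned long c;
	int i, n;

	if (len == 0)
		return (0);
	if (s[0] < 0x80) {			/* 0xxxxxxx: ASCII */
		*cp = s[0];
		return (1);
	} else if ((s[0] & 0xE0) == 0xC0) {	/* 110xxxxx */
		n = 2;
		c = s[0] & 0x1F;
	} else if ((s[0] & 0xF0) == 0xE0) {	/* 1110xxxx */
		n = 3;
		c = s[0] & 0x0F;
	} else if ((s[0] & 0xF8) == 0xF0) {	/* 11110xxx */
		n = 4;
		c = s[0] & 0x07;
	} else
		return (-1);	/* stray continuation or invalid lead byte */

	if (len < (size_t)n)
		return (-1);	/* sequence is cut short */
	for (i = 1; i < n; i++) {
		if ((s[i] & 0xC0) != 0x80)	/* must be 10xxxxxx */
			return (-1);
		c = (c << 6) | (s[i] & 0x3F);
	}
	/* Reject overlong forms, surrogates, and values past U+10FFFF. */
	if (c < min_value[n] || (c >= 0xD800 && c <= 0xDFFF) ||
	    c > 0x10FFFF)
		return (-1);
	*cp = c;
	return (n);
}
```

A caller would invoke this in a loop, advancing by the returned
byte count and deciding for itself what to do when -1 comes back.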

I've found that wctomb() is more portable than a lot of
the other functions (it's in C89, whereas the restartable
conversion routines such as wcrtomb() only arrived with
the 1995 amendment and C99) and provides better
error-handling capabilities since it operates on one
character at a time (so you can, for instance, convert
characters that aren't supported in the current locale
into '?' or some kind of \-escape).
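
As a rough sketch of that per-character approach (again not the
libarchive code; wide_to_locale() is a hypothetical name), a loop
around wctomb() that substitutes '?' might look like this. It
ignores the closing shift sequence that a stateful encoding would
need at the end of the string.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Convert the NUL-terminated wide string ws to the current
 * locale's multibyte encoding, one character at a time.
 * Characters the locale cannot represent become '?'.
 * Returns the number of bytes written (excluding the NUL),
 * or (size_t)-1 if dst is too small.
 */
size_t
wide_to_locale(const wchar_t *ws, char *dst, size_t dstsize)
{
	char buf[MB_LEN_MAX];
	size_t used = 0;
	int n;

	if (dstsize == 0)
		return ((size_t)-1);
	for (; *ws != L'\0'; ws++) {
		n = wctomb(buf, *ws);
		if (n < 0) {
			/* Unconvertible: reset state and substitute. */
			wctomb(NULL, L'\0');
			buf[0] = '?';
			n = 1;
		}
		/* Always keep one byte free for the terminating NUL. */
		if (used + (size_t)n >= dstsize)
			return ((size_t)-1);
		memcpy(dst + used, buf, (size_t)n);
		used += (size_t)n;
	}
	dst[used] = '\0';
	return (used);
}
```

Swapping the '?' branch for code that emits a \-escape is a local
change, which is the main attraction of going one character at a
time.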

Feel free to copy any of my code from libarchive if it helps.

Tim