Date: Wed, 05 Nov 2008 13:37:10 -0800
From: Tim Kientzle <kientzle@freebsd.org>
To: Maksim Yevmenkin <maksim.yevmenkin@gmail.com>
Cc: freebsd-hackers@freebsd.org
Subject: Re: converting strings from utf8
Message-ID: <49121206.9090804@freebsd.org>
In-Reply-To: <bb4a86c70811041554k6b55854cw711fab508278e398@mail.gmail.com>
References: <bb4a86c70811041554k6b55854cw711fab508278e398@mail.gmail.com>

Maksim Yevmenkin wrote:
>
> can i use wcstombs(3) to convert a string presented in utf8 into
> current locale? basically i'm looking for something like iconv from
> ports but included into base system.

This isn't as easy as it should be, unfortunately.

First, UTF-8 is itself a multibyte encoding, so you have to
first convert to wide characters before you can use wcstombs().

You could in theory use the following:
  * Set locale to UTF-8
  * use mbstowcs() to convert UTF-8 into wide characters
  * Set locale to your preferred locale
  * use wcstombs() to convert wide characters to your locale

Besides being ugly, the locale names themselves are not
standardized, so it's hard to do this portably. For a lot
of applications, the error handling in wcstombs() is also
troublesome; it rejects the entire string if any one character
can't be converted.

When I had to do this for libarchive, where the code had to
be very portable (which precluded using iconv), I ended up
doing the following:
  * Wrote my own converter from UTF-8 to wide characters
    (fortunately, UTF-8 is pretty simple to decode; this
    is about 20-30 lines of C)
  * Used wctomb() to convert one character at a time from
    wide characters to the current locale.

I've found that wctomb() is more portable than a lot of the
other functions (I think it's in C89, whereas a lot of the
other standard conversion routines were introduced in C99)
and provides better error-handling capabilities since it
operates on one character at a time (so you can, for instance,
convert characters that aren't supported in the current locale
into '?' or some kind of \-escape).

Feel free to copy any of my code from libarchive if it helps.

Tim
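
To illustrate the second approach described in the message (a hand-written
UTF-8 decoder feeding wctomb() one character at a time), here is a minimal
sketch. It is not the libarchive code. The function names utf8_decode() and
utf8_to_locale() are hypothetical, and the sketch assumes that wchar_t values
are Unicode code points in the current locale, which the C standard does not
guarantee and which may not hold for every locale on every system.

```c
#include <limits.h>   /* MB_LEN_MAX */
#include <stddef.h>   /* size_t */
#include <stdlib.h>   /* wctomb */
#include <string.h>   /* strlen, memcpy */
#include <wchar.h>    /* wchar_t, WCHAR_MAX */

/*
 * Hypothetical helper: decode one UTF-8 sequence from s (n bytes available).
 * Stores the code point in *cp and returns the number of bytes consumed,
 * or -1 if the sequence is malformed, truncated, overlong, or a surrogate.
 */
int utf8_decode(const unsigned char *s, size_t n, unsigned long *cp)
{
    /* Smallest code point that legitimately needs each sequence length. */
    static const unsigned long min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
    unsigned long c;
    int len, i;

    if (n == 0)
        return -1;
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        len = 2; c = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {
        len = 3; c = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {
        len = 4; c = s[0] & 0x07;
    } else {
        return -1;              /* stray continuation or invalid lead byte */
    }
    if ((size_t)len > n)
        return -1;              /* sequence runs past the end of input */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return -1;          /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min_cp[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return -1;
    *cp = c;
    return len;
}

/*
 * Hypothetical helper: convert NUL-terminated UTF-8 in src to the current
 * locale's encoding in dst. Malformed input bytes and characters that the
 * locale cannot represent become '?'. Returns 0 on success, or -1 if dst
 * is too small.
 * ASSUMPTION: wchar_t values are Unicode code points.
 */
int utf8_to_locale(const char *src, char *dst, size_t dstsize)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t left = strlen(src), out = 0;
    char buf[MB_LEN_MAX];

    (void)wctomb(NULL, 0);      /* reset any shift state */
    while (left > 0) {
        unsigned long cp;
        int n = -1;
        int used = utf8_decode(s, left, &cp);

        if (used < 0)
            used = 1;           /* skip one bad byte, emit '?' below */
        else if (cp <= (unsigned long)WCHAR_MAX)
            n = wctomb(buf, (wchar_t)cp);
        if (n <= 0) {           /* not representable in this locale */
            buf[0] = '?';
            n = 1;
        }
        if (out + (size_t)n >= dstsize)
            return -1;          /* keep room for the terminating NUL */
        memcpy(dst + out, buf, (size_t)n);
        out += (size_t)n;
        s += used;
        left -= (size_t)used;
    }
    if (out >= dstsize)
        return -1;
    dst[out] = '\0';
    return 0;
}
```

A caller would normally run setlocale(LC_CTYPE, "") once at startup and then
call utf8_to_locale(input, buf, sizeof buf). Because each character goes
through wctomb() separately, one unconvertible character costs only a single
'?' rather than the whole string, which is the error-handling advantage the
message describes.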