Date: Thu, 11 Jun 1998 13:56:17 +0600
From: Konstantin Chuguev <joy@urc.ac.ru>
To: Chen Hsiung Chan <frankch@waru.life.nthu.edu.tw>
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: i18n - what I can do for it?
Message-ID: <357F8DA1.EDDBB6D0@urc.ac.ru>
References: <19980611135643.05642@waru.life.nthu.edu.tw>
Chen Hsiung Chan wrote:
>
> I am not sure about the way it is done. In fact big5 is not a
> good encoding (it does not conform to ISO-2022), but I cannot get rid
> of it (it is the de facto standard in Taiwan now).
>

There are no bad charset encodings, there are just incompatible ones :-)
And there is no charset encoding compatible with all the others in practical use.

Leaving 8-bit charsets aside, the two major candidates for a universal encoding are Unicode and ISO 2022. (Is big5 compatible with, or convertible to, Unicode?) As we can see, neither of them satisfies all users. Both are still developing, so there is a chance (but no guarantee) that one of them will someday satisfy everybody. We cannot say now which one it will be.

So the OS needs support for multiple charsets and a powerful charset conversion mechanism. IMO it is worth choosing among things already developed in this area instead of building something completely from scratch.

I would be interested to know how many i18n APIs (or just pieces of source code) are available for public use, including charset conversion, gettext etc.

BTW, the concept in TCL-8.1 looks very attractive. It uses Unicode (UTF-8) for its internal string representation, and has a powerful and flexible charset conversion mechanism. There is one "system" character set (obtained from the locale), and each of TCL's channels (a virtual representation of files, sockets etc.) can have an associated charset other than the "system" one.
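To make the channel model above concrete, here is a minimal sketch of the same idea in Python rather than TCL: strings are Unicode inside the program, one "system" charset comes from the locale, and each channel carries its own charset. The helper names `system_charset` and `open_channel` are hypothetical and exist only for this illustration; they are not part of TCL or of any library. The sketch also touches on the big5 question, in that Python's bundled `big5` codec decodes these sample bytes to Unicode.

```python
import io
import locale
from typing import Optional


def system_charset() -> str:
    """Return the charset implied by the current locale (the "system" charset)."""
    return locale.getpreferredencoding(False)


def open_channel(raw: io.BytesIO, charset: Optional[str] = None) -> io.TextIOWrapper:
    """Wrap a byte stream in a text channel (hypothetical helper).

    Falls back to the system charset when no charset is given.
    """
    return io.TextIOWrapper(raw, encoding=charset or system_charset(), newline="")


# Two CJK characters (U+4E2D U+6587), encoded as they might arrive from a Big5 source.
sample = "\u4e2d\u6587"
incoming = io.BytesIO(sample.encode("big5"))
outgoing = io.BytesIO()

reader = open_channel(incoming, "big5")   # channel with its own charset
writer = open_channel(outgoing, "utf-8")  # another channel, different charset

text = reader.read()   # plain Unicode inside the program
writer.write(text)     # re-encoded on the way out
writer.flush()

result = outgoing.getvalue()
```

The conversion always passes through the internal Unicode form, so supporting one more charset means adding one more mapping to and from Unicode, not one per pair of charsets.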
Currently supported charsets (not counting Unicode and UTF-8) are: ascii, big5, cp1250, cp1251, cp1252, cp1253, cp1254, cp1255, cp1256, cp1257, cp1258, cp437, cp737, cp775, cp850, cp852, cp855, cp857, cp860, cp861, cp862, cp863, cp864, cp865, cp866, cp869, cp874, cp932, cp936, cp949, cp950, dingbats, euc-jp, gb12345, gb1988, gb2312, iso2022-jp, iso2022-kr, iso2022, iso8859-1, iso8859-2, iso8859-3, iso8859-4, iso8859-5, iso8859-6, iso8859-7, iso8859-8, iso8859-9, jis0201, jis0208, jis0212, macCentEuro, macCroatian, macCyrillic, macDingbats, macGreek, macIceland, macJapan, macRoman, macRomania, macThai, macTurkish, macUkraine, shiftjis, symbol. All of them are defined in external files.

As for TK-8.1, it has a built-in mechanism for accepting keycodes in the system locale, and for mapping its internal Unicode onto the charsets of the fonts available on the system.

Recently I added koi8-r to this list, and after lightly patching the tcl-8.1 sources, made the Zircon IRC client able to speak any of these charsets :-) I like it very much.

I understand that an i18n implementation in such a high-level language as TCL is much simpler than one in C and the kernel. But the latter is not impossible; it just needs a clear specification.

There is also TERENA's MAITS internationalization API, but there is little information about it on the Internet, and I don't know its license terms and copyright.

Does anybody know of other examples?

--
Konstantin V. Chuguev.          System administrator of Southern
http://www.urc.ac.ru/~joy/      Ural Regional Center of FREEnet,
mailto:joy@urc.ac.ru            Chelyabinsk, Russia.