Date: Tue, 20 Jan 1998 22:10:47 -0800 (PST)
From: Alex Belits
To: "Louis A. Mamakos"
cc: Terry Lambert, daniel_sobral@voga.com.br, hackers@FreeBSD.ORG
Subject: Re: Wide characters on tcp connections
In-Reply-To: <199801210420.XAA23356@whizzo.TransSys.COM>

On Tue, 20 Jan 1998, Louis A. Mamakos wrote:

> If I had to choose, I'd use UTF-8 encodings in big-endian byte order. This
> is, I believe, what the IETF has chosen when dealing with multi-byte
> characters which are embedded within other protocols.

The IETF "has chosen" UTF-8 (and Unicode) after every nation, with or without a multibyte alphabet, rejected Unicode as a standard, while some well-known company decided to base its "internationalization standard" on Unicode (and still failed to implement it properly, even in its equally well-known OS). UTF-8 got a lot of support in Western Europe and the US. However, it should be mentioned that when text is converted to Unicode and then to UTF-8, ASCII text stays the same as before encoding and ISO 8859-1 (Latin-1) has a trivial back-conversion, but other languages look, umm... too unstructured for their native speakers, to say the least.
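To make the byte-count side of this concrete, here is a minimal Python sketch of the UTF-8 encoding rules. The function names `utf8_encode_char` and `utf8_encode` are hypothetical, the sample words are illustrative, and KOI8-R (Russian) and EUC-JP (Japanese) are assumed stand-ins for "local charsets". The sketch follows the later RFC 3629 limit of 1 to 4 bytes per character; the 1998 definition in RFC 2279 allowed sequences of up to 6 bytes. It only measures encoded size and shows that ASCII passes through unchanged and that Latin-1 byte values equal their code points. It does not address the structural or linguistic objections.

```python
def utf8_encode_char(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (1 to 4 bytes, RFC 3629 range)."""
    if cp < 0x80:  # ASCII: a single byte, identical to the input
        return bytes([cp])
    if cp < 0x800:  # e.g. accented Latin-1 letters, Greek, Cyrillic: 2 bytes
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:  # rest of the BMP, including most CJK ideographs: 3 bytes
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError("UTF-16 surrogates are not valid code points")
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:  # supplementary planes: 4 bytes
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError(f"code point out of range: {cp:#x}")


def utf8_encode(text: str) -> bytes:
    """Encode a whole string by concatenating per-character encodings."""
    return b"".join(utf8_encode_char(ord(ch)) for ch in text)


# (label, sample text, local charset to compare against); illustrative choices
SAMPLES = [
    ("ASCII", "hackers", "ascii"),
    ("Latin-1", "café", "latin-1"),
    ("Russian", "привет", "koi8_r"),
    ("Japanese", "日本語", "euc_jp"),
]

for label, text, charset in SAMPLES:
    utf8 = utf8_encode(text)
    assert utf8 == text.encode("utf-8")  # sanity check against Python's codec
    local = text.encode(charset)
    print(f"{label:8} {len(text):2} chars  UTF-8: {len(utf8):2} bytes  "
          f"{charset}: {len(local):2} bytes")

# Latin-1 byte values are the same numbers as the Unicode code points
# U+0000..U+00FF, which is why converting back from Unicode is trivial.
assert list("café".encode("latin-1")) == [ord(c) for c in "café"]
```

For each sample it prints the character count and the byte counts in UTF-8 and in the local charset. ASCII is one byte per character either way, while the Cyrillic and Japanese samples take 2 and 3 bytes per character in UTF-8, versus 1 byte in KOI8-R and 2 bytes in EUC-JP.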
There are a number of linguistic, technical and political issues that were ignored when Unicode was designed. In other words, everything done in local standards was thrown away, and all characters known at the time (and considered worthy enough to be included) were simply listed in some order resembling their alphabets. UTF-8 encoding is blatantly US/European-centric. That can be justified (it's supposed to be used for everything, and most of "everything" is ASCII text), but it's ridiculous for other languages. And I haven't even started talking about regexps and text processing over variable-length characters, or the constant encoding/decoding into fixed-length Unicode that UTF-8 makes necessary for everything but the "word processing" that some people confuse with the use of computers.

Currently no one uses Unicode for anything serious in non-European languages, and since MIME has no problems with charset labeling, people continue to use local charsets that reflect their language's structure far better than Unicode does. However, it looks like this "yet another Esperanto" is going to be the next way of making more money selling "new" software with a "standards compliance" sticker, without actually providing any language support and without compatibility with anything currently in use.

--
Alex