Date: Tue, 4 Apr 2000 08:35:11 -0700 (PDT)
From: "Eugene M. Kim" <ab@astralblue.com>
To: Alex Belits <abelits@phobos.illtel.denver.co.us>
Cc: "G. Adam Stanislav" <adam@whizkidtech.net>, MikeM <mike_bsdlists@yahoo.com>, freebsd-hackers@FreeBSD.ORG
Subject: Re: Unicode on FreeBSD
Message-ID: <Pine.BSF.4.20.0004040808360.5035-100000@home.astralblue.com>
In-Reply-To: <Pine.LNX.4.20.0004032038040.7178-100000@phobos.illtel.denver.co.us>
On Mon, 3 Apr 2000, Alex Belits wrote:
| On Mon, 3 Apr 2000, G. Adam Stanislav wrote:
|
| > > Really the question is much more basic -- who benefits from having
| > >Unicode (or Unicode in the form of UTF-8) support. It isn't me for sure
| >
| > Everyone who works with multilingual documents.
|
| I feel perfectly fine with "multilingual" documents that contain English
| and Russian text without Unicode.

Please try thinking more broadly. Have you ever considered a mixture of Russian, Hebrew, Korean, and English? AFAIK no coded character set (CCS) other than Unicode can currently handle this.

| > Everyone who wants to
| > follow a single international standard as opposed to a slew of mutually
| > exclusive local standards. Anyone who thinks globally.
|
| "Globally" in this case means following self-proclaimed unificators from
| Unicode Consortium.
|
| > Anyone who has anything to do with the Internet must deal with UTF-8:
| > "Protocols MUST be able to use the UTF-8 charset, which consists of the ISO
| > 10646 coded character set combined with the UTF-8 character encoding
| > scheme, as defined in [10646] Annex R (published in Amendment 2), for all
| > text." <RFC 2277>
|
| This is not approved by ANYONE but a bunch of "unificators". It never
| was widely discussed, and affected people never had a chance to give any
| input. This is the same kind of "standard documents" that ITU issues by
| dozens.

True, I personally don't like the way the Unicode Consortium operates either; I'd prefer a more open process such as the IETF's. However, it seems a mistake to brand Unicode an ill-motivated idea just because its governing body is less than ideal. And since RFC 2277 is only a BCP (Best Current Practice), not a `standard' document, it doesn't have to be approved by anyone either. If you don't feel right about it, why not send a short e-mail message to its author?

| > >-- I am Russian.
| >
| > So?
|
| So I don't want UTF-8 to be forced on me. Charset definitions in MIME
| headers exist for a reason. If we want to make something usable we can
| create a format that can encapsulate existing charsets instead of banning
| them altogether and replacing with "unified" stuff where cut(1) and
| dd(1) can produce the output that will be declared "illegal" to be
| processed as text because it can not be a valid UTF-8 sequence.

Nobody is banning anything. Keep in mind that RFC 2277 only mandates support for UTF-8. One can still use US-ASCII, EUC-KR, or whatever else one wants, as long as the protocol supports character set designation, as MIME does. And again, RFC 2277 is a BCP. Unlike standards, BCPs have no enforcing power at all.

| One of the most basic strengths of Unix is the ease with which text can
| be manipulated, and how "non-text" data can be processed using the same
| tools without any complex "this is text and this is not"
| application-specific procedures. UTF-8 turns "text" into something that
| gives us a dilemma -- to redesign everything to treat "text" as the stream
| of UTF-8 encoded Unicode (and make it impossible to combine text and
| "non-text" without a lot of pain), or to leave tools as they are and deal
| with "invalid" output from perfectly valid operations. In
| Windows/Office/... that lives and feeds on complex and unparceable formats
| this problem couldn't appear or even thought of -- "text" doesn't exist as
| text at all, and the less stuff will look as something that can be usable
| outside of strict "object" environment, the better (they now don't even
| encode it in UTF-8, and use bare 16-bit Unicode). In Unixlike system it's
| a violation of some very basic rules.

Yes, it is true that the entire UN*X world is deeply rooted in a single-byte-oriented model, and it's hard to come up with a reasonable migration path to the multibyte world. But that doesn't justify the byte-oriented system.
It has all too many limitations (which you might not realize until you have to mix all sorts of languages in one document; I did :-p), and there has to be an alternative. I'm not saying that the entire UN*X world should migrate to the Unicode world in months. We all know that is just impossible.

Eugene
-- 
Eugene M. Kim <ab@astralblue.com>

"Is your music unpopular? Make it popular; make music which people like,
or make people who like your music."

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
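Two technical points argued in this thread can be illustrated with a short Python sketch (not from the thread): the quoted objection that cut(1) and dd(1) can emit bytes that are not valid UTF-8, and the claim that a single legacy charset cannot hold a mixed Russian, Hebrew, Korean, and English line. The helpers `cut_bytes`, `is_valid_utf8`, and `encodable` are hypothetical, and slicing a byte string only approximates what a byte-oriented tool does to its input.

```python
# Hypothetical helpers illustrating two points from the thread.

def cut_bytes(data: bytes, n: int) -> bytes:
    """Keep the first n bytes, roughly what `cut -b 1-n` does to one line."""
    return data[:n]

def is_valid_utf8(data: bytes) -> bool:
    """True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def encodable(text: str, charset: str) -> bool:
    """True if every character of text exists in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

mixed = "Привет שלום 안녕 hello"   # Russian, Hebrew, Korean, English
raw = mixed.encode("utf-8")

print(is_valid_utf8(raw))                # True
# 'П' takes 2 bytes in UTF-8; keeping only the first byte splits it.
print(is_valid_utf8(cut_bytes(raw, 1)))  # False
# KOI8-R covers Cyrillic and ASCII but has no Hebrew or Hangul letters.
print(encodable(mixed, "koi8_r"))        # False
print(encodable(mixed, "utf-8"))         # True
```

Both observations hold at once: a byte-level cut can leave an undecodable fragment, while the single-byte legacy charset cannot represent the mixed line at all.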