From: Alex Belits
To: "G. Adam Stanislav"
Cc: MikeM , freebsd-hackers@FreeBSD.ORG
Date: Mon, 3 Apr 2000 20:59:51 -0700 (PDT)
Subject: Re: Unicode on FreeBSD
In-Reply-To: <3.0.6.32.20000403221617.008e2500@mail85.pair.com>

On Mon, 3 Apr 2000, G. Adam Stanislav wrote:

> > Really the question is much more basic -- who benefits from having
> >Unicode (or Unicode in the form of UTF-8) support. It isn't me for sure
>
> Everyone who works with multilingual documents.

I feel perfectly fine with "multilingual" documents that contain English and Russian text without Unicode.

> Everyone who wants to
> follow a single international standard as opposed to a slew of mutually
> exclusive local standards. Anyone who thinks globally.

"Globally" in this case means following self-proclaimed "unificators" from the Unicode Consortium.

> Anyone who has anything to do with the Internet must deal with UTF-8:
> "Protocols MUST be able to use the UTF-8 charset, which consists of the ISO
> 10646 coded character set combined with the UTF-8 character encoding
> scheme, as defined in [10646] Annex R (published in Amendment 2), for all
> text."

This was not approved by ANYONE but a bunch of "unificators". It was never widely discussed, and the people affected never had a chance to give any input. It is the same kind of "standard document" that the ITU issues by the dozen.

> >-- I am Russian.
>
> So?

So I don't want UTF-8 to be forced on me. Charset definitions in MIME headers exist for a reason. If we want to make something usable, we can create a format that encapsulates existing charsets instead of banning them altogether and replacing them with "unified" stuff. In that "unified" scheme, cut(1) and dd(1) can produce output that will be declared "illegal" to process as text because it is not a valid UTF-8 sequence (for example, when a byte range ends in the middle of a multibyte character).

One of the most basic strengths of Unix is the ease with which text can be manipulated, and the way "non-text" data can be processed with the same tools, without any complex, application-specific "this is text and this is not" procedures.

UTF-8 turns "text" into something that gives us a dilemma. We can either redesign everything to treat "text" as a stream of UTF-8-encoded Unicode (and make it impossible to combine text and "non-text" without a lot of pain), or leave the tools as they are and deal with "invalid" output from perfectly valid operations.

In Windows/Office/..., which lives and feeds on complex and unparseable formats, this problem couldn't appear or even be thought of. There, "text" doesn't exist as text at all, and the less anything looks usable outside of a strict "object" environment, the better (they now don't even encode it in UTF-8, and use bare 16-bit Unicode instead). In a Unix-like system it's a violation of some very basic rules.

-- 
Alex

P.S. I expect that Martin Duerst, the source of 80% of the Unicode propaganda on software-oriented mailing lists, will appear here within 72 hours.