Date:      Mon, 3 Apr 2000 20:59:51 -0700 (PDT)
From:      Alex Belits <abelits@phobos.illtel.denver.co.us>
To:        "G. Adam Stanislav" <adam@whizkidtech.net>
Cc:        MikeM <mike_bsdlists@yahoo.com>, freebsd-hackers@FreeBSD.ORG
Subject:   Re: Unicode on FreeBSD
Message-ID:  <Pine.LNX.4.20.0004032038040.7178-100000@phobos.illtel.denver.co.us>
In-Reply-To: <3.0.6.32.20000403221617.008e2500@mail85.pair.com>

On Mon, 3 Apr 2000, G. Adam Stanislav wrote:

> >  Really the question is much more basic -- who benefits from having
> >Unicode (or Unicode in the form of UTF-8) support. It isn't me for sure
> 
> Everyone who works with multilingual documents.

  I feel perfectly fine with "multilingual" documents that contain English
and Russian text without Unicode.

> Everyone who wants to
> follow a single international standard as opposed to a slew of mutually
> exclusive local standards. Anyone who thinks globally.

  "Globally" in this case means following self-proclaimed unificators from
the Unicode Consortium.

> Anyone who has anything to do with the Internet must deal with UTF-8:
> "Protocols MUST be able to use the UTF-8 charset, which consists of the ISO
> 10646 coded character set combined with the UTF-8 character encoding
> scheme, as defined in [10646] Annex R (published in Amendment 2), for all
> text." <RFC 2277>

  This is not approved by ANYONE but a bunch of "unificators". It never
was widely discussed, and affected people never had a chance to give any
input. This is the same kind of "standard document" that the ITU issues by
the dozen.

> >-- I am Russian.
> 
> So?

  So I don't want UTF-8 to be forced on me. Charset definitions in MIME
headers exist for a reason. If we want to make something usable we can
create a format that can encapsulate existing charsets instead of banning
them altogether and replacing them with "unified" stuff where cut(1) and
dd(1) can produce output that will be declared "illegal" to process as
text because it cannot be a valid UTF-8 sequence.
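
  The cut(1) case above can be illustrated with a minimal sketch in Python.
It only imitates a byte-range cut on a single line; the helper names
cut_bytes and is_valid_utf8 and the sample word are assumptions made for
illustration, not part of any real tool. It shows that slicing UTF-8 text
on a byte boundary can leave a truncated sequence, while the same slice on
a single-byte charset such as KOI8-R still decodes.

```python
# Sample Russian word (illustrative assumption); each Cyrillic letter
# occupies two bytes in UTF-8 and one byte in KOI8-R.
text = "мир"
utf8 = text.encode("utf-8")     # 6 bytes
koi8 = text.encode("koi8_r")    # 3 bytes


def cut_bytes(data: bytes, first: int, last: int) -> bytes:
    """Hypothetical helper: imitate `cut -b first-last` on one line
    (1-based, inclusive byte positions)."""
    return data[first - 1:last]


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Bytes 1-3 of the UTF-8 form: one whole letter plus half of the next.
utf8_chunk = cut_bytes(utf8, 1, 3)
# Bytes 1-2 of the KOI8-R form: two whole letters.
koi8_chunk = cut_bytes(koi8, 1, 2)

print(is_valid_utf8(utf8))          # True
print(is_valid_utf8(utf8_chunk))    # False
print(koi8_chunk.decode("koi8_r"))  # ми
```

  The byte-oriented slice itself is a perfectly valid operation in both
cases; only a strict UTF-8 decoder rejects the result.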

  One of the most basic strengths of Unix is the ease with which text can
be manipulated, and how "non-text" data can be processed using the same
tools without any complex "this is text and this is not"
application-specific procedures. UTF-8 turns "text" into something that
gives us a dilemma -- to redesign everything to treat "text" as the stream
of UTF-8 encoded Unicode (and make it impossible to combine text and
"non-text" without a lot of pain), or to leave tools as they are and deal
with "invalid" output from perfectly valid operations. In
Windows/Office/..., which lives and feeds on complex and unparseable
formats, this problem couldn't appear or even be thought of -- "text"
doesn't exist as text at all, and the less the stuff looks like something
usable outside of a strict "object" environment, the better (they now don't
even encode it in UTF-8, and use bare 16-bit Unicode). In a Unix-like
system it's a violation of some very basic rules.

-- 
Alex

P.S. I expect that Martin Duerst, the source of 80% of Unicode propaganda
on the software-oriented mailing lists, will appear here within 72 hours.






