From owner-freebsd-hackers  Mon Apr  3 21:38:26 2000
Delivered-To: freebsd-hackers@freebsd.org
Received: from mail.bfm.org (mail.bfm.org [216.127.218.26])
	by hub.freebsd.org (Postfix) with ESMTP id 7123437B7AA
	for <freebsd-hackers@FreeBSD.ORG>; Mon,  3 Apr 2000 21:38:15 -0700 (PDT)
	(envelope-from adam@whizkidtech.net)
Received: from WhizKid (r31.bfm.org [216.127.220.127]) by mail.bfm.org
          (Post.Office MTA v3.5.3 release 223 ID# 0-52399U2500L250S0V35)
          with SMTP id org; Mon, 3 Apr 2000 23:38:53 -0500
Message-Id: <3.0.6.32.20000403233641.008e6590@mail85.pair.com>
X-Sender: whizkid@mail85.pair.com
X-Mailer: QUALCOMM Windows Eudora Light Version 3.0.6 (32)
Date: Mon, 03 Apr 2000 23:36:41 -0500
To: Alex Belits <abelits@phobos.illtel.denver.co.us>
From: "G. Adam Stanislav" <adam@whizkidtech.net>
Subject: Re: Unicode on FreeBSD
Cc: MikeM <mike_bsdlists@yahoo.com>, freebsd-hackers@FreeBSD.ORG
In-Reply-To: <Pine.LNX.4.20.0004032038040.7178-100000@phobos.illtel.denver.co.us>
References: <3.0.6.32.20000403221617.008e2500@mail85.pair.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

At 20:59 03-04-2000 -0700, Alex Belits wrote:
>  I feel perfectly fine with "multilingual" documents that contain English
>and Russian text without Unicode.

Those are bilingual, not multilingual. I once had to create a document in
English, Slovak, and Sanskrit (using the Devanagari script). There is only
one standard that makes it possible: Unicode. Too bad UTF-8 did not exist
at the time, and I had to use graphics.
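
As a rough illustration of that point, here is a minimal Python sketch. The
sample words and the helper name `can_encode` are illustrative assumptions,
not part of any standard. It checks whether one string mixing English,
Slovak, and Devanagari fits into a given charset:

```python
# Sample text in three scripts; the words themselves are illustrative.
TEXT = "happiness / šťastie / सुख"


def can_encode(text: str, charset: str) -> bool:
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for charset in ("ascii", "iso-8859-2", "utf-8"):
        print(charset, can_encode(TEXT, charset))
```

ISO 8859-2 covers the Slovak letters but has no Devanagari, so of the three
charsets tried here only UTF-8 can hold the whole string.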

>> Everyone who wants to
>> follow a single international standard as opposed to a slew of mutually
>> exclusive local standards. Anyone who thinks globally.

>  "Globally" in this case means following self-proclaimed unificators from
>Unicode Consortium.

I don't know what you mean by "unificators." Why self-proclaimed? Those
were people with a need for which they found a solution. The Unicode
Consortium has no power to force Unicode on anyone. It just happens that it
was widely accepted. You're free to create your own system, or ignore it
altogether. But just because you see no need for Unicode does not mean you
should be upset when people are willing to work on Unicode support in
FreeBSD.

>> Anyone who has anything to do with the Internet must deal with UTF-8:
>> "Protocols MUST be able to use the UTF-8 charset, which consists of the ISO
>> 10646 coded character set combined with the UTF-8 character encoding
>> scheme, as defined in [10646] Annex R (published in Amendment 2), for all
>> text." <RFC 2277>

>  This is not approved by ANYONE but a bunch of "unificators". It never
>was widely discussed, and affected people never had a chance to give any
>input. This is the same kind of "standard documents" that ITU issues by
>dozens.

Affected in what way? Many ways of encoding Unicode were proposed,
developed, and used. Most of them are history by now. UTF-8 is the best way
to encode Unicode to this day. Don't like it? Design a better one.

>> >-- I am Russian.
>> 
>> So?
>
>  So I don't want UTF-8 to be forced on me.

Who's forcing it on you?

> Charset definitions in MIME
>headers exist for a reason. If we want to make something usable we can
>create a format that can encapsulate existing charsets instead of banning
>them altogether and replacing with "unified" stuff where cut(1) and
>dd(1) can produce the output that will be declared "illegal" to be
>processed as text because it can not be a valid UTF-8 sequence.

You are worried about nothing. No one in this discussion has proposed
making anything other than Unicode and UTF-8 "illegal." Supporting Unicode
does not mean dropping support for everything else.

>  One of the most basic strengths of Unix is the ease with which text can
>be manipulated, and how "non-text" data can be processed using the same
>tools without any complex "this is text and this is not"
>application-specific procedures.

Nothing complex about it. UTF-8 uses a very simple algorithm, which makes
it easy to distinguish text from non-text.
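
A minimal Python sketch of such a check follows. The function names
`sequence_length` and `looks_like_utf8` are hypothetical, and this is only
a structural test of lead and continuation bytes: it does not reject
overlong forms or surrogate code points, which a strict decoder would.

```python
def sequence_length(lead: int) -> int:
    """Return the sequence length announced by a lead byte, or 0 if the
    byte cannot start a sequence (a continuation byte or 0xF8..0xFF)."""
    if lead & 0x80 == 0x00:   # 0xxxxxxx: plain ASCII
        return 1
    if lead & 0xE0 == 0xC0:   # 110xxxxx: two-byte sequence
        return 2
    if lead & 0xF0 == 0xE0:   # 1110xxxx: three-byte sequence
        return 3
    if lead & 0xF8 == 0xF0:   # 11110xxx: four-byte sequence
        return 4
    return 0


def looks_like_utf8(data: bytes) -> bool:
    """Structural check: every lead byte must be followed by the right
    number of continuation bytes of the form 10xxxxxx."""
    i = 0
    while i < len(data):
        length = sequence_length(data[i])
        if length == 0 or i + length > len(data):
            return False      # bad lead byte or truncated sequence
        for byte in data[i + 1:i + length]:
            if byte & 0xC0 != 0x80:
                return False  # expected a continuation byte
        i += length
    return True
```

Text in an 8-bit charset such as KOI8-R fails this test almost immediately,
because its letters are all high bytes that never form a lead byte followed
by proper continuation bytes.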

>UTF-8 turns "text" into something that
>gives us a dilemma -- to redesign everything to treat "text" as the stream
>of UTF-8 encoded Unicode (and make it impossible to combine text and
>"non-text" without a lot of pain), or to leave tools as they are and deal
>with "invalid" output from perfectly valid operations.

You don't have to treat everything as the stream of UTF-8 encoded Unicode.
Again, supporting Unicode does not mean EVERYTHING must be Unicode. That
would not make sense, at least not now. It may in the future. Unicode is
here to stay.
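
To make the dilemma concrete, here is a small Python sketch under my own
assumptions (the helper names `decode_strict` and `decode_lenient` are
hypothetical). A byte-wise cut, like the one cut(1) or dd(1) might make,
splits a two-byte character. The pieces remain perfectly good bytes; only
the step that interprets them as text has to decide how strict to be:

```python
def decode_strict(data: bytes):
    """Return the decoded text, or None if data is not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def decode_lenient(data: bytes) -> str:
    """Decode, substituting U+FFFD for any malformed sequence."""
    return data.decode("utf-8", errors="replace")


word = "šťastie".encode("utf-8")   # š and ť take two bytes each
head, tail = word[:3], word[3:]    # byte-wise cut that splits ť in half

if __name__ == "__main__":
    print(decode_strict(head))         # None: the last sequence is truncated
    print(decode_lenient(head))        # š followed by a replacement character
    print(decode_strict(head + tail))  # joining the pieces restores the word
```

Nothing is lost by the cut itself: concatenating the two pieces gives back
the original bytes, and they decode cleanly again.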


>In
>Windows/Office/... that lives and feeds on complex and unparseable formats
>this problem couldn't appear or even thought of -- "text" doesn't exist as
>text at all, and the less stuff will look as something that can be usable
>outside of strict "object" environment, the better (they now don't even
>encode it in UTF-8, and use bare 16-bit Unicode). In Unixlike system it's
>a violation of some very basic rules.

What does Windows have to do with Unicode? Windows support for Unicode
sucks royally. Except for NT, Windows' Unicode support is virtually
non-existent.

When did that ever stop Unix programmers from doing something Microsoft
cannot handle? Unix already handles Unicode better than anything under
Windows.
For example, Lynx handles Unicode quite well, and it does it on text-only
displays that have no way of supporting a multitude of fonts.

Cheers,
Adam
-----------------------------------------------------------
"I think, therefore I am."
                    - Seventeenth Century Philosophy

"I publish what I think, therefore I have."
                    - Twenty-First Century Action

Details at http://www.OnlinePublisher.net/

