From owner-freebsd-hackers  Tue Apr  4 18:16:11 2000
Delivered-To: freebsd-hackers@freebsd.org
Received: from whizkidtech.net (rh24.bfm.org [216.127.220.217])
	by hub.freebsd.org (Postfix) with ESMTP id A7B2837B8A4
	for <freebsd-hackers@FreeBSD.ORG>; Tue,  4 Apr 2000 18:16:03 -0700 (PDT)
	(envelope-from adam@whizkidtech.net)
Received: (from adam@localhost)
	by whizkidtech.net (8.9.2/8.9.2) id UAA00293;
	Tue, 4 Apr 2000 20:14:42 -0500 (CDT)
	(envelope-from adam)
Date: Tue, 4 Apr 2000 20:14:12 -0500
From: "G. Adam Stanislav" <adam@whizkidtech.net>
To: Alex Belits <abelits@phobos.illtel.denver.co.us>
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: Unicode on FreeBSD
Message-ID: <20000404201412.C261@whizkidtech.net>
References: <v04210106b51010371497@[128.113.24.47]> <Pine.LNX.4.10.10004041630110.1142-100000@mercury>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-2
Content-Transfer-Encoding: 8bit
X-Mailer: Mutt 1.0.1i
In-Reply-To: <Pine.LNX.4.10.10004041630110.1142-100000@mercury>; from abelits@phobos.illtel.denver.co.us on Tue, Apr 04, 2000 at 05:05:05PM -0700
Organization: Whiz Kid Technomagic
X-URL: http://www.whizkidtech.net/
X-Castle: http://www.redprince.net/
X-Operating-System: FreeBSD whizkidtech.net 3.1-RELEASE FreeBSD 3.1-RELEASE 
X-SG-Player-ID: 0278852114
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Tue, Apr 04, 2000 at 05:05:05PM -0700, Alex Belits wrote:
>  The existing "market" of multilingual application is so small, and it's
>based on so simplistic requirements (to be able to display and print
>characters, and make multilingual "web pages"), that even solution so much
>flawed as standardization on Unicode can survive. Unicode is positioned as
>the _replacement_ for languages/charsets handling infrastructure -- "we
>know all the characters, so we can write all the words, right?".

Not so. Unicode is a character map. One of many. It just happens to be
the most inclusive one in existence.

I also strongly disagree with your view of it being simplistic. Unicode
is not, and never was, meant to be a high level linguistic system. Rather,
it provides primitives for such a system. It is a map, nothing else. It
is system-independent. It does not even mandate a single way of encoding
the map (UTF-8, 16-bit units, and so on are all options).
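
To make the map-versus-encoding distinction concrete, here is a minimal
Python sketch. It is only an illustration: the choice of the C-caron as
the sample character and the variable names are mine. One entry in the
map (a code point) comes out as different bytes depending on the
encoding chosen.

```python
# One entry in the map: LATIN CAPITAL LETTER C WITH CARON, U+010C.
c_caron = "\u010c"

# The code point is the position in the map; it does not depend on encoding.
code_point = ord(c_caron)

# The same character serialized under three different encodings.
utf8_bytes = c_caron.encode("utf-8")
utf16_bytes = c_caron.encode("utf-16-be")
latin2_bytes = c_caron.encode("iso-8859-2")

print(f"U+{code_point:04X}")
print(utf8_bytes.hex(), utf16_bytes.hex(), latin2_bytes.hex())
```

The character is the same in all three cases. Only the bytes on the wire
differ, which is why a reader has to know which encoding was used.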

The Unicode Consortium does provide data files that help programmers use
the map better. They record which characters are upper case, lower case,
digits, control characters, and so on, and how to convert upper case to
lower case.
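
As a rough illustration of what those properties look like to a
programmer, the sketch below uses Python's standard unicodedata module,
which is built from the Consortium's character database. The describe()
helper is a hypothetical name of my own, not part of any library.

```python
import unicodedata

def describe(ch):
    """Collect a few per-character properties (hypothetical helper)."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),  # "Lu" = upper case letter
        "lower": ch.lower(),
        "upper": ch.upper(),
    }

info = describe("\u010c")                    # C with caron
digit_category = unicodedata.category("7")  # "Nd" = decimal digit
digit_value = unicodedata.digit("7")

print(info)
print(digit_category, digit_value)
```

None of this says anything about a particular language. It only
describes the characters themselves.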

It does not, for example, provide one sorting order that is right for
every language. It cannot. Unicode is not about linguistics; it is about
mapping characters regardless of their use in specific languages. And
different languages sort characters differently. For example, in Slovak,
"ch" is considered a single letter which sorts after "h". In other
languages it is sorted differently, and in most languages it is just
two unrelated characters.
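
A simplified sketch of the Slovak case, in Python. The letter order
below is an assumption made for illustration: it covers only unaccented
lower case letters plus "ch", and ignores the accented letters that real
Slovak collation also has to place. The name slovak_key is hypothetical.

```python
import string

# Hypothetical, simplified order: a..h, then "ch", then i..z.
# Real Slovak collation also orders accented letters; this sketch does not.
LETTERS = list(string.ascii_lowercase)
ORDER = LETTERS[:8] + ["ch"] + LETTERS[8:]
RANK = {unit: i for i, unit in enumerate(ORDER)}

def slovak_key(word):
    """Turn a word into a list of ranks, treating "ch" as one unit."""
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        if word.startswith("ch", i):
            key.append(RANK["ch"])
            i += 2
        else:
            # Characters outside the table sort after every known unit.
            key.append(RANK.get(word[i], len(ORDER)))
            i += 1
    return key

words = ["chata", "cena", "hora", "dom", "ihla"]
plain = sorted(words)                   # plain code point order
slovak = sorted(words, key=slovak_key)  # "ch" sorts after "h"

print(plain)
print(slovak)
```

In plain code point order "chata" lands between "cena" and "dom". With
the Slovak key it moves to after "hora". The character map is the same
in both cases; only the language rule layered on top of it changes.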


Unicode is not simplistic. It does what its stated goal is, and it does
it well. How we use it is up to us.

Cheers,
Adam

P.S. Hmmm... Interesting. I noticed my random quote contains a C-caron.
I wonder how it is going to be handled. :)

-- 
Can you imagine the silence if everyone said only what he knows!
		-- Karel Čapek

