Date: Tue, 4 Apr 2000 20:14:12 -0500
From: "G. Adam Stanislav"
To: Alex Belits
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: Unicode on FreeBSD
Message-ID: <20000404201412.C261@whizkidtech.net>
Content-Type: text/plain; charset=iso-8859-2

On Tue, Apr 04, 2000 at 05:05:05PM -0700, Alex Belits wrote:
> The existing "market" of multilingual application is so small, and it's
>based on so simplistic requirements (to be able to display and print
>characters, and make multilingual "web pages"), that even solution so much
>flawed as standardization on Unicode can survive. Unicode is positioned as
>the _replacement_ for languages/charsets handling infrastructure -- "we
>know all the characters, so we can write all the words, right?".

Not so. Unicode is a character map. One of many. It just happens to be the most inclusive one in existence. I also strongly disagree with your view of it being simplistic.

Unicode is not, and never was, meant to be a high level linguistic system. Rather, it provides primitives for such a system. It is a map, nothing else. It is system-independent.
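A minimal sketch of what "a map, nothing else" can look like in practice, assuming Python 3 and its standard codecs. The sample character (C-caron) and the three encodings are illustrative assumptions, and the helper names describe() and encode_all() are hypothetical. The map assigns the character a number and a name; the bytes used to store that number depend entirely on the encoding chosen.

```python
import unicodedata


def describe(ch):
    """Return the map entry for one character: (code point, official name)."""
    return ord(ch), unicodedata.name(ch)


def encode_all(ch, encodings=("utf-8", "utf-16-be", "iso-8859-2")):
    """Serialize the same character under several encodings.

    Returns a dict mapping each encoding name to the resulting bytes,
    written as a hex string. The default encodings are only examples.
    """
    return {enc: ch.encode(enc).hex() for enc in encodings}


if __name__ == "__main__":
    ch = "\u010c"  # C-caron, picked purely as an example character
    code_point, name = describe(ch)
    print(f"U+{code_point:04X} {name}")
    for enc, hex_bytes in encode_all(ch).items():
        print(f"{enc:12} {hex_bytes}")
```

Run as a script, this prints one code point (U+010C) followed by three different byte sequences for it. The entry in the map is the same in every case; only the storage differs.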
It does not even specify how the map is to be encoded (e.g., UTF-8, or 16 bits, etc.). The Unicode Consortium does provide all kinds of text files that help programmers use the map better: they provide such information as which character is upper case, lower case, digit, control, etc., how to convert upper case to lower case, and things like that.

It does not, for example, provide sorting order. It cannot. Unicode is not about linguistics; it is about mapping characters regardless of their use in specific languages. And different languages sort characters differently. For example, in Slovak, "ch" is considered a character which belongs after the "h". In other languages it is sorted differently. And in most languages, it is just two unrelated characters.

Unicode is not simplistic. It achieves its stated goal, and it does it well. How we use it is up to us.

Cheers,
Adam

P.S. Hmmm... Interesting. I noticed my random quote contains a C-caron. I wonder how it is going to be handled. :)

-- 
Can you imagine the silence if everyone said only what he knows!
        -- Karel Čapek