Date:      Tue, 4 Apr 2000 08:35:11 -0700 (PDT)
From:      "Eugene M. Kim" <ab@astralblue.com>
To:        Alex Belits <abelits@phobos.illtel.denver.co.us>
Cc:        "G. Adam Stanislav" <adam@whizkidtech.net>, MikeM <mike_bsdlists@yahoo.com>, freebsd-hackers@FreeBSD.ORG
Subject:   Re: Unicode on FreeBSD
Message-ID:  <Pine.BSF.4.20.0004040808360.5035-100000@home.astralblue.com>
In-Reply-To: <Pine.LNX.4.20.0004032038040.7178-100000@phobos.illtel.denver.co.us>

On Mon, 3 Apr 2000, Alex Belits wrote:

| On Mon, 3 Apr 2000, G. Adam Stanislav wrote:
| 
| > >  Really the question is much more basic -- who benefits from having
| > >Unicode (or Unicode in the form of UTF-8) support. It isn't me for sure
| > 
| > Everyone who works with multilingual documents.
| 
|   I feel perfectly fine with "multilingual" documents that contain English
| and Russian text without Unicode.

Please try to think more broadly.  Have you ever considered a mixture of
Russian, Hebrew, Korean and English?  AFAIK no CCS (coded character set)
other than Unicode can currently handle this.
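
To make this concrete, here is a rough Python sketch.  The sample string
is made up, and the charset names are the codec names of Python's
standard library, chosen only as examples of single-language charsets;
none of this comes from the thread itself.

```python
# Made-up sample: English, Russian, Hebrew and Korean in one string.
sample = "Hello Привет שלום 안녕"

# Example legacy charsets, named as Python's standard codecs name them.
legacy = ["ascii", "koi8_r", "iso8859_8", "euc_kr"]


def can_encode(text, codec):
    """Return True if every character of `text` exists in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


results = {codec: can_encode(sample, codec) for codec in legacy + ["utf-8"]}
for codec, ok in results.items():
    status = "ok" if ok else "cannot represent the whole string"
    print(f"{codec:10} {status}")
```

Each legacy charset covers some of the scripts but none covers all four,
while the UTF-8 encoding of Unicode takes the whole string.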

| 
| > Everyone who wants to
| > follow a single international standard as opposed to a slew of mutually
| > exclusive local standards. Anyone who thinks globally.
| 
|   "Globally" in this case means following self-proclaimed unificators from
| Unicode Consortium.
| 
| > Anyone who has anything to do with the Internet must deal with UTF-8:
| > "Protocols MUST be able to use the UTF-8 charset, which consists of the ISO
| > 10646 coded character set combined with the UTF-8 character encoding
| > scheme, as defined in [10646] Annex R (published in Amendment 2), for all
| > text." <RFC 2277>
| 
|   This is not approved by ANYONE but a bunch of "unificators". It never
| was widely discussed, and affected people never had a chance to give any
| input. This is the same kind of "standard documents" that ITU issues by
| dozens.

True, personally I don't like the way the Unicode Consortium operates
either; I'd prefer a more open process such as the IETF's.  However, it
seems a mistake to brand Unicode as an ill-motivated idea just because
its governing body is less than ideal.  And given that RFC 2277 is just a
BCP (Best Current Practice) and not a `standard' document, it doesn't
have to be approved by anyone either.  If you don't feel right about it,
why don't you send a short e-mail message to its author?

| 
| > >-- I am Russian.
| > 
| > So?
| 
|   So I don't want UTF-8 to be forced on me. Charset definitions in MIME
| headers exist for a reason. If we want to make something usable we can
| create a format that can encapsulate existing charsets instead of banning
| them altogether and replacing with "unified" stuff where cut(1) and
| dd(1) can produce the output that will be declared "illegal" to be
| processed as text because it can not be a valid UTF-8 sequence.

Nobody is banning anything.  Please remember that RFC 2277 only
mandates support for UTF-8.  One can still go ahead and use US-ASCII,
EUC-KR, or whatever one wants, as long as the protocol supports
character set designation, as MIME does.  And again, RFC 2277 is a BCP.
Unlike standards, BCPs have no enforcing power at all.
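
As an illustration of character set designation, here is a small sketch
using the `email` package from Python's standard library.  The helper
name `make_message`, the subject line and the sample text are my own
inventions for the example; only the idea of labelling the body with a
charset comes from MIME.

```python
from email.message import EmailMessage


def make_message(body, charset):
    """Build a text/plain message whose header declares `charset`."""
    msg = EmailMessage()
    msg["Subject"] = "charset demo"  # hypothetical subject line
    msg.set_content(body, charset=charset)
    return msg


body = "안녕하세요"  # made-up sample text
legacy_msg = make_message(body, "euc-kr")
utf8_msg = make_message(body, "utf-8")

for msg in (legacy_msg, utf8_msg):
    # The receiver reads the declared charset and decodes accordingly.
    print(msg["Content-Type"], "->", msg.get_content().strip())
```

Both messages carry the same text; the Content-Type header tells the
receiver which charset to decode with, so a legacy charset and UTF-8
can coexist in the same protocol.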

| 
|   One of the most basic strengths of Unix is the ease with which text can
| be manipulated, and how "non-text" data can be processed using the same
| tools without any complex "this is text and this is not"
| application-specific procedures. UTF-8 turns "text" into something that
| gives us a dilemma -- to redesign everything to treat "text" as the stream
| of UTF-8 encoded Unicode (and make it impossible to combine text and
| "non-text" without a lot of pain), or to leave tools as they are and deal
| with "invalid" output from perfectly valid operations. In
| Windows/Office/... that lives and feeds on complex and unparseable formats
| this problem couldn't appear or even thought of -- "text" doesn't exist as
| text at all, and the less stuff will look as something that can be usable
| outside of strict "object" environment, the better (they now don't even
| encode it in UTF-8, and use bare 16-bit Unicode). In Unixlike system it's
| a violation of some very basic rules.

Yes, it is true that the entire UN*X world is deeply rooted in the
single-byte-oriented world, and that it's hard to come up with a
reasonable migration path to the multibyte world.  But that doesn't
justify the byte-oriented system.  It has all too many limitations
(which you might not realize until you have to mix several different
languages in one document; I did :-p), and there has to be an
alternative.  I'm not saying that the entire UN*X world should migrate
to Unicode within months.  We all know that is just impossible.
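
For what it's worth, the byte-versus-character distinction can be
sketched in a few lines of Python.  The byte slice below is only a rough
stand-in for what a byte-oriented tool such as cut(1) or dd(1) does; the
sample word and variable names are assumptions of mine.

```python
text = "Привет"  # made-up sample: six Cyrillic letters

koi8 = text.encode("koi8_r")  # one byte per letter: 6 bytes
utf8 = text.encode("utf-8")   # two bytes per letter: 12 bytes

# Byte-oriented cut: keep the first three bytes of each encoding.
koi8_cut = koi8[:3]
utf8_cut = utf8[:3]

print(koi8_cut.decode("koi8_r"))  # three whole letters survive

try:
    utf8_cut.decode("utf-8")
    utf8_cut_is_valid = True
except UnicodeDecodeError:
    # The slice ended in the middle of a two-byte sequence.
    utf8_cut_is_valid = False
print("UTF-8 byte cut valid:", utf8_cut_is_valid)

# Character-oriented cut: slice the text first, then encode.
char_cut = text[:3].encode("utf-8")
print(char_cut.decode("utf-8"))
```

The byte cut is harmless for the single-byte charset and breaks the
UTF-8 stream, while a tool that counts characters instead of bytes
produces valid output in either encoding.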

Eugene

-- 
Eugene M. Kim <ab@astralblue.com>

"Is your music unpopular?  Make it popular; make music
which people like, or make people who like your music."
