Date:      Tue, 4 Apr 2000 10:41:15 -0400
From:      Anatoly Vorobey <mellon@pobox.com>
To:        Alex Belits <abelits@phobos.illtel.denver.co.us>
Cc:        hackers@freebsd.org
Subject:   Re: Unicode on FreeBSD
Message-ID:  <20000404104115.B73509@sasami.jurai.net>
In-Reply-To: <Pine.LNX.4.20.0004032038040.7178-100000@phobos.illtel.denver.co.us>; from abelits@phobos.illtel.denver.co.us on Mon, Apr 03, 2000 at 08:59:51PM -0700
References:  <3.0.6.32.20000403221617.008e2500@mail85.pair.com> <Pine.LNX.4.20.0004032038040.7178-100000@phobos.illtel.denver.co.us>

You, Alex Belits, were spotted writing this on Mon, Apr 03, 2000 at 08:59:51PM -0700:

> > >-- I am Russian.
> > 
> > So?
> 
>   So I don't want UTF-8 to be forced on me.

No one is trying to force UTF-8 on you.

In fact, userland support of UTF-8 can (and, IMHO, should) be based around
an environment variable à la LANG that tells programs whether they
should expect pure 8-bit text or UTF-8 text. That gives you a pretty
easy option to leave things as they are.
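
As a rough sketch of that idea (an illustration only, not code from any
existing program): the helper name expects_utf8 is hypothetical, and the
sketch assumes the usual POSIX precedence of LC_ALL over LC_CTYPE over LANG
and locale names of the form "ru_RU.KOI8-R" or "en_US.UTF-8".

```python
import os


def expects_utf8(env=None):
    """Return True if the locale variables ask for UTF-8 text.

    Hypothetical helper: `env` defaults to the process environment and
    can be replaced by a plain dict for testing.
    """
    env = os.environ if env is None else env
    # Assumed precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    # An empty value is treated the same as an unset variable.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            # The codeset is the part after the dot, minus any "@modifier".
            codeset = value.partition(".")[2].partition("@")[0]
            return codeset.upper().replace("-", "") == "UTF8"
    # Nothing set: keep the traditional 8-bit behaviour.
    return False
```

With LANG=ru_RU.KOI8-R this returns False, so a program using it would keep
treating its input as plain 8-bit text; only an explicit UTF-8 codeset
switches the behaviour.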

> Charset definitions in MIME
> headers exist for a reason.

Yes, and the better mail clients (e.g. mutt) are already able to translate
transparently between different equivalent charsets by using internally
a common superset -- Unicode. Everyone should be able to use whatever 
charset they desire.

>   One of the most basic strengths of Unix is the ease with which text can
> be manipulated, and how "non-text" data can be processed using the same
> tools without any complex "this is text and this is not"
> application-specific procedures. UTF-8 turns "text" into something that
> gives us a dilemma -- to redesign everything to treat "text" as the stream
> of UTF-8 encoded Unicode (and make it impossible to combine text and
> "non-text" without a lot of pain), or to leave tools as they are and deal
> with "invalid" output from perfectly valid operations. 

This is not a dilemma. Just about the only really different aspect of handling
UTF-8 text is the algorithm for calculating the number of characters.
Most existing programs can easily be tailored to treat the byte
stream as either a pure 8-bit stream or a UTF-8 stream, based on YOUR preferences.
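
To make the character-counting difference concrete, here is a minimal
sketch. The function name count_chars and its utf8 flag are hypothetical,
and the UTF-8 branch assumes well-formed input and counts code points, not
display columns or combining sequences.

```python
def count_chars(data: bytes, utf8: bool) -> int:
    """Count characters in a byte string under the chosen interpretation.

    Hypothetical helper: `utf8` would come from the user's preference,
    e.g. a locale environment variable.
    """
    if not utf8:
        # Pure 8-bit text: one byte is one character.
        return len(data)
    # UTF-8: continuation bytes have the bit pattern 10xxxxxx, so the
    # number of characters is the number of bytes that are NOT
    # continuation bytes (assuming the input is well-formed).
    return sum(1 for byte in data if (byte & 0xC0) != 0x80)


word = "привет"  # six Cyrillic letters
koi8 = word.encode("koi8_r")  # 6 bytes
utf8 = word.encode("utf-8")   # 12 bytes

print(count_chars(koi8, utf8=False))  # one byte per character
print(count_chars(utf8, utf8=True))   # lead bytes only
print(count_chars(utf8, utf8=False))  # what a byte-counting tool would report
```

The same byte stream gives different counts only when it is read under the
wrong interpretation; everything else in the function is a plain byte loop.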

-- 
Anatoly Vorobey,
mellon@pobox.com http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton

